mamba
1 article
NVIDIA Nemotron 3 Super: A 120B Open-Source Model That Only Uses 12B at a Time
NVIDIA released Nemotron 3 Super, a 120B-parameter open-source reasoning model with only 12B active parameters. It combines Mamba and Transformer layers in a hybrid mixture-of-experts (MoE) architecture, scores 36 on the Intelligence Index, and runs at a blistering 484 tok/s.