@dair_ai
Banger paper from NVIDIA. Agentic reasoning needs models that are not just capable but efficient at long-context inference, and the agent model layer is moving toward open, long-context, high-throughput architectures.

The paper introduces Nemotron 3 Super, an open 120B-parameter model with 12B active parameters, built as a hybrid Mamba-Attention Mixture-of-Experts (MoE) architecture. The headline numbers are strong: up to 1M context length, comparable accuracy on common benchmarks, and up to 2.2x higher throughput than GPT-OSS-120B and 7.5x higher throughput than Qwen3.5-122B.

The model combines several efficiency bets, including NVFP4 pretraining, LatentMoE for better accuracy per FLOP and per parameter, and multi-token prediction (MTP) layers for native speculative decoding. It is pretrained on 25 trillion tokens, then post-trained with supervised fine-tuning and RL.

Paper: https://t.co/VcqUPjylzF

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
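For context on the 120B-total / 12B-active split: in a sparse MoE layer, a router sends each token to a few experts, so only a fraction of the weights run on any forward pass. Below is a minimal, generic top-k routing sketch in PyTorch; the expert count, top-k, and dimensions are illustrative assumptions, not Nemotron's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic sparse Mixture-of-Experts layer with top-k routing.

    All sizes are illustrative, NOT Nemotron's config. Only k of
    n_experts expert MLPs run per token, which is why a model's
    "active" parameter count sits far below its total count.
    """

    def __init__(self, d_model=1024, d_ff=4096, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        topk = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk.values, dim=-1) # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk.indices[:, slot]          # expert id per token for this slot
            for e in idx.unique():
                mask = idx == e                  # tokens routed to expert e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[int(e)](x[mask])
        return out

# With 16 experts and k=2, each token touches ~1/8 of the expert weights:
layer = TopKMoE()
y = layer(torch.randn(8, 1024))
```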
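And on the MTP layers: multi-token prediction heads let the model cheaply draft several future tokens itself, which one full forward pass then verifies, i.e., speculative decoding without a separate draft model. Here is a rough sketch of the greedy accept/reject loop; `draft_fn` and `verify_fn` are hypothetical placeholders standing in for the MTP heads and the full model, not a real API, and the paper's actual verification scheme may differ.

```python
from typing import Callable, List

def self_speculative_step(
    draft_fn: Callable[[List[int], int], List[int]],
    verify_fn: Callable[[List[int]], List[int]],
    context: List[int],
    n_draft: int = 4,
) -> List[int]:
    """One greedy self-speculative decoding step (illustrative sketch).

    draft_fn stands in for MTP heads proposing the next n_draft token
    ids; verify_fn stands in for one full forward pass returning the
    model's argmax prediction at every position of its input. Both
    names are placeholder assumptions, not an actual API.
    """
    draft = draft_fn(context, n_draft)
    preds = verify_fn(context + draft)  # one pass scores all drafted tokens at once
    out = list(context)
    for i, tok in enumerate(draft):
        # preds[len(context) + i - 1] is what the full model would emit
        # after seeing everything up to (but not including) draft[i].
        expected = preds[len(context) + i - 1]
        out.append(expected)
        if expected != tok:
            break  # first mismatch: keep the correction, drop the rest
    else:
        # every draft token accepted: the last position yields a bonus token
        out.append(preds[len(context) + len(draft) - 1])
    return out
```

The win is that accepted tokens cost one full forward pass in total instead of one pass each, which is where the throughput gains from native speculative decoding come from.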