@vllm_project
Congrats to @nvidia on the release of Nemotron 3 Super, with day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs.

120B hybrid MoE, only 12B active at inference.

Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget: dial accuracy vs token cost per task
- 1M token context window

Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes.

Blog: https://t.co/PAN0y778iB

Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!