@iScienceLuvr
NVIDIA introduces the Nemotron 3 family of models!

Super, Ultra will be released later, Nano released today

* Mixture-of-Experts hybrid Mamba–Transformer architecture

Super and Ultra models:
* are trained with NVFP4
* LatentMoE (project token embedding to smaller latent dimension for experts to process)
* multi-token prediction

full pretraining+post-training data and code will be made open-source
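
To make the LatentMoE idea mentioned above concrete, here is a rough sketch in plain Python: project the token to a smaller latent width, let the selected experts work at that width, then project back. This is not NVIDIA's implementation. The class name `LatentMoESketch`, every dimension, the ReLU experts, top-k softmax gating, and routing on the full-width token are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random rows x cols matrix, scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LatentMoESketch:
    """Hypothetical latent-space MoE layer; all names and sizes are illustrative."""

    def __init__(self, d_model, d_latent, d_expert, n_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.down = rand_matrix(d_latent, d_model, rng)  # d_model -> d_latent
        self.up = rand_matrix(d_model, d_latent, rng)    # d_latent -> d_model
        # Assumption: the router scores the full-width token.
        self.router = rand_matrix(n_experts, d_model, rng)
        # Each expert is a two-layer MLP that only ever sees d_latent-wide input.
        self.experts = [
            (rand_matrix(d_expert, d_latent, rng), rand_matrix(d_latent, d_expert, rng))
            for _ in range(n_experts)
        ]

    def forward(self, x):
        probs = softmax(matvec(self.router, x))
        # Keep the top_k highest-scoring experts for this token.
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        z = matvec(self.down, x)  # token embedding projected to the latent width
        mixed = [0.0] * len(z)
        for i in chosen:
            w_in, w_out = self.experts[i]
            h = [max(0.0, v) for v in matvec(w_in, z)]  # ReLU (assumed activation)
            y = matvec(w_out, h)
            gate = probs[i] / norm  # renormalized gate weight
            mixed = [m + gate * v for m, v in zip(mixed, y)]
        # Project the gated mixture back to the model width.
        return matvec(self.up, mixed), chosen


if __name__ == "__main__":
    layer = LatentMoESketch(d_model=16, d_latent=4, d_expert=8, n_experts=6, top_k=2)
    token = [0.1 * i for i in range(16)]
    out, chosen = layer.forward(token)
    print(len(out), chosen)
```

In this toy version each expert's weight matrices scale with the latent width (4) instead of the model width (16), which is the kind of saving the parenthetical hints at; the real layer may differ in where it routes and how it projects.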