🐦 Twitter Post Details


@rasbt

I really didn't expect another major open-weight LLM release this December, but here we go: NVIDIA released their new Nemotron 3 series this week.

It comes in 3 sizes:

1. Nano (30B-A3B),
2. Super (100B),
3. and Ultra (500B).

Architecture-wise, the models are Mixture-of-Experts (MoE) Mamba-Transformer hybrids. As of this morning (Dec 19), only the Nano model has been released as an open-weight model, so this post will focus on that one (shown in my drawing below).

Nemotron 3 Nano (30B-A3B) is a 52-layer hybrid Mamba-Transformer model that interleaves Mamba-2 sequence-modeling blocks with sparse Mixture-of-Experts (MoE) feed-forward layers, and uses self-attention only in a small subset of layers.

There’s a lot going on in the figure above, but in short, the architecture is organized into 13 macro blocks with repeated Mamba-2 → MoE sub-blocks, plus a few Grouped-Query Attention layers. In total, multiplying the macro blocks by their sub-blocks (13 × 4 = 52) gives the 52 layers of this architecture.

Regarding the MoE modules, each MoE layer contains 128 experts but activates only 1 shared and 6 routed experts per token.

The Mamba-2 layers would take a whole article by themselves to explain (perhaps a topic for another time). But for now, conceptually, you can think of them as similar to the Gated DeltaNet approach that Qwen3-Next and Kimi-Linear use, which I covered in my Beyond Standard LLMs article.

The similarity between Gated DeltaNet and Mamba-2 layers is that both replace standard attention with a gated state-space update. The idea behind this state-space-style module is that it maintains a running hidden state and mixes new inputs via learned gates. In contrast to attention, it scales linearly instead of quadratically with the input sequence length.

What’s actually quite exciting about this architecture is its strong performance compared to pure transformer architectures of similar size (like Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B-A4B), while achieving much higher tokens-per-second throughput.

Overall, this is an interesting direction, even more extreme than Qwen3-Next and Kimi-Linear in its use of only a few attention layers. However, one of the strengths of the transformer architecture is its performance at a (really) large scale. I am curious to see how the larger Nemotron 3 Super and especially Ultra will compare to the likes of DeepSeek V3.2.
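To make the 13-macro-block layer budget concrete, here is a minimal sketch that enumerates a plausible 52-layer stack. Only the totals (13 macro blocks, 4 sub-layers each, 52 layers, a few GQA layers) come from the post; the repeat pattern and the positions of the attention layers are assumptions for illustration.

N_MACRO = 13
PATTERN = ["mamba2", "moe", "mamba2", "moe"]  # assumed Mamba-2 -> MoE repeat per macro block
GQA_BLOCKS = {3, 7, 11}                        # hypothetical positions of the few attention layers

layers = []
for i in range(N_MACRO):
    block = list(PATTERN)
    if i in GQA_BLOCKS:
        block[2] = "gqa"  # swap one Mamba-2 sub-layer for Grouped-Query Attention (assumed)
    layers.extend(block)

assert len(layers) == 52  # 13 macro blocks x 4 sub-layers = 52
print(layers.count("mamba2"), "Mamba-2,", layers.count("moe"), "MoE,", layers.count("gqa"), "GQA layers")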
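The "1 shared + 6 routed out of 128 experts" routing pattern is easy to show in code. Below is a hedged PyTorch sketch of that selection logic; the hidden sizes, the SiLU expert MLPs, and the plain linear router are assumptions for illustration, not Nemotron's actual implementation, and the per-token loop is deliberately naive for readability.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_EXPERTS, TOP_K, D_MODEL, D_FF = 128, 6, 512, 1024

def make_expert():
    return nn.Sequential(nn.Linear(D_MODEL, D_FF), nn.SiLU(), nn.Linear(D_FF, D_MODEL))

experts = nn.ModuleList(make_expert() for _ in range(N_EXPERTS))
shared_expert = make_expert()            # always active for every token
router = nn.Linear(D_MODEL, N_EXPERTS)   # one logit per expert (assumed router form)

@torch.no_grad()
def moe_forward(x):                      # x: (n_tokens, D_MODEL)
    logits = router(x)                   # (n_tokens, N_EXPERTS)
    top_vals, top_idx = logits.topk(TOP_K, dim=-1)
    gates = F.softmax(top_vals, dim=-1)  # renormalize over the 6 selected experts
    out = shared_expert(x)               # the shared expert is a dense path
    for t in range(x.size(0)):           # naive per-token dispatch, for clarity
        for k in range(TOP_K):
            expert = experts[int(top_idx[t, k])]
            out[t] += gates[t, k] * expert(x[t])
    return out

print(moe_forward(torch.randn(4, D_MODEL)).shape)  # torch.Size([4, 512])

Only 7 of the 128 expert MLPs run per token, which is what keeps the active parameter count (the "A3B" in 30B-A3B) far below the total.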
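And to make the "running hidden state mixed via learned gates" idea concrete, here is a toy gated linear recurrence. This is not Mamba-2 itself (which uses structured state-space matrices and a hardware-aware parallel scan); it is only a minimal sketch of why such layers cost one fixed-size update per token, i.e. linear rather than quadratic in sequence length.

import torch
import torch.nn as nn

class GatedRecurrence(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.to_gate = nn.Linear(d_model, d_model)   # learned decay gate from the input
        self.to_input = nn.Linear(d_model, d_model)  # candidate state update

    def forward(self, x):                 # x: (seq_len, d_model)
        h = torch.zeros(x.size(1))        # the running hidden state
        outs = []
        for x_t in x:                     # one O(d) step per token -> linear in seq_len
            g = torch.sigmoid(self.to_gate(x_t))      # how much old state to keep
            h = g * h + (1 - g) * self.to_input(x_t)  # mix old state with new input
            outs.append(h)
        return torch.stack(outs)

y = GatedRecurrence(64)(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])

Unlike attention, the per-token cost here does not grow with how many tokens came before, since everything the model remembers is compressed into the fixed-size state h.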

Media 1

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2002382751866188246/media_0.jpg?",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-12-20T15:21:32.525645",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2002382751866188246",
  "url": "https://x.com/rasbt/status/2002382751866188246",
  "twitterUrl": "https://twitter.com/rasbt/status/2002382751866188246",
  "text": "I really didn't expect another major open-weight LLM release this December, but here we go: NVIDIA released their new Nemotron 3 series this week.\n\nIt comes in 3 sizes:\n\n1. Nano (30B-A3B),\n2. Super (100B),\n3. and Ultra (500B).\n\nArchitecture-wise, the models are a Mixture-of-Experts (MoE) Mamba-Transformer hybrid architecture. As of this morning (Dec 19), only the Nano model has been released as an open-weight model, so this post will focus on that one (shown in my drawing below).\n\nNemotron 3 Nano (30B-A3B) is a 52-layer hybrid Mamba-Transformer model that interleaves Mamba-2 sequence-modeling blocks with sparse Mixture-of-Experts (MoE) feed-forward layers, and uses self-attention only in a small subset of layers.\n\nThere’s a lot going on in the figure above, but in short, the architecture is organized into 13 macro blocks with repeated Mamba-2 → MoE sub-blocks, plus a few Grouped-Query Attention layers. In total, if we multiply the macro- and sub-blocks, there are 52 layers in this architecture.\n\nRegarding the MoE modules, each MoE layer contains 128 experts but activates only 1 shared and 6 routed experts per token.\n\nThe Mamba-2 layers would take a whole article itself to explain (perhaps a topic for another time). But for now, conceptually, you can think of them as similar to the Gated DeltaNet approach that Qwen3-Next and Kimi-Linear use, which I covered in my Beyond Standard LLMs article.\n\nThe similarity between Gated DeltaNet and Mamba-2 layers is that both replace standard attention with a gated-state-space update. The idea behind this state-space-style module is that it maintains a running hidden state and mixes new inputs via learned gates. In contrast to attention, it scales linearly instead of quadratically with the input sequence length.\n\nWhat’s actually quite exciting about this architecture is its really good performance compared to pure transformer architectures of similar size (like Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B-A4B), while achieving much higher tokens-per-second throughput.\n\nOverall, this is an interesting direction, even more extreme than Qwen3-Next and Kimi-Linear in its use of only a few attention layers. However, one of the strengths of the transformer architecture is its performance at a (really) large scale. I am curious to see how the larger Nemotron 3 Super and especially Ultra will compare to the likes of DeepSeek V3.2.",
  "source": "Twitter for iPhone",
  "retweetCount": 14,
  "replyCount": 10,
  "likeCount": 164,
  "quoteCount": 2,
  "viewCount": 9284,
  "createdAt": "Sat Dec 20 14:17:05 +0000 2025",
  "lang": "en",
  "bookmarkCount": 84,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2002382751866188246",
  "displayTextRange": [
    0,
    264
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "rasbt",
    "url": "https://x.com/rasbt",
    "twitterUrl": "https://twitter.com/rasbt",
    "id": "865622395",
    "name": "Sebastian Raschka",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1661187442043486209/a3E4t1eV_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/865622395/1742309979",
    "description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)",
    "location": "",
    "followers": 374102,
    "following": 1112,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Sun Oct 07 02:06:16 +0000 2012",
    "entities": {
      "description": {
        "urls": [
          {
            "display_url": "amzn.to/4fqvn0D",
            "expanded_url": "https://amzn.to/4fqvn0D",
            "url": "https://t.co/O8LAAMRzzW",
            "indices": [
              100,
              123
            ]
          },
          {
            "display_url": "mng.bz/lZ5B",
            "expanded_url": "https://mng.bz/lZ5B",
            "url": "https://t.co/5TueQKx2Fk",
            "indices": [
              138,
              161
            ]
          }
        ]
      },
      "url": {
        "urls": [
          {
            "display_url": "sebastianraschka.com",
            "expanded_url": "https://sebastianraschka.com",
            "url": "https://t.co/HrtQQ5tgJl",
            "indices": [
              0,
              23
            ]
          }
        ]
      }
    },
    "fastFollowersCount": 0,
    "favouritesCount": 23676,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 2034,
    "statusesCount": 19122,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1991517493534552497"
    ],
    "profile_bio": {
      "description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)"
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.x.com/E6c2Z3QOLM",
        "expanded_url": "https://x.com/rasbt/status/2002382751866188246/photo/1",
        "id_str": "2002382418473312256",
        "indices": [
          265,
          288
        ],
        "media_key": "3_2002382418473312256",
        "media_url_https": "https://pbs.twimg.com/media/G8nkM1WW4AAGowW.jpg",
        "type": "photo",
        "url": "https://t.co/E6c2Z3QOLM",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {
            "faces": [
              {
                "x": 116,
                "y": 916,
                "h": 100,
                "w": 100
              },
              {
                "x": 956,
                "y": 188,
                "h": 348,
                "w": 348
              },
              {
                "x": 552,
                "y": 600,
                "h": 458,
                "w": 458
              }
            ]
          },
          "medium": {
            "faces": [
              {
                "x": 67,
                "y": 536,
                "h": 58,
                "w": 58
              },
              {
                "x": 560,
                "y": 110,
                "h": 203,
                "w": 203
              },
              {
                "x": 323,
                "y": 351,
                "h": 268,
                "w": 268
              }
            ]
          },
          "small": {
            "faces": [
              {
                "x": 38,
                "y": 304,
                "h": 33,
                "w": 33
              },
              {
                "x": 317,
                "y": 62,
                "h": 115,
                "w": 115
              },
              {
                "x": 183,
                "y": 199,
                "h": 152,
                "w": 152
              }
            ]
          },
          "orig": {
            "faces": [
              {
                "x": 232,
                "y": 1832,
                "h": 200,
                "w": 200
              },
              {
                "x": 1912,
                "y": 376,
                "h": 696,
                "w": 696
              },
              {
                "x": 1104,
                "y": 1200,
                "h": 916,
                "w": 916
              }
            ]
          }
        },
        "sizes": {
          "large": {
            "h": 2048,
            "w": 1470,
            "resize": "fit"
          },
          "medium": {
            "h": 1200,
            "w": 861,
            "resize": "fit"
          },
          "small": {
            "h": 680,
            "w": 488,
            "resize": "fit"
          },
          "thumb": {
            "h": 150,
            "w": 150,
            "resize": "crop"
          }
        },
        "original_info": {
          "height": 4096,
          "width": 2939,
          "focus_rects": [
            {
              "x": 0,
              "y": 97,
              "w": 2939,
              "h": 1646
            },
            {
              "x": 0,
              "y": 0,
              "w": 2939,
              "h": 2939
            },
            {
              "x": 0,
              "y": 0,
              "w": 2939,
              "h": 3350
            },
            {
              "x": 509,
              "y": 0,
              "w": 2048,
              "h": 4096
            },
            {
              "x": 0,
              "y": 0,
              "w": 2939,
              "h": 4096
            }
          ]
        },
        "allow_download_status": {
          "allow_download": true
        },
        "media_results": {
          "result": {
            "media_key": "3_2002382418473312256"
          }
        }
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": []
  },
  "quoted_tweet": {
    "type": "tweet",
    "id": "1999847254367117736",
    "url": "https://x.com/rasbt/status/1999847254367117736",
    "twitterUrl": "https://twitter.com/rasbt/status/1999847254367117736",
    "text": "Just updated the Big LLM Architecture Comparison article...\n...it grew quite a bit since the initial version in July 2025, more than doubled!\nhttps://t.co/oEt8XzNxik https://t.co/RZuwp6ZUaF",
    "source": "Twitter for iPhone",
    "retweetCount": 455,
    "replyCount": 41,
    "likeCount": 2470,
    "quoteCount": 17,
    "viewCount": 131444,
    "createdAt": "Sat Dec 13 14:21:55 +0000 2025",
    "lang": "en",
    "bookmarkCount": 2165,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "1999847254367117736",
    "displayTextRange": [
      0,
      165
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "rasbt",
      "url": "https://x.com/rasbt",
      "twitterUrl": "https://twitter.com/rasbt",
      "id": "865622395",
      "name": "Sebastian Raschka",
      "isVerified": false,
      "isBlueVerified": true,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1661187442043486209/a3E4t1eV_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/865622395/1742309979",
      "description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)",
      "location": "",
      "followers": 374102,
      "following": 1112,
      "status": "",
      "canDm": false,
      "canMediaTag": true,
      "createdAt": "Sun Oct 07 02:06:16 +0000 2012",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "amzn.to/4fqvn0D",
              "expanded_url": "https://amzn.to/4fqvn0D",
              "url": "https://t.co/O8LAAMRzzW",
              "indices": [
                100,
                123
              ]
            },
            {
              "display_url": "mng.bz/lZ5B",
              "expanded_url": "https://mng.bz/lZ5B",
              "url": "https://t.co/5TueQKx2Fk",
              "indices": [
                138,
                161
              ]
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "sebastianraschka.com",
              "expanded_url": "https://sebastianraschka.com",
              "url": "https://t.co/HrtQQ5tgJl",
              "indices": [
                0,
                23
              ]
            }
          ]
        }
      },
      "fastFollowersCount": 0,
      "favouritesCount": 23676,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 2034,
      "statusesCount": 19122,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "1991517493534552497"
      ],
      "profile_bio": {
        "description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)"
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "display_url": "pic.x.com/RZuwp6ZUaF",
          "expanded_url": "https://x.com/rasbt/status/1999847254367117736/photo/1",
          "id_str": "1999847030831677442",
          "indices": [
            166,
            189
          ],
          "media_key": "3_1999847030831677442",
          "media_url_https": "https://pbs.twimg.com/media/G8DiR2XXgAI0Eil.jpg",
          "type": "photo",
          "url": "https://t.co/RZuwp6ZUaF",
          "ext_media_availability": {
            "status": "Available"
          },
          "features": {
            "large": {
              "faces": [
                {
                  "x": 1032,
                  "y": 530,
                  "h": 102,
                  "w": 102
                }
              ]
            },
            "medium": {
              "faces": [
                {
                  "x": 604,
                  "y": 310,
                  "h": 59,
                  "w": 59
                }
              ]
            },
            "small": {
              "faces": [
                {
                  "x": 342,
                  "y": 175,
                  "h": 33,
                  "w": 33
                }
              ]
            },
            "orig": {
              "faces": [
                {
                  "x": 2064,
                  "y": 1060,
                  "h": 204,
                  "w": 204
                }
              ]
            }
          },
          "sizes": {
            "large": {
              "h": 2048,
              "w": 1494,
              "resize": "fit"
            },
            "medium": {
              "h": 1200,
              "w": 875,
              "resize": "fit"
            },
            "small": {
              "h": 680,
              "w": 496,
              "resize": "fit"
            },
            "thumb": {
              "h": 150,
              "w": 150,
              "resize": "crop"
            }
          },
          "original_info": {
            "height": 4096,
            "width": 2988,
            "focus_rects": [
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 1673
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 2988
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 3406
              },
              {
                "x": 0,
                "y": 0,
                "w": 2048,
                "h": 4096
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 4096
              }
            ]
          },
          "allow_download_status": {
            "allow_download": true
          },
          "media_results": {
            "result": {
              "media_key": "3_1999847030831677442"
            }
          }
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "media": [
        {
          "display_url": "pic.x.com/RZuwp6ZUaF",
          "expanded_url": "https://x.com/rasbt/status/1999847254367117736/photo/1",
          "id_str": "1999847030831677442",
          "indices": [
            166,
            189
          ],
          "media_key": "3_1999847030831677442",
          "media_url_https": "https://pbs.twimg.com/media/G8DiR2XXgAI0Eil.jpg",
          "type": "photo",
          "url": "https://t.co/RZuwp6ZUaF",
          "ext_media_availability": {
            "status": "Available"
          },
          "features": {
            "large": {
              "faces": [
                {
                  "x": 1032,
                  "y": 530,
                  "h": 102,
                  "w": 102
                }
              ]
            },
            "medium": {
              "faces": [
                {
                  "x": 604,
                  "y": 310,
                  "h": 59,
                  "w": 59
                }
              ]
            },
            "small": {
              "faces": [
                {
                  "x": 342,
                  "y": 175,
                  "h": 33,
                  "w": 33
                }
              ]
            },
            "orig": {
              "faces": [
                {
                  "x": 2064,
                  "y": 1060,
                  "h": 204,
                  "w": 204
                }
              ]
            }
          },
          "sizes": {
            "large": {
              "h": 2048,
              "w": 1494,
              "resize": "fit"
            },
            "medium": {
              "h": 1200,
              "w": 875,
              "resize": "fit"
            },
            "small": {
              "h": 680,
              "w": 496,
              "resize": "fit"
            },
            "thumb": {
              "h": 150,
              "w": 150,
              "resize": "crop"
            }
          },
          "original_info": {
            "height": 4096,
            "width": 2988,
            "focus_rects": [
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 1673
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 2988
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 3406
              },
              {
                "x": 0,
                "y": 0,
                "w": 2048,
                "h": 4096
              },
              {
                "x": 0,
                "y": 0,
                "w": 2988,
                "h": 4096
              }
            ]
          },
          "allow_download_status": {
            "allow_download": true
          },
          "media_results": {
            "result": {
              "media_key": "3_1999847030831677442"
            }
          }
        }
      ],
      "symbols": [],
      "timestamps": [],
      "urls": [
        {
          "display_url": "magazine.sebastianraschka.com/p/the-big-llm-…",
          "expanded_url": "https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison",
          "url": "https://t.co/oEt8XzNxik",
          "indices": [
            142,
            165
          ]
        }
      ],
      "user_mentions": []
    },
    "quoted_tweet": null,
    "retweeted_tweet": null,
    "article": null
  },
  "retweeted_tweet": null,
  "article": null
}