@rasbt
Implemented Olmo 3 from scratch (in a standalone notebook) this weekend! If you are a coder, probably the best way to read the architecture details at a glance: https://t.co/wF8PkoDuBe https://t.co/rvXJJ5FZ3r
{
"media": [
{
"url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1992601366007795999/media_0.jpg?",
"media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1992601366007795999/media_0.jpg?",
"type": "photo",
"filename": "media_0.jpg"
},
{
"url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1992601366007795999/media_1.jpg?",
"media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1992601366007795999/media_1.jpg?",
"type": "photo",
"filename": "media_1.jpg"
}
],
"processed_at": "2025-11-27T20:39:18.126970",
"pipeline_version": "2.0"
} {
"type": "tweet",
"id": "1992601366007795999",
"url": "https://x.com/rasbt/status/1992601366007795999",
"twitterUrl": "https://twitter.com/rasbt/status/1992601366007795999",
"text": "Implemented Olmo 3 from scratch (in a standalone notebook) this weekend! \nIf you are a coder, probably the best way to read the architecture details at a glance: https://t.co/wF8PkoDuBe https://t.co/rvXJJ5FZ3r",
"source": "Twitter for iPhone",
"retweetCount": 297,
"replyCount": 17,
"likeCount": 2009,
"quoteCount": 10,
"viewCount": 159614,
"createdAt": "Sun Nov 23 14:29:21 +0000 2025",
"lang": "en",
"bookmarkCount": 1790,
"isReply": false,
"inReplyToId": null,
"conversationId": "1992601366007795999",
"displayTextRange": [
0,
185
],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {
"type": "user",
"userName": "rasbt",
"url": "https://x.com/rasbt",
"twitterUrl": "https://twitter.com/rasbt",
"id": "865622395",
"name": "Sebastian Raschka",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/1661187442043486209/a3E4t1eV_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/865622395/1742309979",
"description": "",
"location": "",
"followers": 370431,
"following": 1107,
"status": "",
"canDm": false,
"canMediaTag": true,
"createdAt": "Sun Oct 07 02:06:16 +0000 2012",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 23506,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 2026,
"statusesCount": 19021,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [
"1991517493534552497"
],
"profile_bio": {
"description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)",
"entities": {
"description": {
"urls": [
{
"display_url": "amzn.to/4fqvn0D",
"expanded_url": "https://amzn.to/4fqvn0D",
"indices": [
100,
123
],
"url": "https://t.co/O8LAAMRzzW"
},
{
"display_url": "mng.bz/lZ5B",
"expanded_url": "https://mng.bz/lZ5B",
"indices": [
138,
161
],
"url": "https://t.co/5TueQKx2Fk"
}
]
},
"url": {
"urls": [
{
"display_url": "sebastianraschka.com",
"expanded_url": "https://sebastianraschka.com",
"indices": [
0,
23
],
"url": "https://t.co/HrtQQ5tgJl"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {
"media": [
{
"allow_download_status": {
"allow_download": true
},
"display_url": "pic.twitter.com/rvXJJ5FZ3r",
"expanded_url": "https://twitter.com/rasbt/status/1992601366007795999/photo/1",
"ext_media_availability": {
"status": "Available"
},
"features": {
"large": {
"faces": [
{
"h": 250,
"w": 250,
"x": 894,
"y": 24
}
]
},
"orig": {
"faces": [
{
"h": 500,
"w": 500,
"x": 1788,
"y": 48
}
]
}
},
"id_str": "1992600634470682624",
"indices": [
186,
209
],
"media_key": "3_1992600634470682624",
"media_results": {
"id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARunI7hE19AACgACG6ckYpfaUR8AAA==",
"result": {
"__typename": "ApiMedia",
"id": "QXBpTWVkaWE6DAABCgABG6cjuETX0AAKAAIbpyRil9pRHwAA",
"media_key": "3_1992600634470682624"
}
},
"media_url_https": "https://pbs.twimg.com/media/G6cjuETX0AAm09o.jpg",
"original_info": {
"focus_rects": [
{
"h": 2294,
"w": 4096,
"x": 0,
"y": 0
},
{
"h": 4096,
"w": 4096,
"x": 0,
"y": 0
},
{
"h": 4096,
"w": 3593,
"x": 0,
"y": 0
},
{
"h": 4096,
"w": 2048,
"x": 716,
"y": 0
},
{
"h": 4096,
"w": 4096,
"x": 0,
"y": 0
}
],
"height": 4096,
"width": 4096
},
"sizes": {
"large": {
"h": 2048,
"w": 2048
}
},
"type": "photo",
"url": "https://t.co/rvXJJ5FZ3r"
}
]
},
"card": null,
"place": {},
"entities": {
"urls": [
{
"display_url": "github.com/rasbt/LLMs-fro…",
"expanded_url": "https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/13_olmo3/standalone-olmo3.ipynb",
"indices": [
162,
185
],
"url": "https://t.co/wF8PkoDuBe"
}
]
},
"quoted_tweet": {
"type": "tweet",
"id": "1991656199394050380",
"url": "https://x.com/rasbt/status/1991656199394050380",
"twitterUrl": "https://twitter.com/rasbt/status/1991656199394050380",
"text": "Olmo models are always a highlight because they are fully transparent and come with nice, detailed technical reports. \nI am sure I'll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks.\nIn the meantime, here's the side-by-side architecture comparison with Qwen3. \n\n1) As we can see, the Olmo 3 architecture is relatively similar to Qwen3. However, it's worth noting that this is most likely inspired by the Olmo 2 predecessor, not Qwen3. \n\n2) Similar to Olmo 2, Olmo 3 still uses a post-norm flavor instead of pre-norm, as they found in the Olmo 2 paper that it stabilizes the training.\n\n3) Interestingly, the 7B model still uses multi-head attention similar to Olmo 2. However, to make things more efficient and shrink the KV cache size, they now use sliding window attention (e.g., similar to Gemma 3).\n\nNext, let's look at the 32B model.\n\n4) Overall, it's the same architecture but just scaled up. Also, the proportions (e.g., going from the input to the intermediate size in the feed-forward layer, and so on) roughly match the ones in Qwen3. \n\n5) My guess is the architecture was initially somewhat smaller than Qwen3 due to the smaller vocabulary, and they then scaled up the intermediate size expansion from 5x in Qwen3 to 5.4x in Olmo 3 to have a 32B model for a direct comparison. \n\n6) Also, note that the 32B model (finally!) uses grouped query attention.",
"source": "Twitter for iPhone",
"retweetCount": 135,
"replyCount": 17,
"likeCount": 860,
"quoteCount": 4,
"viewCount": 235498,
"createdAt": "Thu Nov 20 23:53:35 +0000 2025",
"lang": "en",
"bookmarkCount": 499,
"isReply": false,
"inReplyToId": null,
"conversationId": "1991656199394050380",
"displayTextRange": [
0,
281
],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {
"type": "user",
"userName": "rasbt",
"url": "https://x.com/rasbt",
"twitterUrl": "https://twitter.com/rasbt",
"id": "865622395",
"name": "Sebastian Raschka",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/1661187442043486209/a3E4t1eV_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/865622395/1742309979",
"description": "",
"location": "",
"followers": 370431,
"following": 1107,
"status": "",
"canDm": false,
"canMediaTag": true,
"createdAt": "Sun Oct 07 02:06:16 +0000 2012",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 23506,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 2026,
"statusesCount": 19021,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [
"1991517493534552497"
],
"profile_bio": {
"description": "ML/AI research engineer. Ex stats professor.\nAuthor of \"Build a Large Language Model From Scratch\" (https://t.co/O8LAAMRzzW) & reasoning (https://t.co/5TueQKx2Fk)",
"entities": {
"description": {
"urls": [
{
"display_url": "amzn.to/4fqvn0D",
"expanded_url": "https://amzn.to/4fqvn0D",
"indices": [
100,
123
],
"url": "https://t.co/O8LAAMRzzW"
},
{
"display_url": "mng.bz/lZ5B",
"expanded_url": "https://mng.bz/lZ5B",
"indices": [
138,
161
],
"url": "https://t.co/5TueQKx2Fk"
}
]
},
"url": {
"urls": [
{
"display_url": "sebastianraschka.com",
"expanded_url": "https://sebastianraschka.com",
"indices": [
0,
23
],
"url": "https://t.co/HrtQQ5tgJl"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {
"media": [
{
"allow_download_status": {
"allow_download": true
},
"display_url": "pic.twitter.com/7JJ54eIew4",
"expanded_url": "https://twitter.com/rasbt/status/1991656199394050380/photo/1",
"ext_media_availability": {
"status": "Available"
},
"features": {
"large": {},
"orig": {}
},
"id_str": "1991655972775899136",
"indices": [
282,
305
],
"media_key": "3_1991655972775899136",
"media_results": {
"id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARujyI4UF7AACgACG6PIwteWQUwAAA==",
"result": {
"__typename": "ApiMedia",
"id": "QXBpTWVkaWE6DAABCgABG6PIjhQXsAAKAAIbo8jC15ZBTAAA",
"media_key": "3_1991655972775899136"
}
},
"media_url_https": "https://pbs.twimg.com/media/G6PIjhQXsAApJ9p.jpg",
"original_info": {
"focus_rects": [
{
"h": 2045,
"w": 3652,
"x": 0,
"y": 1128
},
{
"h": 3652,
"w": 3652,
"x": 0,
"y": 324
},
{
"h": 4096,
"w": 3593,
"x": 0,
"y": 0
},
{
"h": 4096,
"w": 2048,
"x": 0,
"y": 0
},
{
"h": 4096,
"w": 3652,
"x": 0,
"y": 0
}
],
"height": 4096,
"width": 3652
},
"sizes": {
"large": {
"h": 2048,
"w": 1826
}
},
"type": "photo",
"url": "https://t.co/7JJ54eIew4"
}
]
},
"card": null,
"place": {},
"entities": {},
"quoted_tweet": {
"type": "tweet",
"id": "1991508141687861479",
"url": "",
"twitterUrl": "",
"text": "",
"source": "Twitter for iPhone",
"retweetCount": 0,
"replyCount": 0,
"likeCount": 0,
"quoteCount": 0,
"viewCount": 0,
"createdAt": "",
"lang": "",
"bookmarkCount": 0,
"isReply": false,
"inReplyToId": null,
"conversationId": "",
"displayTextRange": [],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {},
"extendedEntities": {},
"card": null,
"place": {},
"entities": {},
"quoted_tweet": null,
"retweeted_tweet": null,
"isLimitedReply": false,
"article": null
},
"retweeted_tweet": null,
"isLimitedReply": false,
"article": null
},
"retweeted_tweet": null,
"isLimitedReply": false,
"article": null
}
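
Points 3 and 6 of the quoted post say that sliding window attention and grouped query attention both shrink the KV cache. The sketch below is one rough way to see why; it is not code from the linked notebook. The function names and every number in the demo (layer count, head count, head size, context length, window size, share of windowed layers) are illustrative assumptions, not values taken from the Olmo 3 report.

```python
def sliding_window_causal_mask(seq_len, window):
    """Return a seq_len x seq_len list of booleans.

    Entry [i][j] is True when query position i may attend to key
    position j: j must not be in the future (causal) and must lie
    within the last `window` positions, counting position i itself.
    """
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   window=None, n_windowed_layers=0, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a single sequence.

    Hypothetical helper: it ignores batching, paging, and any
    framework overhead.
    """
    # Keys and values: two tensors per layer, each storing
    # n_kv_heads * head_dim elements per cached token.
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    full_layers = n_layers - n_windowed_layers
    # A sliding-window layer only has to keep the last `window` tokens.
    windowed_tokens = context_len if window is None else min(window, context_len)
    cached_tokens = full_layers * context_len + n_windowed_layers * windowed_tokens
    return per_token_per_layer * cached_tokens


if __name__ == "__main__":
    GIB = 2 ** 30
    # Assumed toy configuration: 32 layers, 32 heads of size 128,
    # a 65,536-token context, 16-bit cache entries.
    mha = kv_cache_bytes(32, 32, 128, 65536)
    # Same model, but 24 of the 32 layers use a 4,096-token window.
    swa = kv_cache_bytes(32, 32, 128, 65536, window=4096, n_windowed_layers=24)
    # Same model with grouped query attention: 8 KV heads shared by 32 query heads.
    gqa = kv_cache_bytes(32, 8, 128, 65536)
    for name, size in [("MHA, full attention", mha),
                       ("MHA + sliding window", swa),
                       ("GQA, full attention", gqa)]:
        print(f"{name}: {size / GIB:.1f} GiB")
    # Print a small mask so the band structure is visible.
    for row in sliding_window_causal_mask(4, 2):
        print(["x" if allowed else "." for allowed in row])
```

The two techniques cut different factors of the same product: sliding window attention caps the number of cached tokens per layer, while grouped query attention reduces the number of key/value heads stored per token.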