🐦 Twitter Post Details

@Yesterday_work_

I'm reading NVIDIA's new paper and it's wild.

Everyone keeps talking about scaling transformers with bigger clusters and smarter optimizers… meanwhile NVIDIA and Oxford just showed you can train billion-parameter models using evolution strategies, a method most people wrote off as ancient.

The trick is a new system called EGGROLL, and it flips the entire cost model of ES.

Normally, ES dies at scale because you have to generate full-rank perturbation matrices for every population member. For billion-parameter models, that means insane memory movement and ridiculous compute.

These guys solved it by generating low-rank perturbations from two skinny matrices A and B and letting ABᵀ act as the perturbation.

The population average then behaves like a full-rank update without paying the full-rank price.

The result?

They run evolution strategies with population sizes in the hundreds of thousands, a number earlier work couldn't touch because everything melted under memory pressure. Now throughput is basically as fast as batched inference.

That's unheard of for any gradient-free method.

The math checks out too. The low-rank approximation converges to the true ES gradient at a 1/r rate, so raising the rank recovers full ES behavior without the computational explosion.

But the experiments are where it gets crazy.

→ They pretrain recurrent LMs from scratch using only integer datatypes. No gradients. No backprop. Fully stable even at hyperscale.

→ They match GRPO-tier methods on LLM reasoning benchmarks. That means ES can compete with modern RL-for-reasoning approaches on real tasks.

→ ES suddenly becomes viable for massive, discrete, hybrid, and non-differentiable systems, the exact places where backprop is painful or impossible.

This paper quietly rewrites a boundary: we didn't struggle to scale ES because the algorithm was bad; we struggled because we were doing it in the most expensive possible way.

NVIDIA and Oxford removed the bottleneck.

And now evolution strategies aren't an old idea… they're a frontier-scale training method.
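To make the mechanism in the post concrete, here is a minimal NumPy sketch of one low-rank ES step. Everything specific in it (the rank r, noise scale sigma, learning rate, toy fitness function, and the 1/√r normalization) is an illustrative assumption, not EGGROLL's published estimator:

import numpy as np

rng = np.random.default_rng(0)

m, n = 256, 128     # weight matrix shape
r = 4               # perturbation rank, r << min(m, n)
pop = 64            # population size
sigma = 0.02        # perturbation scale (assumed)
lr = 0.1            # learning rate (assumed)

W = rng.normal(size=(m, n)) * 0.01

def fitness(w):
    # Toy stand-in objective: drive the weights toward an all-ones matrix.
    return -np.linalg.norm(w - 1.0)

update = np.zeros_like(W)
for _ in range(pop):
    # Two skinny factors instead of one full m-by-n noise matrix:
    # storing (A, B) costs r*(m+n) floats rather than m*n.
    A = rng.normal(size=(m, r))
    B = rng.normal(size=(n, r))
    E = (A @ B.T) / np.sqrt(r)   # rank-r perturbation with unit-variance entries
    update += fitness(W + sigma * E) * E

# Vanilla-ES-style gradient estimate from fitness-weighted perturbations.
# A sum of pop rank-r terms can reach rank pop*r, which is why the
# population-averaged update behaves like a full-rank update.
W += (lr / (pop * sigma)) * update

Real ES pipelines add fitness shaping and antithetic sampling, and at scale they never materialize E for large layers; this sketch only shows the structure of the update.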

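The "as fast as batched inference" claim comes down to never building ABᵀ at all. A perturbed forward pass can be computed with two extra skinny matmuls routed through the factors, as this hypothetical check (arbitrary shapes) illustrates:

import numpy as np

rng = np.random.default_rng(1)
m, n, r, batch = 256, 128, 4, 32

W = rng.normal(size=(m, n))
A = rng.normal(size=(m, r))
B = rng.normal(size=(n, r))
x = rng.normal(size=(n, batch))

# Naive: materialize the rank-r perturbation, O(m*n) extra memory per member.
y_naive = (W + A @ B.T) @ x

# Factored: push the input through the skinny factors instead, adding only
# O(r*(m+n)) work per member on top of the W @ x that inference already pays.
y_fast = W @ x + A @ (B.T @ x)

assert np.allclose(y_naive, y_fast)

Since each population member adds only its own pair of skinny matmuls to a cost dominated by the shared dense layer, evaluating very large populations stays close to plain batched inference.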
Media 1 (photo): first page of the paper "Evolution Strategies at the Hyperscale", showing authors, abstract, and a diagram of matrix perturbations and weighted updates.

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1993996168704217539/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1993996168704217539/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-12-04T20:38:11.427495",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1993996168704217539",
  "url": "https://x.com/Yesterday_work_/status/1993996168704217539",
  "twitterUrl": "https://twitter.com/Yesterday_work_/status/1993996168704217539",
  "text": "I'm reading NVIDIA's new paper and its wild.\n\nEveryone keeps talking about scaling transformers with bigger clusters and smarter optimizers… meanwhile NVIDIA and Oxford just showed you can train billion-parameter models using evolution strategies a method most people wrote off as ancient.\n\nThe trick is a new system called EGGROLL, and it flips the entire cost model of ES.\n\nNormally, ES dies at scale because you have to generate full-rank perturbation matrices for every population member. For billion-parameter models, that means insane memory movement and ridiculous compute.\nThese guys solved it by generating low-rank perturbations using two skinny matrices A and B and letting ABᵀ act as the update.\n\nThe population average then behaves like a full-rank update without paying the full-rank price.\n\nThe result?\n\nThey run evolution strategies with population sizes in the hundreds of thousands a number earlier work couldn’t touch because everything melted under memory pressure. Now, throughput is basically as fast as batched inference.\n\nThat’s unheard of for any gradient-free method.\n\nThe math checks out too.\n\nThe low-rank approximation converges to the true ES gradient at a 1/r rate, so pushing the rank recreates full ES behavior without the computational explosion.\n\nBut the experiments are where it gets crazy.\n\n→ They pretrain recurrent LMs from scratch using only integer datatypes. No gradients. No backprop. Fully stable even at hyperscale.\n\n→ They match GRPO-tier methods on LLM reasoning benchmarks.\nThat means ES can compete with modern RL-for-reasoning approaches on real tasks.\n\n→ ES suddenly becomes viable for massive, discrete, hybrid, and non-differentiable systems the exact places where backprop is painful or impossible.\n\nThis paper quietly rewrites a boundary:\n\nwe didn’t struggle to scale ES because the algorithm was bad we struggled because we were doing it in the most expensive possible way.\n\nNVIDIA and Oxford removed the bottleneck.\n\nAnd now evolution strategies aren’t an old idea… they’re a frontier-scale training method.",
  "source": "Twitter for iPhone",
  "retweetCount": 194,
  "replyCount": 53,
  "likeCount": 1125,
  "quoteCount": 18,
  "viewCount": 92839,
  "createdAt": "Thu Nov 27 10:51:47 +0000 2025",
  "lang": "en",
  "bookmarkCount": 903,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1993996168704217539",
  "displayTextRange": [
    0,
    278
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "Yesterday_work_",
    "url": "https://x.com/Yesterday_work_",
    "twitterUrl": "https://twitter.com/Yesterday_work_",
    "id": "1909092495734235136",
    "name": "Millie Marconi",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1922820030942547968/Akb44ZoN_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1909092495734235136/1747270995",
    "description": "",
    "location": "",
    "followers": 11513,
    "following": 52,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Mon Apr 07 03:55:35 +0000 2025",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 96,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 854,
    "statusesCount": 1611,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1977641327312699583"
    ],
    "profile_bio": {
      "description": "Founder backed by VC, building AI-driven tech without a technical background. In the chaos of a startup pivot- learning, evolving, and embracing change.",
      "entities": {
        "description": {},
        "url": {
          "urls": [
            {
              "display_url": "testfeed.ai",
              "expanded_url": "https://testfeed.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/L00gPtdr29"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/5YhGwLJ03m",
        "expanded_url": "https://twitter.com/Yesterday_work_/status/1993996168704217539/photo/1",
        "ext_alt_text": "Academic paper titled \"Evolution Strategies at the Hyperscale\" with authors, abstract text and a diagram showing matrix perturbations and weighted updates.",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1993996164870623240",
        "indices": [
          279,
          302
        ],
        "media_key": "3_1993996164870623240",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARusGPKCmtAICgACG6wY82ca0cMAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG6wY8oKa0AgKAAIbrBjzZxrRwwAA",
            "media_key": "3_1993996164870623240"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G6wY8oKa0Ag-0To.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 673,
              "w": 1202,
              "x": 0,
              "y": 563
            },
            {
              "h": 1202,
              "w": 1202,
              "x": 0,
              "y": 34
            },
            {
              "h": 1236,
              "w": 1084,
              "x": 118,
              "y": 0
            },
            {
              "h": 1236,
              "w": 618,
              "x": 401,
              "y": 0
            },
            {
              "h": 1236,
              "w": 1202,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1236,
          "width": 1202
        },
        "sizes": {
          "large": {
            "h": 1236,
            "w": 1202
          }
        },
        "type": "photo",
        "url": "https://t.co/5YhGwLJ03m"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {},
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}