🐦 Twitter Post Details

Viewing enriched Twitter post

@BrianRoemmele

LeWorldModel: Yann LeCun's Radical Simplification of World Models Just Made Physics-Aware AI Practical

In the race for artificial general intelligence, two paths have emerged. One is the familiar scale-everything route: bigger LLMs trained on ever-larger text corpora. The other, championed for years by Yann LeCun, is building world models: compact systems that learn the underlying physics of reality directly from raw sensory data (pixels), so AI can plan, predict, and act in the physical world the way a robot or self-driving car actually would.

Until now, the second path has been frustratingly difficult. Joint-Embedding Predictive Architectures (JEPAs) - LeCun's elegant framework for learning predictive representations without reconstructing every pixel - kept collapsing during training. Researchers had to resort to a laundry list of hacks: multi-term loss functions (up to six hyperparameters), frozen pre-trained encoders, stop-gradients, exponential moving averages, and other duct-tape tricks just to keep the model from mapping every input to the same useless output.

LeCun's team (Mila, NYU, Samsung SAIL, and Brown University) dropped a bombshell:

LeWorldModel (LeWM) - the first JEPA that trains stably end-to-end from raw pixels using only two loss terms. No more house-of-cards engineering. Just a clean, simple recipe that works on a single GPU in a few hours with only 15 million parameters.

The Core Breakthrough: SIGReg Saves the Day

LeWorldModel's secret weapon is a new regularizer called SIGReg (spherical isotropic Gaussian regularizer). It enforces a simple Gaussian distribution on the latent embeddings. This single term prevents representation collapse without any of the previous heuristics.

The training objective now has just two parts:

1. Next-embedding prediction loss - the model predicts what the next latent state should be.

2. SIGReg - keeps the latent space well-behaved and diverse.

That's it. Hyperparameters drop from six to one.
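The two-term objective described above can be sketched in a few lines. This is an illustration only, not the paper's implementation: the regularizer below matches just the first two moments of an isotropic Gaussian (zero mean, identity covariance), whereas the actual SIGReg enforces the full distribution; all names and the weighting are assumptions.

```python
import numpy as np

def prediction_loss(pred_next, true_next):
    # Term 1: next-embedding prediction -- an error measured entirely in
    # latent space, with no pixel reconstruction anywhere.
    return np.mean((pred_next - true_next) ** 2)

def gaussian_regularizer(z):
    # Term 2: a simplified stand-in for SIGReg. Push a batch of embeddings
    # z (shape [batch, dim]) toward zero mean and identity covariance --
    # the first two moments an isotropic Gaussian implies. A collapsed
    # batch (all embeddings identical) has zero covariance and is
    # penalized, which is exactly the failure mode this term prevents.
    mean = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    mean_term = np.sum(mean ** 2)
    cov_term = np.sum((cov - np.eye(z.shape[1])) ** 2)
    return mean_term + cov_term

def total_loss(pred_next, true_next, z, lam=1.0):
    # lam is the single remaining hyperparameter the post describes.
    return prediction_loss(pred_next, true_next) + lam * gaussian_regularizer(z)
```

A well-spread Gaussian batch scores near zero on the regularizer, while a collapsed batch is heavily penalized, which is how the single term replaces the old stack of anti-collapse heuristics.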
Training becomes stable, reproducible, and dramatically cheaper. The model learns directly from raw video frames (no pre-trained vision encoders needed) and produces a compact latent world model that can be used for fast planning.

Impressive Results on Real Benchmarks

Despite its tiny size, LeWorldModel punches way above its weight:

- Trains on a single GPU in a few hours.
- Plans actions up to 48 times faster than foundation-model-based world models.
- Uses roughly 200 times fewer tokens than alternatives.
- Matches or beats far larger models on diverse 2D and 3D control tasks (e.g., manipulation, navigation).
- Its latent space encodes meaningful physical quantities (position, velocity, etc.), as shown by direct probing.
- It reliably detects physically implausible "surprise" events, showing genuine causal understanding.

Crucially, adding a decoder and reconstruction loss hurts performance on downstream control tasks. The pure JEPA objective already captures everything needed for planning - extra visual details just get in the way.

Project website: https://t.co/KhGR9LiIQZ
Official code: https://t.co/s1lI9kevJS

Why This Matters for the Future of AI

LeCun has been saying since 2022 that world models (not next-token predictors) are the key to real intelligence. Critics always pointed to the training instability. LeWorldModel removes that objection with elegant simplicity.

This is a philosophical reset: AI can learn physics the way babies do - by watching the world unfold - without needing supercomputers or endless text.

The implications for robotics, autonomous vehicles, and embodied agents are enormous. Suddenly, building a physically grounded planner is something a researcher (or even a hobbyist) can do on consumer hardware.

1 of 2
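The post's claim that a compact latent model makes planning fast can be illustrated with generic random-shooting in latent space: sample candidate action sequences, roll each out through the learned dynamics, and keep the cheapest. The interfaces below (`encode`, `dynamics`, `cost`) are hypothetical stand-ins, and this is not LeWorldModel's actual planner; it only shows why rollouts in a small latent space are cheap.

```python
import numpy as np

def plan_random_shooting(encode, dynamics, cost, obs, horizon=5,
                         n_candidates=64, action_dim=2, rng=None):
    """Return the first action of the lowest-cost random action sequence.

    encode:   maps a raw observation to a latent state z (assumed interface)
    dynamics: maps (z, action) -> predicted next z (the learned world model)
    cost:     scores a latent state; lower is better
    """
    if rng is None:
        rng = np.random.default_rng()
    z0 = encode(obs)
    # Sample candidate action sequences, roll each out entirely in latent
    # space, and accumulate cost along the predicted trajectory.
    actions = rng.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    totals = np.zeros(n_candidates)
    for i in range(n_candidates):
        z = z0
        for t in range(horizon):
            z = dynamics(z, actions[i, t])
            totals[i] += cost(z)
    return actions[np.argmin(totals), 0]
```

For example, with a toy point-mass (`encode` as identity, `dynamics` as `z + 0.1 * action`, and `cost` as squared distance to a goal) the planner returns a bounded first action toward the cheapest sampled trajectory. Because every rollout is a handful of small matrix operations rather than pixel generation, this style of planning scales with latent size, not image size.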

Media 1
Media 2

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2036826341581185171/media_0.jpg",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2036826341581185171/media_0.jpg",
      "type": "photo",
      "filename": "media_0.jpg"
    },
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2036826341581185171/media_1.jpg",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2036826341581185171/media_1.jpg",
      "type": "photo",
      "filename": "media_1.jpg"
    }
  ],
  "processed_at": "2026-03-25T22:52:22.036665",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2036826341581185171",
  "url": "https://x.com/BrianRoemmele/status/2036826341581185171",
  "twitterUrl": "https://twitter.com/BrianRoemmele/status/2036826341581185171",
  "text": "LeWorldModel: Yann LeCuns Radical Simplification of World Models Just Made Physics-Aware AI Practical\n\nIn the race for artificial general intelligence, two paths have emerged. One is the familiar scale everything route: bigger LLMs trained on ever-larger text corpora. The other, championed for years by Yann LeCun, is building world models: compact systems that learn the underlying physics of reality directly from raw sensory data (pixels) so AI can plan, predict, and act in the physical world like a robot or self-driving car actually would.\n\nUntil now, the second path has been frustratingly difficult. Joint-Embedding Predictive Architectures (JEPAs) - LeCuns elegant framework for learning predictive representations without reconstructing every pixel - kept collapsing during training. Researchers had to resort to a laundry list of hacks: multi-term loss functions (up to six hyperparameters), frozen pre-trained encoders, stop-gradients, exponential moving averages, and other duct-tape tricks just to keep the model from mapping every input to the same useless output.\n\nLeCuns team (Mila, NYU, Samsung SAIL, and Brown University) dropped a bombshell: \n\nLeWorldModel (LeWM) - the first JEPA that trains stably end-to-end from raw pixels using only two loss terms. No more house-of-cards engineering. Just a clean, simple recipe that works on a single GPU in a few hours with only 15 million parameters.\n\nThe Core Breakthrough: SIGReg Saves the Day\n\nLeWorldModels secret weapon is a new regularizer called SIGReg (for spherical isotropic Gaussian regularizer). It enforces a simple Gaussian distribution on the latent embeddings. \n\nThis single term prevents representation collapse without any of the previous heuristics.\n\nThe training objective now has just two parts:\n\n1. Next-embedding prediction loss - the model predicts what the next latent state should be.\n\n2. SIGReg - keeps the latent space well-behaved and diverse.\n\nThats it. 
Hyperparameters drop from six to one. Training becomes stable, reproducible, and dramatically cheaper.\n\nThe model learns directly from raw video frames (no pre-trained vision encoders needed) and produces a compact latent world model that can be used for fast planning.\n\nImpressive Results on Real Benchmarks\n\nDespite its tiny size, LeWorldModel punches way above its weight:\n\n- Trains on a single GPU in a few hours.\n- Plans actions up to 48 times faster than foundation-model-based world models.\n- Uses roughly 200 times fewer tokens than alternatives.\n- Matches or beats far larger models on diverse 2D and 3D control tasks (e.g., manipulation, navigation).\n- Its latent space encodes meaningful physical quantities (position, velocity, etc.) - proven by direct probing.\n- It reliably detects physically implausible surprise events, showing genuine causal understanding.\n\nCrucially, adding a decoder and reconstruction loss hurts performance on downstream control tasks. The pure JEPA objective already captures everything needed for planning - extra visual details just get in the way.\n\nProject website: https://t.co/KhGR9LiIQZ\nOfficial code: https://t.co/s1lI9kevJS\n\nWhy This Matters for the Future of AI\n\nLeCun has been saying since 2022 that world models (not next-token predictors) are the key to real intelligence. Critics always pointed to the training instability. LeWorldModel removes that objection with elegant simplicity.\n\nThis is a philosophical reset: AI can learn physics the way babies do - by watching the world unfold - without needing supercomputers or endless text. \n\nThe implications for robotics, autonomous vehicles, and embodied agents are enormous. Suddenly, building a physically grounded planner is something a researcher (or even a hobbyist) can do on consumer hardware.\n\n1 of 2",
  "source": "Twitter for iPhone",
  "retweetCount": 29,
  "replyCount": 9,
  "likeCount": 146,
  "quoteCount": 2,
  "viewCount": 9443,
  "createdAt": "Wed Mar 25 15:23:36 +0000 2026",
  "lang": "en",
  "bookmarkCount": 127,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2036826341581185171",
  "displayTextRange": [
    0,
    279
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "BrianRoemmele",
    "url": "https://x.com/BrianRoemmele",
    "twitterUrl": "https://twitter.com/BrianRoemmele",
    "id": "101584084",
    "name": "Brian Roemmele",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1492616506/Brian-Med-Green-Fin_normal.png",
    "coverPicture": "https://pbs.twimg.com/profile_banners/101584084/1414798559",
    "description": "",
    "location": "transcendence",
    "followers": 464196,
    "following": 43024,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Sun Jan 03 22:04:29 +0000 2010",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 644110,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 46417,
    "statusesCount": 169590,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1564825039731535872"
    ],
    "profile_bio": {
      "description": "we can only see what we think is possible...",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [],
          "user_mentions": []
        },
        "url": {
          "urls": [
            {
              "display_url": "readmultiplex.com",
              "expanded_url": "http://readmultiplex.com/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/riCFzsOQbj"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/J0q4qkLI6B",
        "expanded_url": "https://twitter.com/BrianRoemmele/status/2036826341581185171/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {
            "faces": []
          },
          "orig": {
            "faces": []
          }
        },
        "id_str": "2036826337613430784",
        "indices": [
          280,
          303
        ],
        "media_key": "3_2036826337613430784",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARxEQsMym8AACgACHERCxB8a4JMAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHERCwzKbwAAKAAIcRELEHxrgkwAA",
            "media_key": "3_2036826337613430784"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HERCwzKbwAAymug.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 722,
              "w": 1290,
              "x": 0,
              "y": 0
            },
            {
              "h": 1290,
              "w": 1290,
              "x": 0,
              "y": 0
            },
            {
              "h": 1471,
              "w": 1290,
              "x": 0,
              "y": 0
            },
            {
              "h": 1638,
              "w": 819,
              "x": 369,
              "y": 0
            },
            {
              "h": 1638,
              "w": 1290,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1638,
          "width": 1290
        },
        "sizes": {
          "large": {
            "h": 1638,
            "w": 1290
          }
        },
        "type": "photo",
        "url": "https://t.co/J0q4qkLI6B"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "timestamps": [],
    "urls": [
      {
        "display_url": "le-wm.github.io",
        "expanded_url": "https://le-wm.github.io/",
        "indices": [
          3055,
          3078
        ],
        "url": "https://t.co/KhGR9LiIQZ"
      },
      {
        "display_url": "github.com/lucas-maes/le-…",
        "expanded_url": "https://github.com/lucas-maes/le-wm",
        "indices": [
          3094,
          3117
        ],
        "url": "https://t.co/s1lI9kevJS"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}