🐦 Twitter Post Details


@rohanpaul_ai

New @AIatMeta paper builds a vision language world model that turns videos into text plans and reasons over them to pick better actions.

27% higher Elo for system-2 planning over system-1.

The gap it tackles: agents must predict how actions change the world, rather than only label frames.

VLWM, the Vision Language World Model, represents the hidden state in plain language, predicting a goal and interleaved actions with their state changes.

Training targets come from a Tree of Captions that compresses each video; an LLM then refines them into goals and state updates.

The model jointly learns a policy to propose the next action and a dynamics model to predict the next state.

In fast mode it completes the plan text left to right, which is quick but can lock in early mistakes.

In reflective mode it searches candidate plans, rolls out futures, and picks the lowest-cost path.

The critic that supplies this cost is trained without labels, by ranking valid progress below distractors or shuffled steps.

Across planning benchmarks and human head-to-head comparisons, reflective search produces cleaner, more reliable plans.

----

Paper: arxiv.org/abs/2509.02722

Paper Title: "Planning with Reasoning using Vision Language World Model"
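The reflective mode described above can be sketched as a small search loop: a policy proposes candidate actions, a dynamics model rolls each forward into a predicted next state, and a critic assigns a cost used to keep the best partial plans. This is a toy illustration only, not the paper's code; every function here (`propose_actions`, `predict_next_state`, `critic_cost`, `reflective_search`) is a hypothetical stand-in, and the critic is a placeholder heuristic rather than the paper's learned, label-free critic.

```python
import random

random.seed(0)

# Toy action vocabulary standing in for text actions proposed by the policy.
CANDIDATE_ACTIONS = ["chop vegetables", "heat pan", "add oil", "plate dish"]

def propose_actions(state, k=2):
    """Policy stand-in: propose k candidate next actions for the current state text."""
    return random.sample(CANDIDATE_ACTIONS, k)

def predict_next_state(state, action):
    """Dynamics-model stand-in: predict the next world state as updated text."""
    return f"{state} -> after '{action}'"

def critic_cost(goal, plan):
    """Critic stand-in: lower cost should mean more plausible progress toward
    the goal. Here, a toy heuristic that rewards more completed steps."""
    return 1.0 / (1 + len(plan))

def reflective_search(goal, init_state, depth=3, beam=2):
    """System-2 mode: roll out candidate plans and keep the lowest-cost path."""
    frontier = [(init_state, [])]  # (predicted state, plan so far)
    for _ in range(depth):
        expanded = []
        for state, plan in frontier:
            for action in propose_actions(state):
                nxt = predict_next_state(state, action)
                expanded.append((nxt, plan + [(action, nxt)]))
        # Keep only the `beam` lowest-cost partial plans.
        expanded.sort(key=lambda sp: critic_cost(goal, sp[1]))
        frontier = expanded[:beam]
    _, best_plan = min(frontier, key=lambda sp: critic_cost(goal, sp[1]))
    return best_plan

plan = reflective_search("cook dinner", "kitchen, raw ingredients")
for action, _state in plan:
    print(action)
```

Fast mode, by contrast, would be a single left-to-right pass: call `propose_actions` once per step with `k=1` and commit immediately, which is cheaper but cannot back out of an early bad step.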

Media 1

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1967197438701408764/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1967197438701408764/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-09-18T13:56:56.792637",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1967197438701408764",
  "url": "https://x.com/rohanpaul_ai/status/1967197438701408764",
  "twitterUrl": "https://twitter.com/rohanpaul_ai/status/1967197438701408764",
  "text": "New @AIatMeta  builds a vision language world model that turns videos into text plans and reasons to pick better actions. \n\n27% higher Elo for system-2 planning over system-1.\n\nThe gap it tackles, agents must predict how actions change the world rather than only label frames.\n\nVLWM, the Vision Language World Model, represents the hidden state in plain language, predicting a goal and interleaved actions with their state changes.\n\nTraining targets come from a Tree of Captions that compresses each video, then an LLM refines them into goals and state updates.\n\nThe model jointly learns a policy to propose the next action and a dynamics model to predict the next state.\n\nIn fast mode it completes the plan text left to right, which is quick but can lock in early mistakes.\n\nIn reflective mode it searches candidate plans, rolls out futures, and picks the lowest cost path.\n\nThe critic that supplies this cost is trained without labels by ranking valid progress below distractors or shuffled steps.\n\nAcross planning benchmarks and human head to head comparisons, reflective search produces cleaner, more reliable plans.\n\n----\n\nPaper – arxiv. org/abs/2509.02722\n\nPaper Title: \"Planning with Reasoning using Vision Language World Model\"",
  "source": "Twitter for iPhone",
  "retweetCount": 90,
  "replyCount": 15,
  "likeCount": 476,
  "quoteCount": 3,
  "viewCount": 70524,
  "createdAt": "Sun Sep 14 12:03:12 +0000 2025",
  "lang": "en",
  "bookmarkCount": 336,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1967197438701408764",
  "displayTextRange": [
    0,
    277
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "rohanpaul_ai",
    "url": "https://x.com/rohanpaul_ai",
    "twitterUrl": "https://twitter.com/rohanpaul_ai",
    "id": "2588345408",
    "name": "Rohan Paul",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1816185267037859840/Fd18CH0v_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/2588345408/1729559315",
    "description": "",
    "location": "Ex Inv Banking (Deutsche)",
    "followers": 94974,
    "following": 8316,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Wed Jun 25 22:38:54 +0000 2014",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 47600,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 21730,
    "statusesCount": 54134,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1965551636082032917"
    ],
    "profile_bio": {
      "description": "Compiling in real-time, the race towards AGI.\n\nThe Largest Show on X for AI.\n\n🗞️ Don't miss my daily AI analysis newsletter 👉 https://t.co/6LBxO8215l",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "rohan-paul.com",
              "expanded_url": "https://www.rohan-paul.com",
              "indices": [
                126,
                149
              ],
              "url": "https://t.co/6LBxO8215l"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "rohan-paul.com",
              "expanded_url": "http://www.rohan-paul.com",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/2NKnK0xg7T"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/1mXuxT9VZQ",
        "expanded_url": "https://twitter.com/rohanpaul_ai/status/1967197438701408764/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1967197179388592128",
        "indices": [
          278,
          301
        ],
        "media_key": "3_1967197179388592128",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARtM42l4m4AACgACG0zjpdjbEfwAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG0zjaXibgAAKAAIbTOOl2NsR/AAA",
            "media_key": "3_1967197179388592128"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G0zjaXibgAAsoPV.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 533,
              "w": 952,
              "x": 0,
              "y": 0
            },
            {
              "h": 638,
              "w": 638,
              "x": 0,
              "y": 0
            },
            {
              "h": 638,
              "w": 560,
              "x": 0,
              "y": 0
            },
            {
              "h": 638,
              "w": 319,
              "x": 55,
              "y": 0
            },
            {
              "h": 638,
              "w": 952,
              "x": 0,
              "y": 0
            }
          ],
          "height": 638,
          "width": 952
        },
        "sizes": {
          "large": {
            "h": 638,
            "w": 952
          }
        },
        "type": "photo",
        "url": "https://t.co/1mXuxT9VZQ"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "user_mentions": [
      {
        "id_str": "1034844617261248512",
        "indices": [
          4,
          13
        ],
        "name": "AI at Meta",
        "screen_name": "AIatMeta"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}