🐦 Twitter Post Details


@DrJimFan

Let's talk about which papers *actually* show a hint of LLMs' internal world models. There are quite a few, but I'll highlight 2 in game AI.

1. Voyager (shameless self-plug). In Minecraft, Voyager is able to make decisions by world modeling. Example: "hunger bar is low
-> if I don't get food soon I'll die
-> I see a cat, a pig, and a villager nearby
-> which one should I hunt?
-> pig, because killing the other 2 wouldn't give me food even if I succeed
-> check inventory, no good weapon
-> [go craft stone sword]
-> ugh pig ran away
-> [start hunting sheep]"

This trace of thought involves counterfactual reasoning and active intervention given the agent's and the world's current state. Voyager anticipates what it needs by mentally simulating the future, and plans against that "imagined future" accordingly. It does extensive exploration and acquires new embodied skills along the way via the skill library mechanism. It makes mistakes but adjusts its course of action to avoid them in the future. This fits perfectly with @ylecun's characterization.

We did not mention world models in the paper, but now I think we should have. I'll update arXiv accordingly. https://t.co/1d3YocozsI

2. Othello-GPT: https://t.co/VcKbmKDPG2. This is a much simpler game than Minecraft, but it shows that an LLM can develop a world model of the game by training on histories of game moves. The model has no a priori knowledge of the game rules.

Now you can use it to answer questions like "what would the opponent do had I made a different move?" or "is this move legal given the current world state?". The authors also discuss an intervention technique suggesting that the world model can be used to control the network's behavior.
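The counterfactual step in the trace above can be sketched as a tiny planning loop: simulate each nearby candidate under a toy world model and keep only those whose imagined future actually yields food. This is an illustrative sketch, not Voyager's actual implementation; the function names and payoff values are hypothetical.

```python
# A toy "world model": maps each candidate target to its imagined outcome.
# Payoff values are made up for illustration.
IMAGINED_OUTCOME = {
    "cat": {"food": 0},       # killing the cat yields no food
    "villager": {"food": 0},  # killing the villager yields no food
    "pig": {"food": 3},       # a pig yields porkchops
    "sheep": {"food": 2},     # fallback once the pig escapes
}

def pick_target(nearby, world_model):
    """Counterfactual step: mentally simulate each candidate and discard
    those whose imagined future fails the goal (food > 0)."""
    viable = [t for t in nearby if world_model[t]["food"] > 0]
    # Among viable candidates, prefer the best simulated payoff.
    return max(viable, key=lambda t: world_model[t]["food"], default=None)

print(pick_target(["cat", "pig", "villager"], IMAGINED_OUTCOME))   # pig
# After the pig runs away, the same loop replans with what is left:
print(pick_target(["cat", "villager", "sheep"], IMAGINED_OUTCOME)) # sheep
```

The key point matches the trace: the cat and villager are ruled out before any action is taken, purely by simulating outcomes against the "imagined future".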

MineDojo

Voyager | An Open-Ended Embodied Agent with Large Language Models

Voyager is the first LLM-powered lifelong learning agent in Minecraft, capable of self-driven exploration and skill acquisition without human intervention.

• Utilizes GPT-4 for exploration and skill development.

• Achieves 3.3x more unique items in Minecraft.

• Employs an automatic curriculum for maximizing exploration.

arXiv

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

This article investigates whether language models rely on internal representations or just memorize statistics by applying a GPT variant to Othello.

• Explores internal representations in language models.

• Uses a GPT model for predicting moves in Othello.

• Introduces 'latent saliency maps' for explanation.
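The "is this move legal given the current world state?" query from the post reduces to having a function from the observed move history to a board state. A minimal sketch, assuming an explicit replay of the move history as a stand-in for the board representation Othello-GPT learns implicitly (the square names are hypothetical, and occupancy is only one of Othello's legality conditions, so this is a deliberate simplification):

```python
# Illustrative sketch, not the paper's probing setup: answering a legality
# query once a world state can be reconstructed from the move history.
# Othello-GPT induces such a board representation from move sequences alone;
# here the replay is explicit.

def board_state(moves):
    """World state as a function of the observed past: the set of
    occupied squares. (Real Othello also flips discs; occupancy is
    enough for this sketch.)"""
    return set(moves)

def is_plausibly_legal(move, moves):
    """Necessary (not sufficient) condition: the target square is empty."""
    return move not in board_state(moves)

history = ["d3", "c3", "c4"]
print(is_plausibly_legal("c3", history))  # False: square already taken
print(is_plausibly_legal("e3", history))  # True under this partial check
```

The counterfactual query ("what would the opponent do had I made a different move?") is the same idea run forward: swap one move in `history` and re-derive the state before predicting.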

📊 Media Metadata

{
  "media": [
    {
      "id": "",
      "type": "video",
      "url": null,
      "media_url": "https://pbs.twimg.com/ext_tw_video_thumb/1710320425950150656/pu/img/BH0J7gOCzK0njxDt.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    }
  ],
  "nlp": {
    "processed_at": "2025-08-06T12:40:35.529054",
    "sentiment": "positive",
    "topics": [
      "LLMs",
      "AI Applications",
      "Reinforcement Learning",
      "NLP"
    ],
    "ner": {
      "entities": [
        {
          "entity": "Voyager",
          "type": "paper"
        },
        {
          "entity": "Minecraft",
          "type": "game"
        },
        {
          "entity": "hunger bar",
          "type": "concept"
        },
        {
          "entity": "cat",
          "type": "entity"
        },
        {
          "entity": "pig",
          "type": "entity"
        }
      ]
    }
  },
  "score": 1.0,
  "scored_at": "2025-08-09T13:46:07.541926",
  "import_source": "manual_curation_2023",
  "score_components": {
    "author": 0.09,
    "engagement": 0.13162286460923622,
    "quality": 0.2,
    "source": 0.15,
    "nlp": 0.1,
    "recency": 0.010000000000000002
  },
  "source_tagged_at": "2025-08-09T13:42:51.823203",
  "enriched": true,
  "enriched_at": "2025-08-09T13:42:51.823204",
  "enriched_links": [
    {
      "url": "https://t.co/1d3YocozsI",
      "title": "Voyager | An Open-Ended Embodied Agent with Large Language Models",
      "description": "Voyager is the first LLM-powered lifelong learning agent in Minecraft, capable of self-driven exploration and skill acquisition without human intervention.",
      "content_type": "article",
      "author": "Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi 'Jim' Fan, Anima Anandkumar",
      "site_name": "MineDojo",
      "image_url": null,
      "key_points": [
        "Utilizes GPT-4 for exploration and skill development.",
        "Achieves 3.3x more unique items in Minecraft.",
        "Employs an automatic curriculum for maximizing exploration."
      ],
      "enriched_at": "2025-08-10T10:18:32.699125"
    },
    {
      "url": "https://t.co/VcKbmKDPG2",
      "title": "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task",
      "description": "This article investigates whether language models rely on internal representations or just memorize statistics by applying a GPT variant to Othello.",
      "content_type": "article",
      "author": "Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg",
      "site_name": "arXiv",
      "image_url": null,
      "key_points": [
        "Explores internal representations in language models.",
        "Uses a GPT model for predicting moves in Othello.",
        "Introduces 'latent saliency maps' for explanation."
      ],
      "enriched_at": "2025-08-10T10:18:36.890177"
    }
  ],
  "llm_enriched": true,
  "llm_enriched_at": "2025-08-10T10:18:36.890227",
  "original_structure": "had_media_only"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2012-12-12T22:11:27.000Z",
    "default_profile_image": false,
    "description": "@NVIDIA Senior AI Scientist. @Stanford PhD. Join me on the frontier of AI Agents, LLM & Robotics. MineDojo (NeurIPS Best Paper), Voyager. Ex: @OpenAI, @GoogleAI",
    "fast_followers_count": 0,
    "favourites_count": 6203,
    "followers_count": 142744,
    "friends_count": 2804,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 2880,
    "location": "Views my own. Get in touch →",
    "media_count": 638,
    "name": "Jim Fan",
    "normal_followers_count": 142744,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1007413134/1672408318",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1554922493101559808/SYSZhbcd_normal.jpg",
    "screen_name": "DrJimFan",
    "statuses_count": 2957,
    "translator_type": "none",
    "url": "https://t.co/H4rXo4Ei8X",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "1007413134"
  },
  "id": "1710325869213020336",
  "conversation_id": "1710325869213020336",
  "full_text": "Let's talk about which papers *actually* show a hint of LLM's internal world models. There're quite a few, but I'll highlight 2 in game AI.\n\n1. Voyager (shameless self-plug). In Minecraft, Voyager is able to make decisions by world modeling. Example: \"hunger bar is low \n-> if I don't get food soon I'll die \n-> I see a cat, a pig, and a villager nearby \n-> which one should I hunt? \n-> pig, because killing the other 2 wouldn't give me food even if I succeed \n-> check inventory, no good weapon \n-> [go craft stone sword] \n-> ugh pig ran away \n-> [start hunting sheep]\"\n\nThis trace of thought involves counterfactual reasoning and active intervention given the agent and the world's current state. Voyager anticipates what it needs by mentally simulating the future, and plan against that \"imagined future\" accordingly. It does extensive exploration and acquires new embodied skills along the way via the skill library mechanism. It makes mistakes but adjusts course of action to avoid them in the future. Now this fits perfectly with @ylecun's characterization.\n\nWe did not mention world models in the paper, but now I think we should have. I'll update Arxiv accordingly.\n\nhttps://t.co/1d3YocozsI\n \n2. Othello-GPT: https://t.co/VcKbmKDPG2. This is a much simpler game than Minecraft, but it shows that LLM can develop a world model of the game by training on histories of game moves. The model has no a priori knowledge of the game rules.\n\nNow you can use it to answer questions like \"what would the opponent do had I made a different move?\", or \"is this move legal given the current world state?\". The authors also discuss an intervention technique that suggests that the world model can be used to control the network’s behavior.",
  "reply_count": 14,
  "retweet_count": 65,
  "favorite_count": 298,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/ext_tw_video_thumb/1710320425950150656/pu/img/BH0J7gOCzK0njxDt.jpg",
      "type": "video",
      "video_url": "https://video.twimg.com/ext_tw_video/1710320425950150656/pu/vid/avc1/1280x720/QKd6X4aRiU8f8IsB.mp4?tag=12"
    }
  ],
  "url": "https://twitter.com/DrJimFan/status/1710325869213020336",
  "created_at": "2023-10-06T16:07:18.000Z",
  "#sort_index": "1710325869213020336",
  "view_count": 118352,
  "quote_count": 2,
  "is_quote_tweet": true,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "quoted_tweet": {
    "user": {
      "created_at": "2012-12-12T22:11:27.000Z",
      "default_profile_image": false,
      "description": "@NVIDIA Senior AI Scientist. @Stanford PhD. Join me on the frontier of AI Agents, LLM & Robotics. MineDojo (NeurIPS Best Paper), Voyager. Ex: @OpenAI, @GoogleAI",
      "fast_followers_count": 0,
      "favourites_count": 6203,
      "followers_count": 142744,
      "friends_count": 2804,
      "has_custom_timelines": true,
      "is_translator": false,
      "listed_count": 2880,
      "location": "Views my own. Get in touch →",
      "media_count": 638,
      "name": "Jim Fan",
      "normal_followers_count": 142744,
      "possibly_sensitive": false,
      "profile_banner_url": "https://pbs.twimg.com/profile_banners/1007413134/1672408318",
      "profile_image_url_https": "https://pbs.twimg.com/profile_images/1554922493101559808/SYSZhbcd_normal.jpg",
      "screen_name": "DrJimFan",
      "statuses_count": 2957,
      "translator_type": "none",
      "url": "https://t.co/H4rXo4Ei8X",
      "verified": false,
      "withheld_in_countries": [],
      "id_str": "1007413134"
    },
    "id": "1709947595525951787",
    "conversation_id": "1709947595525951787",
    "full_text": "A viral paper \"Language Model Represents Space and Time\" recently claims that LLMs learn \"world models\". As much as I like @tegmark's works, I disagree with their definition of world model.\n\nWorld model is a core concept in AI agent and decision making. It is our mental simulation of how the world works given interventions (or lack thereof). \n\nA world model captures causality and intuitive physics, telling the agent what is likely and what is impossible. It can and should be used for counterfactual reasoning, i.e. \"what ifs\": what would happen if I knock over a cup of water? Where would I have been if I had not taken that bus?\n\nYann LeCun @ylecun says it well in his position paper (https://t.co/MJxLffbK5Q). I quote:\n\n\"Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems. Importantly, they can also avoid making dangerous mistakes when facing an unknown situation.\"\n\nThe first use of the term World Model in deep policy learning is attributed to @hardmaru & @SchmidhuberAI:  https://t.co/tWDuQRNTRh. In their seminal paper, an agent masters shooting skills in the popular game Doom (demo below) by learning in imagination, using an internal world model as a \"physics simulator\".\n\nTo put in a simple Python math formula, world model learns a function F(s[0:t-1], a) -> s[t:], which takes as input the observed past and current action, and outputs plausible future states.\n\nNow the definition of World Model in Tegmark's paper seems to be about predicting GPS coordinates and time eras. I see this as just a classification task with no causal learning and simulation going on. You cannot make meaningful interventions against that model, nor can you optimize any decision making in a closed feedback loop.\n\nAs for the \"space & time neurons\", I think they are most similar to the \"sentiment neuron\" that OpenAI published in 2017: https://t.co/QFnP2pjUSQ. Predicting GPS is conceptually no different from predicting sentiment in my opinion. I don't think their experimental results are wrong - just that their conclusion is on shaky grounds.\n\nI welcome any debate! Paper link: https://t.co/4ly12nPS1N",
    "reply_count": 74,
    "retweet_count": 213,
    "favorite_count": 1219,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [
      {
        "id_str": "2530947115",
        "name": "Max Tegmark",
        "screen_name": "tegmark",
        "profile": "https://twitter.com/tegmark"
      }
    ],
    "urls": [],
    "media": [
      {
        "media_url": "https://pbs.twimg.com/ext_tw_video_thumb/1709942174044065792/pu/img/SIYXjIFHPTooRg18.jpg",
        "type": "video",
        "video_url": "https://video.twimg.com/ext_tw_video/1709942174044065792/pu/vid/avc1/320x240/2dZo_BJVzHtMmXW_.mp4?tag=12"
      }
    ],
    "url": "https://twitter.com/DrJimFan/status/1709947595525951787",
    "created_at": "2023-10-05T15:04:10.000Z",
    "#sort_index": "1710325869213020400",
    "view_count": 523433,
    "quote_count": 37,
    "is_quote_tweet": false,
    "is_retweet": false,
    "is_pinned": false,
    "is_truncated": true
  },
  "startUrl": "https://twitter.com/drjimfan/status/1710325869213020336"
}