🐦 Twitter Post Details


@rao2z

On the Stone Soup of LLM Reasoning #SundayHarangue

Stone soup is the European folk story where some clever travelers convince the gullible locals that they are making delicious soup with a stone--and they do need a few things to "improve its flavor"--such as carrots, potatoes, onions, butter, etc.

There is nothing ipso facto wrong with Stone Soup--it is, after all, soup! It may even be delicious! The question instead is how much credit the stone should get for the soup's deliciousness.

The version of Stone Soup in AGI/AI circles is the claim that LLMs can reason and plan--with just a few things to "improve the flavor".

These needed things can range from external tools/verifiers (as in LLM-Modulo), to tacking search on top of LLMs (tree of thoughts), to tacking on a MuZero/AlphaGo-like RL component that influences both the pretraining and inference stages (the 🍓 o1 model; see https://t.co/5nivQhAXFB). The question is whether the "augmented" systems are LLMs or some other qualitatively different beasts better called LRMs (cf. https://t.co/CvHuWhlKNj).

Although it has become fashionable to equate LLMs with AI and ask "When will AI reason?"--the fact of the matter is that we have always had AI systems--planning, RL, etc.--capable of reasoning. It can be argued that much of AI before LLMs was in fact deep and narrow System 2 approximators (and thus https://t.co/m0Ecnvnx6E and https://t.co/NcZaqR2QzH).

The appeal of LLMs is that they are the first effective System 1 approximators that AI managed to develop (cf. https://t.co/UsnxSYaeKU).

There have been several attempts to get System 2 reasoning behaviors from LLMs while keeping their essentially autoregressive System 1 nature intact. These early attempts--such as "fine tuning" and "chain of thought"--have by now been shown to be deeply flawed (cf. https://t.co/DEjF9gLR8q & https://t.co/NE0MizcWN7). In other words, no soup from them! #Seinfeld

In contrast, the approaches that tack on search/RL at the inference stage seem more promising (cf. https://t.co/RqRf4fWjrU). But these "compound" LRM systems are no longer autoregressive LLMs and don't have the "starts completing the prompt as soon as you hit return" characteristic that is such a big part of LLM popularity!

In particular, if the LRM is taking indefinite and costly inference-time compute, the right comparison will be to other System 2 AI approaches that incur such inference-time costs. Such considerations bring some of the traditional CS analyses--all but forgotten when online computation was given up for dead (cf. https://t.co/I9Ya3j2EER)--back into play (see https://t.co/RqRf4fWjrU & https://t.co/JeZ8L0ryia).

Interestingly, you can also err by assuming that LRMs will inherit all the limitations of LLMs (a mistake made by analyses that clump, for example, o1 with autoregressive LLMs). They don't! Stone Soup is more than stone--even if it may not be the best way to make soup!

Specifically, many of the common critiques of LLM reasoning capabilities don't directly apply to LRMs. Not recognizing this and adding LRMs like o1 as yet another entry in a table brimming with autoregressive LLMs confuses the message. (For example, if you look at the appendix of that latest Apple study on LLM reasoning, you will see that o1 does quite fine--accuracy-wise--on their instances with irrelevant information.) Another argument for separating their analysis--as we do (cf. https://t.co/3BUmLS7kTr).

Media 1

📊 Media Metadata

{
  "data": [
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GZzp7G9XcAArGHC.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    }
  ],
  "score": 0.86,
  "scored_at": "2025-08-09T13:46:07.550313",
  "import_source": "network_archive_import",
  "links_checked": true,
  "checked_at": "2025-08-10T10:32:44.076520",
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1845607153580838979/media_0.jpg?",
      "filename": "media_0.jpg"
    }
  ],
  "reprocessed_at": "2025-08-12T15:25:26.168052",
  "reprocessed_reason": "missing_media_array"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2014-10-30T02:20:55.000Z",
    "default_profile_image": false,
    "description": "AI researcher & teacher @SCAI_ASU.  Former President of @RealAAAI; Chair of @AAAS Sec T. Here to tweach #AI. YouTube Ch: https://t.co/4beUPOmf6y Bsky: rao2z",
    "fast_followers_count": 0,
    "favourites_count": 3016,
    "followers_count": 22327,
    "friends_count": 55,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 412,
    "location": "Tempe, AZ",
    "media_count": 3085,
    "name": "Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)",
    "normal_followers_count": 22327,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/2850858010/1570418159",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1240088892751007745/zFdWaIFe_normal.jpg",
    "screen_name": "rao2z",
    "statuses_count": 9577,
    "translator_type": "none",
    "url": "https://t.co/xrAOxXwdJd",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "2850858010"
  },
  "id": "1845607153580838979",
  "conversation_id": "1845607153580838979",
  "full_text": "On the Stone Soup of LLM Reasoning  #SundayHarangue \n\nStone soup is the European folk story where some clever travelers convince the gullible locals that they are making delicious soup with a stone--and they do need a few things to \"improve its flavor\"--such as carrots, potatoes, onions, butter etc.. \n\nThere is nothing ipso facto wrong with Stone Soup--it is, after all, soup! It may even be delicious! The question instead is how much credit should the stone get for the soup's delciousness.\n\nThe version of Stone Soup in AGI/AI circles are claims that LLMs can reason and plan--with just a few things to \"improve the flavor\". \n\nThese needed things needed can range from external tools/verifiers etc (as in LLM-Modulo), to tacking on search on top of LLMs (the tree of thoughts), to tacking on a Mu_zero/Alpha_go like RL component that influences pretraining as well as inference stage (the 🍓 o1 model; see https://t.co/5nivQhAXFB). The question is whether the \"augmented\" systems are LLMs or some other qualitatively different beasts better called LRMs (cf. https://t.co/CvHuWhlKNj). \n\nAlthough it has become fashionable to equate LLMs to AI, and ask  \"When will AI Reason?\"--the fact of the matter is that we have always had AI systems--planning, RL etc.--capable of reasoning. It can be argued that much of AI before LLMs was in fact was deep and narrow System 2 approximators (and thus https://t.co/m0Ecnvnx6E and https://t.co/NcZaqR2QzH). \n\nThe appeal of LLMs is that they are the first effective System 1 approximators that AI managed to develop (c.f. https://t.co/UsnxSYaeKU). \n\nThere have been several attempts to get System 2 reasoning behaviors from LLMs while keeping their essential autoregressive System 1 nature intact. These early attempts--such as \"fine tuning\", \"Chain of thought\" etc.--have by now been shown to be deeply flawed (c.f. https://t.co/DEjF9gLR8q & https://t.co/NE0MizcWN7). In other words, no soup from them!  #Seinfeld \n\nIn contrast, the approaches that tack on search/RL etc. at the inference stage seem to be more promising (c.f  https://t.co/RqRf4fWjrU). But these \"compound\" LRM systems are no longer autoregressive LLMs and don't have any \"start completing the prompt as soon you hit return\" characteristic that is such a big part of LLM popularity!\n\nIn particular, if the LRM is taking indefinite and costly inference time compute, the right comparison will be to other System 2 AI approaches that incur such inference time costs.  Such considerations bring some of the traditional CS analyses--all but forgotten when online computation was given up for dead (c.f. https://t.co/I9Ya3j2EER)--back into play (see https://t.co/RqRf4fWjrU & https://t.co/JeZ8L0ryia). \n\nInterestingly, you can also err assuming that LRMs will  inherit all the limitations of LLMs (something that the analyses that clump, for example, o1 with autoregressive LLMs mistakenly make).. They don't!  Stone Soup is more than Stone--even if it may not be the best way to make soup!  \n\nSpecifically, many of the common critiques of LLM reasoning capabilities don't directly apply to LRMs. Not recognizing this and adding LRMs like o1 as yet another entry in a table brimming with autoregressive LLMs confuses the message.  (For example, if you look at the appendix of that latest Apple study on LLM reasoning, you will see that o1 does quite fine--accuracy-wise--on their instances with irrelevant information.) Another argument for separating their analysis--as we do (c.f. https://t.co/3BUmLS7kTr).",
  "reply_count": 11,
  "retweet_count": 52,
  "favorite_count": 253,
  "hashtags": [
    "SundayHarangue"
  ],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GZzp7G9XcAArGHC.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/rao2z/status/1845607153580838979",
  "created_at": "2024-10-13T23:26:49.000Z",
  "#sort_index": "1845607153580838979",
  "view_count": 61843,
  "quote_count": 13,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/rao2z/status/1845607153580838979"
}