🐦 Twitter Post Details

@omarsar0

Thinking LLMs

How difficult is it to train LLMs to do explicit "thinking" before responding to questions or tasks?

This work proposes a training method to equip LLMs with thinking abilities for general instruction-following without human-annotated data.

It uses an iterative search and optimization procedure to explore thought generation which enables the model to learn without direct supervision.

Thought candidates for each user instruction are scored with a judge model. Note that only the responses are evaluated by the Judge which determines the best and worst ones.

Then the corresponding full outputs are used as chosen and rejected pairs for DPO (referred to as Thought Preference Optimization in this paper). This entails the full training process that involves multiple iterations.

Overall, this is a simple yet very effective approach to incentivizing the model to generate its own thoughts without explicitly teaching it how to think. The authors also find that these Thinking LLMs are effective even in problems that often don't rely on reasoning or CoT methods.
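
The pair-building step described in the post can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `Candidate`, `build_preference_pair`, and the `judge` callable are hypothetical names, candidate sampling and the DPO update itself are left out, and skipping tied scores is an assumption made here. The `prompt` / `chosen` / `rejected` layout is just a common convention for preference data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    """One sampled output, split into its thought and its response (hypothetical type)."""
    thought: str
    response: str

    @property
    def full_output(self) -> str:
        # The full output (thought + response) is what ends up in the preference pair.
        return f"{self.thought}\n{self.response}"


# Hypothetical judge signature: (instruction, response) -> score, higher is better.
Judge = Callable[[str, str], float]


def build_preference_pair(
    instruction: str,
    candidates: List[Candidate],
    judge: Judge,
) -> Optional[Dict[str, str]]:
    """Score responses only, then pair the best and worst full outputs."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")

    # The judge sees the response part only, never the thought.
    scores = [judge(instruction, c.response) for c in candidates]

    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)

    # Assumption: a tie carries no preference signal, so no pair is produced.
    if scores[best] == scores[worst]:
        return None

    return {
        "prompt": instruction,
        "chosen": candidates[best].full_output,
        "rejected": candidates[worst].full_output,
    }
```

Because the judge never sees the thoughts, a thought is only rewarded indirectly, through the quality of the response it leads to. In an iterative setup, pairs built this way would be collected per instruction, used for one round of preference training, and then regenerated from the updated model.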

📊 Media Metadata

{
  "media": [
    {
      "id": "",
      "type": "photo",
      "url": "https://pbs.twimg.com/media/GZ8eZQ8b0AIjtkp.jpg",
      "media_url": "https://pbs.twimg.com/media/GZ8eZQ8b0AIjtkp.jpg",
      "filename": "media_0.jpg"
    }
  ],
  "nlp": {
    "sentiment": "neutral",
    "processed_at": "2025-08-06T12:58:27.463565"
  },
  "score": 1.0,
  "score_components": {
    "author": 0.27,
    "engagement": 0.1441830717576809,
    "quality": 0.16000000000000003,
    "source": 0.09,
    "nlp": 0.1,
    "recency": 0.020000000000000004
  },
  "scored_at": "2025-08-09T13:46:07.548065",
  "import_source": "network_archive_import",
  "source_tagged_at": "2025-08-09T13:43:17.512357",
  "enriched": true,
  "enriched_at": "2025-08-09T13:43:17.512361",
  "original_structure": "had_media_only",
  "enhanced_from_raw_response": true,
  "enhanced_at": "2025-08-13T17:10:00Z"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2015-09-04T12:59:26.000Z",
    "default_profile_image": false,
    "description": "Building with AI Agents @dair_ai • Prev: Meta AI, Elastic, Galactica LLM, PhD • I also teach how to build with LLMs, RAG & AI Agents ⬇️",
    "fast_followers_count": 0,
    "favourites_count": 27933,
    "followers_count": 216711,
    "friends_count": 532,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3689,
    "location": "",
    "media_count": 2656,
    "name": "elvis",
    "normal_followers_count": 216711,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "screen_name": "omarsar0",
    "statuses_count": 12439,
    "translator_type": "regular",
    "url": "https://t.co/JBU5beHQNs",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "3448284313"
  },
  "id": "1846227797972603047",
  "conversation_id": "1846227797972603047",
  "full_text": "Thinking LLMs\n\nHow difficult is it to train LLMs to do explicit \"thinking\" before responding to questions or tasks?\n\nThis work proposes a training method to equip LLMs with thinking abilities for general instruction-following without human-annotated data.\n\nIt uses an iterative search and optimization procedure to explore thought generation which enables the model to learn without direct supervision.\n\nThought candidates for each user instruction are scored with a judge model. Note that only the responses are evaluated by the Judge which determines the best and worst ones.\n\nThen the corresponding full outputs are used as chosen and rejected pairs for DPO (referred to as Thought Preference Optimization in this paper). This entails the full training process that involves multiple iterations.\n\nOverall, this is a simple yet very effective approach to incentivizing the model to generate its own thoughts without explicitly teaching it how to think. The authors also find that these Thinking LLMs are effective even in problems that often don't rely on reasoning or CoT methods.",
  "reply_count": 10,
  "retweet_count": 126,
  "favorite_count": 512,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GZ8eZQ8b0AIjtkp.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/omarsar0/status/1846227797972603047",
  "created_at": "2024-10-15T16:33:02.000Z",
  "#sort_index": "1846227797972603047",
  "view_count": 72556,
  "quote_count": 5,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/omarsar0/status/1846227797972603047"
}