🐦 Twitter Post Details

Viewing enriched Twitter post

@omarsar0

This is the first report I've seen where out-of-the-box o1 CoT reasoning is significantly outperformed on code generation. They propose AlphaCodium which acts as a strategy provider (i.e., flow engineering) on top of o1 to fuel and guide o1's chain-of-thought capabilities. AlphaCodium is built as a multi-stage flow improving on reasoning and reliability, while significantly elevating o1-preview's performance (55% --> 78% pass@5 accuracy on the Codeforces benchmark). Author's quote: "The promise of AlphaCodium is clear: with the right strategic flow-engineering, foundational models like o1 can be nudged toward System II thinking. We still have work to do to cross the gap from “System 1.5” to a true System 2-level AI, but when observing tools like AlphaCodium, we can better understand the gap and keep researching to get closer." o1 is a significantly better model than most LLMs out there today. However, it can benefit from strategic guidance as shown in this report. I am also working on a similar approach for knowledge-intensive tasks. From my own analysis, it does feel like o1 has better knowledge of complex tasks but still shows limitations to complex knowledge understanding and reasoning. More on this soon.

View on Twitter

🔧 Raw API Response

{
  "user": {
    "created_at": "2015-09-04T12:59:26.000Z",
    "default_profile_image": false,
    "description": "Building with AI Agents @dair_ai • Prev: Meta AI, Elastic, Galactica LLM, PhD • I also teach how to build with LLMs, RAG & AI Agents ⬇️",
    "fast_followers_count": 0,
    "favourites_count": 27933,
    "followers_count": 216711,
    "friends_count": 532,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3689,
    "location": "",
    "media_count": 2656,
    "name": "elvis",
    "normal_followers_count": 216711,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "screen_name": "omarsar0",
    "statuses_count": 12439,
    "translator_type": "regular",
    "url": "https://t.co/JBU5beHQNs",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "3448284313"
  },
  "id": "1845869689911820334",
  "conversation_id": "1845869689911820334",
  "full_text": "This is the first report I've seen where out-of-the-box o1 CoT reasoning is significantly outperformed on code generation.\n\nThey propose AlphaCodium which acts as a strategy provider (i.e., flow engineering) on top of o1 to fuel and guide o1's chain-of-thought capabilities.\n\nAlphaCodium is built as a multi-stage flow improving on reasoning and reliability, while significantly elevating o1-preview's performance (55% --> 78% pass@5 accuracy on the Codeforces benchmark).\n\nAuthor's quote: \"The promise of AlphaCodium is clear: with the right strategic flow-engineering, foundational models like o1 can be nudged toward System II thinking. We still have work to do to cross the gap from “System 1.5” to a true System 2-level AI, but when observing tools like AlphaCodium, we can better understand the gap and keep researching to get closer.\"\n\no1 is a significantly better model than most LLMs out there today. However, it can benefit from strategic guidance as shown in this report. I am also working on a similar approach for knowledge-intensive tasks. From my own analysis, it does feel like o1 has better knowledge of complex tasks but still shows limitations to complex knowledge understanding and reasoning. More on this soon.",
  "reply_count": 8,
  "retweet_count": 56,
  "favorite_count": 323,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GZ3Ysikb0AIXfPf.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/omarsar0/status/1845869689911820334",
  "created_at": "2024-10-14T16:50:03.000Z",
  "#sort_index": "1845869689911820334",
  "view_count": 35081,
  "quote_count": 3,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/omarsar0/status/1845869689911820334"
}