🐦 Twitter Post Details

@omarsar0

🔥 The competition for the best reasoning LLM intensifies! A few days ago, we had the Forge Reasoning API; now we have DeepSeek-R1-Lite-Preview, which produces o1-preview-level performance on math benchmarks.

Here are my observations after some initial tests on DeepSeek's new reasoning model.

Math capabilities: It looks effective for math reasoning problems. The benchmark results do reflect the potential of this model on math reasoning (it even outperforms o1-preview on their benchmarks). Something to watch very closely.

Coding tasks: It wasn't able to solve a simple code problem (generating a bash script for transposing a matrix), which the o1 models solve easily.

Complex knowledge understanding: I also tried the model on a much harder crossword puzzle, but it failed miserably. To be fair, even the o1 models fail on this particular test, which requires knowledge of modern references.

More thoughts and tests here: https://t.co/0rCPwkK2hz

I believe the model is good at code and math as DeepSeek has been explicitly optimizing their models for this. But there is more work to do on the "reasoning" steps.

In some instances, the model looks like it is able to correct itself when generating the thinking steps, displaying what looks like native self-reflection. It is hard to confirm this without details on training data, architecture, and a technical report/paper.

Looking forward to the open models and APIs.
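
For context on the coding task mentioned in the post (transposing a matrix), here is a minimal sketch of the operation itself. It is written in Python rather than bash, and it is not the prompt or the solution used in the author's test. The input format (one row per line, whitespace-separated cells) and the function names `transpose` and `transpose_text` are assumptions made for illustration.

```python
def transpose(rows):
    """Transpose a rectangular matrix given as a list of equal-length rows."""
    if not rows:
        return []
    width = len(rows[0])
    # Reject ragged input instead of silently dropping cells.
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # zip(*rows) pairs up the i-th cell of every row, i.e. yields the columns.
    return [list(column) for column in zip(*rows)]


def transpose_text(text):
    """Transpose whitespace-separated text, one matrix row per line.

    Assumed input format for this sketch; blank lines are ignored.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return "\n".join(" ".join(row) for row in transpose(rows))
```

Calling `transpose_text("1 2 3\n4 5 6")` yields three lines, `1 4`, `2 5`, and `3 6`. The sketch only shows how small the underlying operation is; it says nothing about how any model approached the bash version.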

📊 Media Metadata

{
  "data": [
    {
      "media_url": "https://pbs.twimg.com/media/Gc3SFIUWoAA45ZQ.jpg",
      "type": "photo"
    }
  ],
  "score": 1.0,
  "scored_at": "2025-08-09T13:46:07.554269",
  "import_source": "network_archive_import",
  "links_checked": true,
  "checked_at": "2025-08-10T10:32:55.999712",
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1859373413439066590/media_0.jpg?",
      "filename": "media_0.jpg"
    }
  ],
  "reprocessed_at": "2025-08-12T15:26:41.865613",
  "reprocessed_reason": "missing_media_array"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2015-09-04T12:59:26.000Z",
    "default_profile_image": false,
    "description": "Building with AI Agents @dair_ai • Prev: Meta AI, Elastic, Galactica LLM, PhD • I also teach how to build with LLMs, RAG & AI Agents ⬇️",
    "fast_followers_count": 0,
    "favourites_count": 27933,
    "followers_count": 216713,
    "friends_count": 532,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3688,
    "location": "",
    "media_count": 2656,
    "name": "elvis",
    "normal_followers_count": 216713,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "screen_name": "omarsar0",
    "statuses_count": 12439,
    "translator_type": "regular",
    "url": "https://t.co/JBU5beHQNs",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "3448284313"
  },
  "id": "1859373413439066590",
  "conversation_id": "1859373413439066590",
  "full_text": "🔥The competition for the best reasoning LLM intensifies!\n\nA few days ago, we had the Forge Reasoning API, now we have DeepSeek-R1-Lite-Preview which produces o1-preview-level performance on math benchmarks.\n\nHere are my observations after some initial tests on Deepseek’s new reasoning model.\n\nMath Capabilities: It looks effective for math reasoning problems. The benchmark results do reflect the potential of this model on math reasoning capabilities (even outperform o1-preview on their benchmarks). Something to watch very closely.\n\nCoding tasks: It wasn’t able to solve a simple code problem (generating bash script for transposing a matrix) which the o1 models solve easily.\n\nComplex knowledge understanding: I also tried the model on a much harder cross-word puzzle but it failed miserably. To be fair, even the o1 models fail on this particular test that requires knowledge of modern references.\n\nMore thoughts and tests  here: https://t.co/0rCPwkK2hz\n\nI believe the model is good at code and math as DeepSeek has been explicitly optimizing their models for this. But there is more work to do on the \"reasoning\" steps.\n\nIn some instances, the model looks like it is able to correct itself when generating the thinking steps, displaying what looks like native self-reflection. Hard to confirm this without details on training data, architecture, and a technical report/paper.\n\nLooking forward to the open models and APIs.",
  "reply_count": 5,
  "retweet_count": 18,
  "favorite_count": 90,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/Gc3SFIUWoAA45ZQ.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/omarsar0/status/1859373413439066590",
  "created_at": "2024-11-20T23:09:01.000Z",
  "#sort_index": "1859373413439066590",
  "view_count": 11567,
  "quote_count": 1,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/omarsar0/status/1859373413439066590"
}