🐦 Twitter Post Details

@s_batzoglou

OK, Fable 5 is VERY strong in my first small benchmark test.

I tested the following models on a reasoning task, induction. (Details in my manuscript on arXiv appearing in ICML.) 50 challenge problems, to keep the task manageable in terms of costs.

Fable 5 blows the competition. Caveat: it has a high rate of empty responses. At thinking effort high, it returns almost all empty (and bills max tokens). At medium, it returns more than half empty. So I did two rounds on medium, and then one on low effort and reached 45/50 responses. (The whole task cost $188 for 50 problems.)

Regarding the GPT models: interestingly, GPT-5.5 is pathological in not returning answers. I ran two rounds of it on xhigh and two rounds on high. The completion rates respectively are 9/50 and 17/50, and the correct answers are extremely low, much worse performance than GPT-5.4 and GPT-5.2. So I won't be running any more experiments with GPT-5.5 on this task. (It is strong on other tasks.)

Another note, on Grok models: the original, and now unavailable Grok 4, is very strong. Again with low completion rate. I ran about 3-4 rounds to get 25/50. Grok 4.3 is much weaker in comparison (even weaker than Grok 4.1 fast) but returns answers more often.

Other notably strong performers are Gemini 3.5 Flash (way better than Gemini 3.1 Pro) and DeepSeek v4 Pro.

But no model matches Fable 5. Great job, @anthropic!
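The multi-round procedure described in the post (re-running a model until enough problems have a non-empty answer) could be sketched roughly as below. This is only an illustration under stated assumptions: the post does not say how rounds were combined, so keeping the first non-empty response per problem is a guess, and the names `merge_rounds` and `completion_rate`, as well as the toy data, are hypothetical. The final lines just restate the arithmetic implied by the figures quoted in the post.

```python
from typing import Optional


def merge_rounds(rounds: list[dict[int, Optional[str]]]) -> dict[int, str]:
    """Combine several rounds, keeping the first non-empty response per problem.

    Assumption: the post does not state its merge rule; "first non-empty wins"
    is one plausible reading, not the author's confirmed method.
    """
    merged: dict[int, str] = {}
    for responses in rounds:  # rounds are processed in the order they were run
        for problem_id, text in responses.items():
            # Treat None, "" and whitespace-only strings as empty responses.
            if problem_id not in merged and text and text.strip():
                merged[problem_id] = text
    return merged


def completion_rate(answered: int, total: int) -> float:
    """Fraction of problems that received a non-empty response."""
    if total <= 0:
        raise ValueError("total must be positive")
    return answered / total


# Toy data (hypothetical): three rounds over four problems.
toy_rounds = [
    {1: "", 2: "answer A"},
    {1: "answer B", 2: "answer C", 3: None},
    {3: "   ", 4: "answer D"},
]
toy_merged = merge_rounds(toy_rounds)
toy_rate = completion_rate(len(toy_merged), 4)

# Arithmetic on figures quoted in the post.
fable_rate = completion_rate(45, 50)       # 45/50 responses after three rounds
gpt_xhigh_rate = completion_rate(9, 50)    # GPT-5.5 at xhigh
gpt_high_rate = completion_rate(17, 50)    # GPT-5.5 at high
grok4_rate = completion_rate(25, 50)       # Grok 4 after about 3-4 rounds
avg_cost_per_problem = 188 / 50            # quoted $188 total over 50 problems
```

Under these assumptions the quoted figures correspond to completion rates of 90% (Fable 5), 18% and 34% (GPT-5.5 at xhigh and high), and 50% (Grok 4), with an average cost of about $3.76 per problem.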

Media 1

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2073172179605156064/media_0.png",
      "filename": "media_0.png"
    }
  ],
  "processed_at": "2026-07-03T23:00:49.603517",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2073172179605156064",
  "url": "https://x.com/s_batzoglou/status/2073172179605156064",
  "twitterUrl": "https://twitter.com/s_batzoglou/status/2073172179605156064",
  "text": "OK, Fable 5 is VERY strong in my first small benchmark test.\n\nI tested the following models on a reasoning task, induction. (Details in my manuscript on arXiv appearing in ICML.) 50 challenge problems, to keep the task manageable in terms of costs.\n\nFable 5 blows the competition. Caveat: it has a high rate of empty responses. At thinking effort high, it returns almost all empty (and bills max tokens). At medium, it returns more than half empty. So I did two rounds on medium, and then one on low effort and reached 45/50 responses. (The whole task cost $188 for 50 problems.)\n\nRegarding the GPT models: interestingly, GPT-5.5 is pathological in not returning answers. I ran two rounds of it on xhigh and two rounds on high. The completion rates respectively are 9/50 and 17/50, and the correct answers are extremely low, much worse performance than GPT-5.4 and GPT-5.2. So I won't be running any more experiments with GPT-5.5 on this task. (It is strong on other tasks.)\n\nAnother note, on Grok models: the original, and now unavailable Grok 4, is very strong. Again with low completion rate. I ran about 3-4 rounds to get 25/50. Grok 4.3 is much weaker in comparison (even weaker than Grok 4.1 fast) but returns answers more often.\n\nOther notably strong performers are Gemini 3.5 Flash (way better than Gemini 3.1 Pro) and DeepSeek v4 Pro.\n\nBut no model matches Fable 5. Great job, @anthropic!",
  "source": "Twitter for iPhone",
  "retweetCount": 0,
  "replyCount": 0,
  "likeCount": 0,
  "quoteCount": 0,
  "viewCount": 83,
  "createdAt": "Fri Jul 03 22:28:59 +0000 2026",
  "lang": "en",
  "bookmarkCount": 0,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2073172179605156064",
  "displayTextRange": [
    0,
    280
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "s_batzoglou",
    "url": "https://x.com/s_batzoglou",
    "twitterUrl": "https://twitter.com/s_batzoglou",
    "id": "1518735949458378752",
    "name": "Serafim Batzoglou",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1518736918527152128/hV7H_k58_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1518735949458378752/1731329061",
    "description": "",
    "location": "San Francisco and Miami",
    "followers": 3188,
    "following": 866,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Mon Apr 25 23:37:36 +0000 2022",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 47015,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 305,
    "statusesCount": 6330,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [],
    "profile_bio": {
      "description": "Genomics-computation-ML-biotech-foundations of math-philosophy of mind; CDO @seer_bio; former prof @StanfordAILab; cofounder @dnanexus; opinions entirely my own",
      "entities": {
        "description": {
          "user_mentions": [
            {
              "id_str": "",
              "indices": [
                76,
                85
              ],
              "name": "",
              "screen_name": "seer_bio"
            },
            {
              "id_str": "",
              "indices": [
                99,
                113
              ],
              "name": "",
              "screen_name": "StanfordAILab"
            },
            {
              "id_str": "",
              "indices": [
                125,
                134
              ],
              "name": "",
              "screen_name": "dnanexus"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/DvnZxLsWSK",
        "expanded_url": "https://twitter.com/s_batzoglou/status/2073172179605156064/photo/1",
        "ext_master_playlist_only": [],
        "ext_media_availability": {
          "status": "Available"
        },
        "ext_playlists": [],
        "features": {
          "large": {
            "faces": []
          },
          "orig": {
            "faces": []
          }
        },
        "id_str": "2073170411089440769",
        "indices": [
          281,
          304
        ],
        "media_key": "3_2073170411089440769",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARzFYYE31jABCgACHMVjHPuWcOAAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHMVhgTfWMAEKAAIcxWMc+5Zw4AAA",
            "media_key": "3_2073170411089440769"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HMVhgTfWMAEN-ah.png",
        "original_info": {
          "focus_rects": [
            {
              "h": 645,
              "w": 1152,
              "x": 0,
              "y": 386
            },
            {
              "h": 1152,
              "w": 1152,
              "x": 0,
              "y": 132
            },
            {
              "h": 1313,
              "w": 1152,
              "x": 0,
              "y": 52
            },
            {
              "h": 1666,
              "w": 833,
              "x": 0,
              "y": 0
            },
            {
              "h": 1666,
              "w": 1152,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1666,
          "width": 1152
        },
        "sizes": {
          "large": {
            "h": 1666,
            "w": 1152
          }
        },
        "type": "photo",
        "url": "https://t.co/DvnZxLsWSK"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      {
        "id_str": "1819261",
        "indices": [
          1386,
          1396
        ],
        "name": "Paul Jankura",
        "screen_name": "anthropic"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}