@s_batzoglou
@CharuruCha14310 I tried it with underwhelming results. See my previous post: https://t.co/DaKbeaVIcb It is insanely slow, with up to 22 hours per problem (🤯), fails a lot, and seems a low-performer on correct answers
{
"media": [
{
"url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2073405777767747684/media_0.jpg",
"media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2073405777767747684/media_0.jpg",
"type": "photo",
"filename": "media_0.jpg"
}
],
"processed_at": "2026-07-04T14:00:46.610903",
"pipeline_version": "2.0"
}
{
"type": "tweet",
"id": "2073405777767747684",
"url": "https://x.com/s_batzoglou/status/2073405777767747684",
"twitterUrl": "https://twitter.com/s_batzoglou/status/2073405777767747684",
"text": "@CharuruCha14310 I tried it with underwhelming results. See my previous post:\n\nhttps://t.co/DaKbeaVIcb\n\nIt is insanely slow, with up to 22 hours per problem (π€―), fails a lot, and seems a low-performer on correct answers",
"source": "Twitter for iPhone",
"retweetCount": 0,
"replyCount": 0,
"likeCount": 0,
"quoteCount": 0,
"viewCount": 2,
"createdAt": "Sat Jul 04 13:57:14 +0000 2026",
"lang": "en",
"bookmarkCount": 0,
"isReply": true,
"inReplyToId": "2073401830562767180",
"conversationId": "2073172179605156064",
"displayTextRange": [
17,
219
],
"inReplyToUserId": "1867983109326848001",
"inReplyToUsername": "CharuruCha14310",
"author": {
"type": "user",
"userName": "s_batzoglou",
"url": "https://x.com/s_batzoglou",
"twitterUrl": "https://twitter.com/s_batzoglou",
"id": "1518735949458378752",
"name": "Serafim Batzoglou",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/1518736918527152128/hV7H_k58_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/1518735949458378752/1731329061",
"description": "",
"location": "San Francisco and Miami",
"followers": 3195,
"following": 866,
"status": "",
"canDm": false,
"canMediaTag": true,
"createdAt": "Mon Apr 25 23:37:36 +0000 2022",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 47046,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 305,
"statusesCount": 6334,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [],
"profile_bio": {
"description": "Genomics-computation-ML-biotech-foundations of math-philosophy of mind; CDO @seer_bio; former prof @StanfordAILab; cofounder @dnanexus; opinions entirely my own",
"entities": {
"description": {
"user_mentions": [
{
"id_str": "",
"indices": [
76,
85
],
"name": "",
"screen_name": "seer_bio"
},
{
"id_str": "",
"indices": [
99,
113
],
"name": "",
"screen_name": "StanfordAILab"
},
{
"id_str": "",
"indices": [
125,
134
],
"name": "",
"screen_name": "dnanexus"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {},
"card": null,
"place": {},
"entities": {
"urls": [
{
"display_url": "x.com/s_batzoglou/stβ¦",
"expanded_url": "https://x.com/s_batzoglou/status/2068297051247350154?s=20",
"indices": [
79,
102
],
"url": "https://t.co/DaKbeaVIcb"
}
],
"user_mentions": [
{
"id_str": "1867983109326848001",
"indices": [
0,
16
],
"name": "Charuru Charuru",
"screen_name": "CharuruCha14310"
}
]
},
"quoted_tweet": {
"type": "tweet",
"id": "2068297051247350154",
"url": "https://x.com/s_batzoglou/status/2068297051247350154",
"twitterUrl": "https://twitter.com/s_batzoglou/status/2068297051247350154",
"text": "I find GLM-5.2 currently unusable for hard reasoning tasks. I gave it 11 induction problems from my benchmark (ICML 2026, https://t.co/gBelIZQEaa).\n\n- 4 out of the 11 completed, the rest failed; 2 correct\n- Average time per completed problem: 6h 10m 13s\n- Average time per failed problem: 20h 54m 16s\n\nThe worst part: the total visible token usage is 96,026. But the charge to my account is $48.55. So it charged for about 10M output tokens. Which means that each problem was probably run more than once internally, and failed but still got charged.\n\nAt $12 per problem, GLM-5.2 is by far the most expensive model, compared to GPT-5.5 which is around $3 per problem in the same benchmark. And slower by 1-2 orders of magnitude.\n\nBy the way, Kimi-k2.7-code is great. Clear improvement over kimi-k2.6, but batch mode is not supported yet.",
"source": "Twitter for iPhone",
"retweetCount": 11,
"replyCount": 17,
"likeCount": 146,
"quoteCount": 6,
"viewCount": 22943,
"createdAt": "Sat Jun 20 11:36:58 +0000 2026",
"lang": "en",
"bookmarkCount": 56,
"isReply": false,
"inReplyToId": null,
"conversationId": "2068297051247350154",
"displayTextRange": [
0,
279
],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {
"type": "user",
"userName": "s_batzoglou",
"url": "https://x.com/s_batzoglou",
"twitterUrl": "https://twitter.com/s_batzoglou",
"id": "1518735949458378752",
"name": "Serafim Batzoglou",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/1518736918527152128/hV7H_k58_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/1518735949458378752/1731329061",
"description": "",
"location": "San Francisco and Miami",
"followers": 3195,
"following": 866,
"status": "",
"canDm": false,
"canMediaTag": true,
"createdAt": "Mon Apr 25 23:37:36 +0000 2022",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 47046,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 305,
"statusesCount": 6334,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [],
"profile_bio": {
"description": "Genomics-computation-ML-biotech-foundations of math-philosophy of mind; CDO @seer_bio; former prof @StanfordAILab; cofounder @dnanexus; opinions entirely my own",
"entities": {
"description": {
"user_mentions": [
{
"id_str": "",
"indices": [
76,
85
],
"name": "",
"screen_name": "seer_bio"
},
{
"id_str": "",
"indices": [
99,
113
],
"name": "",
"screen_name": "StanfordAILab"
},
{
"id_str": "",
"indices": [
125,
134
],
"name": "",
"screen_name": "dnanexus"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {},
"card": {
"binding_values": [
{
"key": "thumbnail_image",
"value": {
"image_value": {
"alt": "arXiv logo",
"height": 144,
"url": "https://pbs.twimg.com/card_img/2071828459626876928/a7x6lTiH?format=jpg&name=144x144_2",
"width": 144
}
}
},
{
"key": "description",
"value": {
"string_value": "We introduce INDUCTION, a benchmark for finite structure concept synthesis in first order logic. Given small finite relational worlds with extensionally labeled target predicates, models must..."
}
},
{
"key": "domain",
"value": {
"string_value": "arxiv.org"
}
},
{
"key": "thumbnail_image_large",
"value": {
"image_value": {
"alt": "arXiv logo",
"height": 420,
"url": "https://pbs.twimg.com/card_img/2071828459626876928/a7x6lTiH?format=jpg&name=420x420_2",
"width": 420
}
}
},
{
"key": "thumbnail_image_original",
"value": {
"image_value": {
"alt": "arXiv logo",
"height": 1000,
"url": "https://pbs.twimg.com/card_img/2071828459626876928/a7x6lTiH?format=jpg&name=orig",
"width": 1000
}
}
},
{
"key": "site",
"value": {
"scribe_key": "publisher_id",
"user_value": {
"id_str": "808633423300624384",
"path": []
}
}
},
{
"key": "thumbnail_image_small",
"value": {
"image_value": {
"alt": "arXiv logo",
"height": 100,
"url": "https://pbs.twimg.com/card_img/2071828459626876928/a7x6lTiH?format=jpg&name=100x100_2",
"width": 100
}
}
},
{
"key": "thumbnail_image_x_large",
"value": {
"image_value": {
"alt": "arXiv logo",
"height": 1000,
"url": "https://pbs.twimg.com/card_img/2071828459626876928/a7x6lTiH?format=png&name=2048x2048_2_exp",
"width": 1000
}
}
},
{
"key": "thumbnail_image_alt_text",
"value": {
"string_value": "arXiv logo"
}
},
{
"key": "vanity_url",
"value": {
"scribe_key": "vanity_url",
"string_value": "arxiv.org"
}
},
{
"key": "thumbnail_image_color",
"value": {
"image_color_value": {
"palette": [
{
"percentage": 94.17,
"rgb": {
"blue": 255,
"green": 255,
"red": 255
}
},
{
"percentage": 4.33,
"rgb": {
"blue": 105,
"green": 116,
"red": 124
}
},
{
"percentage": 1.26,
"rgb": {
"blue": 46,
"green": 21,
"red": 170
}
},
{
"percentage": 0.23,
"rgb": {
"blue": 131,
"green": 116,
"red": 203
}
}
]
}
}
},
{
"key": "title",
"value": {
"string_value": "INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic"
}
},
{
"key": "card_url",
"value": {
"scribe_key": "card_url",
"string_value": "https://t.co/V6Xs33jhMX"
}
}
],
"card_platform": {
"platform": {
"audience": {
"name": "production"
},
"device": {
"name": "iPhone",
"version": "13"
}
}
},
"name": "summary",
"url": "https://t.co/V6Xs33jhMX",
"user_refs_results": [
{
"rest_id": "808633423300624384",
"result": {
"__typename": "User",
"action_counts": {
"favorites_count": 983
},
"avatar": {
"image_url": "https://pbs.twimg.com/profile_images/1365352170267299840/IzvjKckL_normal.jpg"
},
"banner": {
"image_url": "https://pbs.twimg.com/profile_banners/808633423300624384/1481635469"
},
"core": {
"created_at": "Tue Dec 13 11:23:26 +0000 2016",
"name": "arXiv.org",
"screen_name": "arxiv"
},
"dm_permissions": {
"can_dm": true
},
"exclusive_tweet_following": false,
"follow_request_sent": false,
"identity_profile_labels_highlighted_label": {},
"location": {
"location": "Ithaca, NY",
"profile_location_place": {
"country": "",
"country_code": "",
"full_name": "Ithaca, NY",
"id": "ae76bffcaf2bf545",
"name": "Ithaca, NY",
"place_type": "unknown"
}
},
"media_permissions": {
"can_media_tag": true
},
"notifications_settings": {
"notifications_enabled": false
},
"pinned_items": {},
"privacy": {
"protected": false,
"suspended": false
},
"private_super_following": false,
"profile_bio": {
"description": "News from https://t.co/enurGFxpcS, a free distribution service and an open archive for scholarly articles.\n\nFor help with arXiv, see https://t.co/LcWuhM0BOl",
"entities": {
"description": {
"urls": [
{
"display_url": "arXiv.org",
"expanded_url": "http://arXiv.org",
"indices": [
10,
33
],
"url": "https://t.co/enurGFxpcS"
},
{
"display_url": "arxiv.org/help",
"expanded_url": "https://arxiv.org/help",
"indices": [
133,
156
],
"url": "https://t.co/LcWuhM0BOl"
}
]
},
"url": {
"urls": [
{
"display_url": "arxiv.org",
"expanded_url": "https://arxiv.org/",
"indices": [
0,
23
],
"url": "https://t.co/DHMkdi4lF9"
}
]
}
}
},
"profile_image_shape": "Circle",
"profile_metadata": {
"profile_interstitial_type": "",
"profile_link_color": "ABB8C2"
},
"profile_translation": {
"translator_type_enum": "None"
},
"properties": {
"has_extended_profile": true,
"has_no_screen_name": false
},
"relationship_counts": {
"followers": 49205,
"following": 184
},
"relationship_perspectives": {
"blocked_by": false,
"blocking": false,
"followed_by": false,
"following": false,
"live_following": false,
"muting": false
},
"rest_id": "808633423300624384",
"smart_blocked_by": false,
"smart_blocking": false,
"super_follow_eligible": false,
"super_followed_by": false,
"super_following": false,
"tweet_counts": {
"media_tweets": 118,
"tweets": 1131
},
"verification": {
"is_blue_verified": false,
"verified": false
},
"website": {
"url": "https://t.co/DHMkdi4lF9"
}
}
}
]
},
"place": {},
"entities": {
"hashtags": [],
"symbols": [],
"urls": [
{
"display_url": "arxiv.org/abs/2602.18956",
"expanded_url": "https://arxiv.org/abs/2602.18956",
"indices": [
122,
145
],
"url": "https://t.co/gBelIZQEaa"
}
],
"user_mentions": []
},
"quoted_tweet": null,
"retweeted_tweet": null,
"isLimitedReply": false,
"communityInfo": null,
"article": null
},
"retweeted_tweet": null,
"isLimitedReply": false,
"communityInfo": null,
"article": null
}
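
The cost and timing figures in the quoted post (`quoted_tweet.text` above) can be cross-checked with a little arithmetic. The following is a minimal sketch, not the author's own calculation: every input is copied from the post, the count of 7 failed problems is derived from "4 out of the 11 completed, the rest failed", and the 10M billed-token figure is the post's own rough estimate rather than a measured value. The function name `summarize` and the dictionary keys are hypothetical, chosen only for illustration.

```python
# Figures quoted in the post; nothing here is measured independently.
TOTAL_CHARGE_USD = 48.55
VISIBLE_TOKENS = 96_026
COMPLETED = 4
FAILED = 7  # derived: 11 problems attempted, 4 completed, the rest failed
AVG_COMPLETED_S = 6 * 3600 + 10 * 60 + 13   # 6h 10m 13s
AVG_FAILED_S = 20 * 3600 + 54 * 60 + 16     # 20h 54m 16s
ASSUMED_BILLED_TOKENS = 10_000_000          # the post's estimate ("about 10M")


def summarize():
    """Derive per-problem cost, implied token price, and total run time."""
    attempted = COMPLETED + FAILED
    total_seconds = COMPLETED * AVG_COMPLETED_S + FAILED * AVG_FAILED_S
    return {
        "cost_per_completed_usd": TOTAL_CHARGE_USD / COMPLETED,
        "cost_per_attempted_usd": TOTAL_CHARGE_USD / attempted,
        # Only meaningful if the 10M billed-token estimate is right.
        "implied_usd_per_million_tokens":
            TOTAL_CHARGE_USD / (ASSUMED_BILLED_TOKENS / 1_000_000),
        "billed_to_visible_ratio": ASSUMED_BILLED_TOKENS / VISIBLE_TOKENS,
        "total_wall_clock_hours": total_seconds / 3600,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

With these inputs the cost per completed problem comes to about $12.14, which matches the post's "$12 per problem" if that figure divides the charge by completed problems only (the post does not say which denominator it uses). Dividing by all 11 attempted problems gives about $4.41 instead.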