@arankomatsuzaki
RT @tikgiau: Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find…
{
"score": 0.34,
"score_components": {
"author": 0.09,
"engagement": 0.0,
"quality": 0.04000000000000001,
"source": 0.135,
"nlp": 0.05,
"recency": 0.025
},
"scored_at": "2026-07-03T12:00:53.209820",
"import_source": "api_import",
"source_tagged_at": "2026-07-03T12:00:53.209834",
"enriched": true,
"enriched_at": "2026-07-03T12:00:53.209837"
}
{
"type": "tweet",
"id": "2073000974747349095",
"url": "https://x.com/arankomatsuzaki/status/2073000974747349095",
"twitterUrl": "https://twitter.com/arankomatsuzaki/status/2073000974747349095",
"text": "RT @tikgiau: Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find…",
"source": "Twitter for iPhone",
"retweetCount": 116,
"replyCount": 33,
"likeCount": 622,
"quoteCount": 45,
"viewCount": 159522,
"createdAt": "Fri Jul 03 11:08:41 +0000 2026",
"lang": "en",
"bookmarkCount": 422,
"isReply": false,
"inReplyToId": null,
"conversationId": "2073000974747349095",
"displayTextRange": [
0,
140
],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {
"type": "user",
"userName": "arankomatsuzaki",
"url": "https://x.com/arankomatsuzaki",
"twitterUrl": "https://twitter.com/arankomatsuzaki",
"id": "794433401591693312",
"name": "Aran Komatsuzaki",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/1561220982328754176/JOYS5kab_normal.jpg",
"coverPicture": "",
"description": "",
"location": "",
"followers": 181680,
"following": 375,
"status": "",
"canDm": true,
"canMediaTag": true,
"createdAt": "Fri Nov 04 06:57:37 +0000 2016",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 16426,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 2557,
"statusesCount": 6782,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [
"2035363139299242192"
],
"profile_bio": {
"description": "Sharing AI research. Early work on AI (GPT-J, scaling, MoE). Ex ML PhD (GT) & Google.",
"entities": {
"description": {},
"url": {
"urls": [
{
"display_url": "arankomatsuzaki.wordpress.com/about-me/",
"expanded_url": "https://arankomatsuzaki.wordpress.com/about-me/",
"indices": [
0,
23
],
"url": "https://t.co/aZGCShojNY"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {},
"card": null,
"place": {},
"entities": {
"user_mentions": [
{
"id_str": "1170330869275529216",
"indices": [
3,
11
],
"name": "Deyao Zhu",
"screen_name": "tikgiau"
}
]
},
"quoted_tweet": null,
"retweeted_tweet": {
"type": "tweet",
"id": "2072701593829695926",
"url": "https://x.com/tikgiau/status/2072701593829695926",
"twitterUrl": "https://twitter.com/tikgiau/status/2072701593829695926",
"text": "Introducing EdgeBench, a benchmark designed to study how agents learn from environments over at least 12~72-hour runs. We find that performance follows a log-sigmoid function of environment interaction time with high precision.\n\nEdgeBench is built with three ingredients:\n\n- 🌍 Real & Diverse: 134 real-world tasks across 6 task categories, spanning scientific problems, professional knowledge work, software engineering, optimization, formal math, and games. \n- ⏳ Ultra-Long-Horizon: Each task supports 12–72 hours of agent work. Recorded human effort averages 57.2 hours. \n- 🔁 Informative Feedback: Agents receive real-world feedback for continuous improvement.\n\nAfter 38,000 hours of agent runs on EdgeBench, a scaling law for learning from environments emerges:\n\n- 📈 As agents interact with task environments over time, their aggregate performance is precisely fit by a log-sigmoid function. \n- 🧠 This phenomenon can be explained by an elegant theory of graph exploration.\n\nWe are releasing an initial 51 of the 134 tasks, together with the full evaluation framework, to help advance long-horizon agent research. Check our blog & paper for more findings!\n\nBlog https://t.co/nMOzFsOhbT\nPaper https://t.co/rZb3eWuvik\nGitHub https://t.co/oemXd4UrFw\nDataset https://t.co/P4SQMrM47o\n\nDetails below 👇🧵",
"source": "Twitter for iPhone",
"retweetCount": 116,
"replyCount": 33,
"likeCount": 622,
"quoteCount": 45,
"viewCount": 159522,
"createdAt": "Thu Jul 02 15:19:03 +0000 2026",
"lang": "en",
"bookmarkCount": 422,
"isReply": false,
"inReplyToId": null,
"conversationId": "2072701593829695926",
"displayTextRange": [
0,
276
],
"inReplyToUserId": null,
"inReplyToUsername": null,
"author": {
"type": "user",
"userName": "tikgiau",
"url": "https://x.com/tikgiau",
"twitterUrl": "https://twitter.com/tikgiau",
"id": "1170330869275529216",
"name": "Deyao Zhu",
"isVerified": false,
"isBlueVerified": true,
"verifiedType": null,
"profilePicture": "https://pbs.twimg.com/profile_images/2072668922848157696/SL4AxUzw_normal.jpg",
"coverPicture": "https://pbs.twimg.com/profile_banners/1170330869275529216/1642856611",
"description": "",
"location": "",
"followers": 914,
"following": 540,
"status": "",
"canDm": false,
"canMediaTag": true,
"createdAt": "Sat Sep 07 13:40:03 +0000 2019",
"entities": {
"description": {
"urls": []
},
"url": {}
},
"fastFollowersCount": 0,
"favouritesCount": 1206,
"hasCustomTimelines": true,
"isTranslator": false,
"mediaCount": 17,
"statusesCount": 95,
"withheldInCountries": [],
"affiliatesHighlightedLabel": {},
"possiblySensitive": false,
"pinnedTweetIds": [
"2072701593829695926"
],
"profile_bio": {
"description": "Reseach Scientist at ByteDance Seed Edge @ByteDanceTalk. PhD @KAUST_news Prev. @MPI_IS. RL, Multimodel LLM, Learning from Experience, Self Evolving.",
"entities": {
"description": {
"user_mentions": [
{
"id_str": "",
"indices": [
41,
55
],
"name": "",
"screen_name": "ByteDanceTalk"
},
{
"id_str": "",
"indices": [
61,
72
],
"name": "",
"screen_name": "KAUST_news"
},
{
"id_str": "",
"indices": [
79,
86
],
"name": "",
"screen_name": "MPI_IS"
}
]
},
"url": {
"urls": [
{
"display_url": "tsutikgiau.github.io",
"expanded_url": "http://tsutikgiau.github.io",
"indices": [
0,
23
],
"url": "https://t.co/BpXdkqUhxc"
}
]
}
}
},
"isAutomated": false,
"automatedBy": null
},
"extendedEntities": {
"media": [
{
"additional_media_info": {
"monetizable": false
},
"allow_download_status": {
"allow_download": true
},
"display_url": "pic.twitter.com/5mzq7JCaax",
"expanded_url": "https://twitter.com/tikgiau/status/2072701593829695926/video/1",
"ext_master_playlist_only": [],
"ext_media_availability": {
"status": "Available"
},
"ext_playlists": [],
"id_str": "2072701380314406912",
"indices": [
277,
300
],
"media_key": "13_2072701380314406912",
"media_results": {
"id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARzDtux7GiAAAAA=",
"result": {
"__typename": "ApiMedia",
"id": "QXBpTWVkaWE6DAAECgABHMO27HsaIAAAAA==",
"media_key": "13_2072701380314406912"
}
},
"media_url_https": "https://pbs.twimg.com/amplify_video_thumb/2072701380314406912/img/dCa4e2ADjJalMOn0.jpg",
"original_info": {
"focus_rects": [],
"height": 1440,
"width": 2560
},
"sizes": {
"large": {
"h": 1152,
"w": 2048
}
},
"type": "video",
"url": "https://t.co/5mzq7JCaax",
"video_info": {
"aspect_ratio": [
16,
9
],
"duration_millis": 52000,
"variants": [
{
"content_type": "application/x-mpegURL",
"url": "https://video.twimg.com/amplify_video/2072701380314406912/pl/z8g_KPheOfwhbbj8.m3u8?tag=28&v=cfc"
},
{
"bitrate": 256000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/amplify_video/2072701380314406912/vid/avc1/480x270/sZLc2_cqKHhgug29.mp4?tag=28"
},
{
"bitrate": 832000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/amplify_video/2072701380314406912/vid/avc1/640x360/MccQC_d6bEiDL2oJ.mp4?tag=28"
},
{
"bitrate": 2176000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/amplify_video/2072701380314406912/vid/avc1/1280x720/70YUt2sPNLaZ9iKS.mp4?tag=28"
},
{
"bitrate": 10368000,
"content_type": "video/mp4",
"url": "https://video.twimg.com/amplify_video/2072701380314406912/vid/avc1/2560x1440/266eQ7NuWF3RDCBr.mp4?tag=28"
}
]
}
}
]
},
"card": null,
"place": {},
"entities": {
"hashtags": [],
"symbols": [],
"timestamps": [],
"urls": [
{
"display_url": "edge-bench.org",
"expanded_url": "https://edge-bench.org/",
"indices": [
1164,
1187
],
"url": "https://t.co/nMOzFsOhbT"
},
{
"display_url": "edge-bench.org/paper.pdf",
"expanded_url": "https://edge-bench.org/paper.pdf",
"indices": [
1194,
1217
],
"url": "https://t.co/rZb3eWuvik"
},
{
"display_url": "github.com/ByteDance-Seed…",
"expanded_url": "https://github.com/ByteDance-Seed/EdgeBench",
"indices": [
1225,
1248
],
"url": "https://t.co/oemXd4UrFw"
},
{
"display_url": "huggingface.co/datasets/ByteD…",
"expanded_url": "https://huggingface.co/datasets/ByteDance-Seed/EdgeBench",
"indices": [
1257,
1280
],
"url": "https://t.co/P4SQMrM47o"
}
],
"user_mentions": []
},
"quoted_tweet": null,
"retweeted_tweet": null,
"isLimitedReply": false,
"communityInfo": null,
"article": null
},
"isLimitedReply": false,
"communityInfo": null,
"article": null
}
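
In the first metadata object, the reported `score` of 0.34 appears to equal the sum of the six `score_components` values. The record does not document how the score is computed, so the additive relationship is an assumption here. A minimal Python sketch of that consistency check, with the values copied from the record and a hypothetical helper named `components_match`:

```python
import math

# Values copied from the "score_components" object in the record above.
score_components = {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.04000000000000001,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025,
}
reported_score = 0.34  # the top-level "score" field


def components_match(components, reported, tol=1e-9):
    """Return True if the components sum to the reported score within tol.

    Assumption: the score is a plain sum of its components. The record
    itself does not state the scoring formula.
    """
    return math.isclose(sum(components.values()), reported, abs_tol=tol)


total = sum(score_components.values())
print(round(total, 6), components_match(score_components, reported_score))
```

The tolerance-based comparison is used because values such as 0.04000000000000001 carry floating-point rounding noise, so an exact `==` check could fail even when the fields agree.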