🐦 Twitter Post Details

@random_walker

📢 1) We have a few papers that advance the state of the art of AI agent evaluation. Details and links in Stephan's post.

2) AI agent evaluation has quickly become a distinct discipline. We're working on a paper titled "Emerging trends in AI agent evaluation" that extracts best practices for this community.

3) I'm giving an invited talk at ICML, addressing anxiety about supposedly imminent Recursive Self Improvement and the question of what will remain for humans to work on (especially scientists, researchers, software engineers). I hope to make it provocative but cautiously optimistic. https://t.co/rYHlxPGEXY
(I also plan to share the ideas from the talk as essays on the AI as Normal Technology newsletter.)

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2072375245969719374/media_0.png",
      "filename": "media_0.png"
    }
  ],
  "processed_at": "2026-07-01T20:01:02.720412",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2072375245969719374",
  "url": "https://x.com/random_walker/status/2072375245969719374",
  "twitterUrl": "https://twitter.com/random_walker/status/2072375245969719374",
  "text": "📢 1) We have a few papers that advance the state of the art of AI agent evaluation. Details and links in Stephan's post.\n\n2) AI agent evaluation has quickly become a distinct discipline. We're working on a paper titled \"Emerging trends in AI agent evaluation\" that extracts best practices for this community.\n\n3) I'm giving an invited talk at ICML, addressing anxiety about supposedly imminent Recursive Self Improvement and the question of what will remain for humans to work on (especially scientists, researchers, software engineers). I hope to make it provocative but cautiously optimistic. https://t.co/rYHlxPGEXY\n(I also plan to share the ideas from the talk as essays on the AI as Normal Technology newsletter.)",
  "source": "Twitter for iPhone",
  "retweetCount": 3,
  "replyCount": 2,
  "likeCount": 32,
  "quoteCount": 0,
  "viewCount": 4986,
  "createdAt": "Wed Jul 01 17:42:16 +0000 2026",
  "lang": "en",
  "bookmarkCount": 33,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2072375245969719374",
  "displayTextRange": [
    0,
    278
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "random_walker",
    "url": "https://x.com/random_walker",
    "twitterUrl": "https://twitter.com/random_walker",
    "id": "10834752",
    "name": "Arvind Narayanan",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1650881612756942850/bZYjMyFU_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/10834752/1488663432",
    "description": "",
    "location": "Princeton, NJ",
    "followers": 127218,
    "following": 550,
    "status": "",
    "canDm": false,
    "canMediaTag": false,
    "createdAt": "Tue Dec 04 11:14:14 +0000 2007",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 24610,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 929,
    "statusesCount": 13208,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2065032543724785924"
    ],
    "profile_bio": {
      "description": "Princeton CS prof and Director @PrincetonCITP. \nCoauthor of \"AI Snake Oil\" and \"AI as Normal Technology\". https://t.co/ZwebetjZ4n\nViews mine.",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "normaltech.ai",
              "expanded_url": "https://www.normaltech.ai/",
              "indices": [
                106,
                129
              ],
              "url": "https://t.co/ZwebetjZ4n"
            }
          ],
          "user_mentions": [
            {
              "id_str": "",
              "indices": [
                31,
                45
              ],
              "name": "",
              "screen_name": "PrincetonCITP"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "cs.princeton.edu/~arvindn/",
              "expanded_url": "https://www.cs.princeton.edu/~arvindn/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/px6fpS9QFq"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "allow_download_status": {
          "allow_download": true
        },
        "display_url": "pic.twitter.com/Hv5EApZ7eM",
        "expanded_url": "https://twitter.com/random_walker/status/2072375245969719374/photo/1",
        "ext_alt_text": "Emerging trends in AI agent evaluation\nAbstract\nAI evaluation used to be relatively straightforward: researchers created benchmarks and AI developers reported accuracy figures on selected benchmarks. With the rise of agents, this approach has broken down in three ways. First, the complexity of their behavior has necessitated many improvements to measurement methodology, scientific rigor, and evaluation infrastructure. Second, their deployment in economically relevant, sometimes high-stakes scenarios has expanded the scope of evaluation from the lab to the field. Finally, evaluation has emerged as a distinct community, with independence from the AI labs as a key goal. We survey about 30 trends across the above three dimensions—the “How,” “What,” and “Who” of AI agent evaluation. Our goal is to consolidate best practices and identify areas where evaluation practices have not yet caught up.\n",
        "ext_master_playlist_only": [],
        "ext_media_availability": {
          "status": "Available"
        },
        "ext_playlists": [],
        "features": {
          "large": {
            "faces": []
          },
          "orig": {
            "faces": []
          }
        },
        "id_str": "2072373377046605824",
        "indices": [
          279,
          302
        ],
        "media_key": "3_2072373377046605824",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARzCjJtE1vAACgACHMKOTmlWgE4AAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHMKMm0TW8AAKAAIcwo5OaVaATgAA",
            "media_key": "3_2072373377046605824"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HMKMm0TW8AA37Qe.png",
        "original_info": {
          "focus_rects": [
            {
              "h": 678,
              "w": 1211,
              "x": 0,
              "y": 0
            },
            {
              "h": 678,
              "w": 678,
              "x": 0,
              "y": 0
            },
            {
              "h": 678,
              "w": 595,
              "x": 0,
              "y": 0
            },
            {
              "h": 678,
              "w": 339,
              "x": 0,
              "y": 0
            },
            {
              "h": 678,
              "w": 1258,
              "x": 0,
              "y": 0
            }
          ],
          "height": 678,
          "width": 1258
        },
        "sizes": {
          "large": {
            "h": 678,
            "w": 1258
          }
        },
        "type": "photo",
        "url": "https://t.co/Hv5EApZ7eM"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "icml.cc/virtual/2026/i…",
        "expanded_url": "https://icml.cc/virtual/2026/invited-talk/67274",
        "indices": [
          595,
          618
        ],
        "url": "https://t.co/rYHlxPGEXY"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": {
    "type": "tweet",
    "id": "2072314958570803267",
    "url": "https://x.com/steverab/status/2072314958570803267",
    "twitterUrl": "https://twitter.com/steverab/status/2072314958570803267",
    "text": "📣 I'll be in Seoul next week to present one main conference paper and four workshop papers at ICML! I'll also be on a panel at the https://t.co/D3wwI18H7o alignment workshop! Reach out if you are around and want to chat about uncertainty, reliability, or AI evals!😊\n\nDetails⬇️\n\n📄Paper 1: Towards a Science of AI Agent Reliability\n📍Main conference: Thursday (July 9) • 14:30–16:15 in Hall A • Poster #3408\n📍Workshop on Failure Modes in Agentic AI (FAGEN): Friday (July 10) • 10:10–11:00 and 14:40–15:30 in Grand Ballroom 104-105\n🔗https://t.co/HAKHzASrOZ\n🧵https://t.co/uQCpPIiXSJ\n\n📄Paper 2: Log Analysis is Necessary for Credible Evaluation of AI Agents\n📍Workshop on Failure Modes in Agentic AI (FAGEN): Friday (July 10) • 10:10–11:00 and 14:40–15:30 in Grand Ballroom 104-105\n🔗https://t.co/2xKsB4oMaU\n🧵https://t.co/StcdxiRuXi\n\n📄Paper 3: Open-World Evaluations for Measuring Frontier AI Capabilities\n📍Workshop on Agents in the Wild (AIWILD): Saturday (July 11) • 11:10–12:00 and 16:10–17:00 in Hall B2\n🔗https://t.co/nq9iJtBGLs\n🧵https://t.co/tTblfaNqld\n\n📄Paper 4: Life After Benchmark Saturation: A Case Study of CORE-Bench\n📍Workshop on Agents in the Wild (AIWILD): Saturday (July 11) • 11:10–12:00 and 16:10–17:00 in Hall B2\n🔗https://t.co/NtEyYrSlF9\n🧵https://t.co/w7Pphsd6ko\n\n🗣️Panel on the AI capability–reliability gap\n📍https://t.co/D3wwI18H7o Seoul Alignment Workshop: Monday (July 6)\n🔗https://t.co/iBxqhTQmVf\n\nAlso, my advisor @random_walker is going to deliver a keynote on Thursday (July 9) at 13:30 in Hall C: https://t.co/qAO4ZjhZxX. Don't miss it!",
    "source": "Twitter for iPhone",
    "retweetCount": 2,
    "replyCount": 2,
    "likeCount": 13,
    "quoteCount": 0,
    "viewCount": 5565,
    "createdAt": "Wed Jul 01 13:42:42 +0000 2026",
    "lang": "en",
    "bookmarkCount": 4,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "2072314958570803267",
    "displayTextRange": [
      0,
      276
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "steverab",
      "url": "https://x.com/steverab",
      "twitterUrl": "https://twitter.com/steverab",
      "id": "138821636",
      "name": "Stephan Rabanser",
      "isVerified": false,
      "isBlueVerified": false,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1928214170547159040/VekssmRX_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/138821636/1760330288",
      "description": "",
      "location": "Princeton, NJ",
      "followers": 798,
      "following": 385,
      "status": "",
      "canDm": true,
      "canMediaTag": false,
      "createdAt": "Fri Apr 30 18:05:23 +0000 2010",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {}
      },
      "fastFollowersCount": 0,
      "favouritesCount": 490,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 1050,
      "statusesCount": 10128,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "2062890225144135800"
      ],
      "profile_bio": {
        "description": "Postdoctoral Researcher @Princeton. Reliable, safe, trustworthy machine learning. Previously: @UofT @VectorInst @TU_Muenchen @Google @awscloud",
        "entities": {
          "description": {
            "user_mentions": [
              {
                "id_str": "",
                "indices": [
                  24,
                  34
                ],
                "name": "",
                "screen_name": "Princeton"
              },
              {
                "id_str": "",
                "indices": [
                  94,
                  99
                ],
                "name": "",
                "screen_name": "UofT"
              },
              {
                "id_str": "",
                "indices": [
                  100,
                  111
                ],
                "name": "",
                "screen_name": "VectorInst"
              },
              {
                "id_str": "",
                "indices": [
                  112,
                  124
                ],
                "name": "",
                "screen_name": "TU_Muenchen"
              },
              {
                "id_str": "",
                "indices": [
                  125,
                  132
                ],
                "name": "",
                "screen_name": "Google"
              },
              {
                "id_str": "",
                "indices": [
                  133,
                  142
                ],
                "name": "",
                "screen_name": "awscloud"
              }
            ]
          },
          "url": {
            "urls": [
              {
                "display_url": "rabanser.dev",
                "expanded_url": "https://rabanser.dev",
                "indices": [
                  0,
                  23
                ],
                "url": "https://t.co/cNNItOKNEM"
              }
            ]
          }
        }
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "allow_download_status": {
            "allow_download": true
          },
          "display_url": "pic.twitter.com/VlaNO1FP5M",
          "expanded_url": "https://twitter.com/steverab/status/2072314958570803267/photo/1",
          "ext_master_playlist_only": [],
          "ext_media_availability": {
            "status": "Available"
          },
          "ext_playlists": [],
          "features": {
            "large": {
              "faces": [
                {
                  "h": 192,
                  "w": 192,
                  "x": 1088,
                  "y": 752
                },
                {
                  "h": 368,
                  "w": 368,
                  "x": 1502,
                  "y": 778
                }
              ]
            },
            "orig": {
              "faces": [
                {
                  "h": 384,
                  "w": 384,
                  "x": 2176,
                  "y": 1504
                },
                {
                  "h": 736,
                  "w": 736,
                  "x": 3004,
                  "y": 1556
                }
              ]
            }
          },
          "id_str": "2072309046351101952",
          "indices": [
            277,
            300
          ],
          "media_key": "3_2072309046351101952",
          "media_results": {
            "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARzCUhkclgAACgACHMJXeahXUEMAAA==",
            "result": {
              "__typename": "ApiMedia",
              "id": "QXBpTWVkaWE6DAABCgABHMJSGRyWAAAKAAIcwld5qFdQQwAA",
              "media_key": "3_2072309046351101952"
            }
          },
          "media_url_https": "https://pbs.twimg.com/media/HMJSGRyWAAAKrxv.jpg",
          "original_info": {
            "focus_rects": [
              {
                "h": 2294,
                "w": 4096,
                "x": 0,
                "y": 388
              },
              {
                "h": 2725,
                "w": 2725,
                "x": 1371,
                "y": 0
              },
              {
                "h": 2725,
                "w": 2390,
                "x": 1706,
                "y": 0
              },
              {
                "h": 2725,
                "w": 1363,
                "x": 2733,
                "y": 0
              },
              {
                "h": 2725,
                "w": 4096,
                "x": 0,
                "y": 0
              }
            ],
            "height": 2725,
            "width": 4096
          },
          "sizes": {
            "large": {
              "h": 1363,
              "w": 2048
            }
          },
          "type": "photo",
          "url": "https://t.co/VlaNO1FP5M"
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "symbols": [],
      "urls": [
        {
          "display_url": "FAR.AI",
          "expanded_url": "http://FAR.AI",
          "indices": [
            131,
            154
          ],
          "url": "https://t.co/D3wwI18H7o"
        },
        {
          "display_url": "arxiv.org/abs/2602.16666",
          "expanded_url": "https://arxiv.org/abs/2602.16666",
          "indices": [
            529,
            552
          ],
          "url": "https://t.co/HAKHzASrOZ"
        },
        {
          "display_url": "x.com/steverab/statu…",
          "expanded_url": "https://x.com/steverab/status/2062890225144135800",
          "indices": [
            554,
            577
          ],
          "url": "https://t.co/uQCpPIiXSJ"
        },
        {
          "display_url": "arxiv.org/abs/2605.08545",
          "expanded_url": "https://arxiv.org/abs/2605.08545",
          "indices": [
            776,
            799
          ],
          "url": "https://t.co/2xKsB4oMaU"
        },
        {
          "display_url": "x.com/PKirgis/status…",
          "expanded_url": "https://x.com/PKirgis/status/2054368127677509985",
          "indices": [
            801,
            824
          ],
          "url": "https://t.co/StcdxiRuXi"
        },
        {
          "display_url": "arxiv.org/abs/2605.20520",
          "expanded_url": "https://arxiv.org/abs/2605.20520",
          "indices": [
            1001,
            1024
          ],
          "url": "https://t.co/nq9iJtBGLs"
        },
        {
          "display_url": "x.com/sayashk/status…",
          "expanded_url": "https://x.com/sayashk/status/2044835523198370272",
          "indices": [
            1026,
            1049
          ],
          "url": "https://t.co/tTblfaNqld"
        },
        {
          "display_url": "arxiv.org/abs/2606.26158",
          "expanded_url": "https://arxiv.org/abs/2606.26158",
          "indices": [
            1224,
            1247
          ],
          "url": "https://t.co/NtEyYrSlF9"
        },
        {
          "display_url": "x.com/nityndg/status…",
          "expanded_url": "https://x.com/nityndg/status/2072089528634900776",
          "indices": [
            1249,
            1272
          ],
          "url": "https://t.co/w7Pphsd6ko"
        },
        {
          "display_url": "FAR.AI",
          "expanded_url": "http://FAR.AI",
          "indices": [
            1320,
            1343
          ],
          "url": "https://t.co/D3wwI18H7o"
        },
        {
          "display_url": "far.ai/events/event-l…",
          "expanded_url": "https://www.far.ai/events/event-list/seoul-alignment-workshop-2026",
          "indices": [
            1387,
            1410
          ],
          "url": "https://t.co/iBxqhTQmVf"
        },
        {
          "display_url": "icml.cc/virtual/2026/i…",
          "expanded_url": "https://icml.cc/virtual/2026/invited-talk/67274",
          "indices": [
            1515,
            1538
          ],
          "url": "https://t.co/qAO4ZjhZxX"
        }
      ],
      "user_mentions": [
        {
          "id_str": "10834752",
          "indices": [
            1429,
            1443
          ],
          "name": "Arvind Narayanan",
          "screen_name": "random_walker"
        }
      ]
    },
    "quoted_tweet": {
      "type": "tweet",
      "id": "2072089528634900776",
      "url": "",
      "twitterUrl": "",
      "text": "",
      "source": "Twitter for iPhone",
      "retweetCount": 0,
      "replyCount": 0,
      "likeCount": 0,
      "quoteCount": 0,
      "viewCount": 0,
      "createdAt": "",
      "lang": "",
      "bookmarkCount": 0,
      "isReply": false,
      "inReplyToId": null,
      "conversationId": "",
      "displayTextRange": [],
      "inReplyToUserId": null,
      "inReplyToUsername": null,
      "author": {},
      "extendedEntities": {},
      "card": null,
      "place": {},
      "entities": {},
      "quoted_tweet": null,
      "retweeted_tweet": null,
      "isLimitedReply": false,
      "communityInfo": null,
      "article": null
    },
    "retweeted_tweet": null,
    "isLimitedReply": false,
    "communityInfo": null,
    "article": null
  },
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}