🐦 Twitter Post Details

Viewing enriched Twitter post

@random_walker

RT @JustinBullock14: Lots of important ideas here!

“Evaluating 14 models on two complementary benchmarks, we found that nearly two years of rapid capability progress have produced only modest reliability gains…

Unfortunately, AI agents are evaluated based on a single number, the average success rate at the task. That number has been going up quickly on many tasks over the last two years, which is why there’s so much excitement about deploying agents.

Safety-critical engineering fields (aviation, nuclear, automotive) figured out decades ago that reliability is not the same as average performance. These fields independently converged on the above four dimensions: consistency, robustness, predictability, and safety (the frequency and severity of failures).”

📊 Media Metadata

{
  "score": 0.36,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.06000000000000001,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-03-01T12:12:13.891744",
  "import_source": "api_import",
  "source_tagged_at": "2026-03-01T12:12:13.891760",
  "enriched": true,
  "enriched_at": "2026-03-01T12:12:13.891763"
}
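
The six component values above add up to the reported `score` of 0.36. The sketch below checks that arithmetic in Python. It assumes the score is a plain sum of the components, which the source does not confirm; the function name `components_total` is hypothetical and not part of any pipeline shown here.

```python
import math

# Values copied from the Media Metadata block above.
metadata = {
    "score": 0.36,
    "score_components": {
        "author": 0.09,
        "engagement": 0.0,
        "quality": 0.06000000000000001,
        "source": 0.135,
        "nlp": 0.05,
        "recency": 0.025,
    },
}


def components_total(meta):
    """Sum the score components (hypothetical helper; assumes an unweighted sum)."""
    # math.fsum avoids accumulating extra floating-point rounding error.
    return math.fsum(meta["score_components"].values())


total = components_total(metadata)
# Compare with a tolerance: values such as 0.06000000000000001 show that
# the components already carry binary floating-point noise.
matches = math.isclose(total, metadata["score"], abs_tol=1e-9)
print(f"total={total:.3f} matches_reported_score={matches}")
```

The tolerance-based comparison is deliberate: an exact `==` check on floats of this kind can fail even when the values agree to every meaningful digit.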

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2026715877064823140",
  "url": "https://x.com/random_walker/status/2026715877064823140",
  "twitterUrl": "https://twitter.com/random_walker/status/2026715877064823140",
  "text": "RT @JustinBullock14: Lots of important ideas here!\n\n“Evaluating 14 models on two complementary benchmarks, we found that nearly two years o…",
  "source": "Twitter for iPhone",
  "retweetCount": 5,
  "replyCount": 1,
  "likeCount": 8,
  "quoteCount": 0,
  "viewCount": 3231,
  "createdAt": "Wed Feb 25 17:48:14 +0000 2026",
  "lang": "en",
  "bookmarkCount": 4,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2026715877064823140",
  "displayTextRange": [
    0,
    140
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "random_walker",
    "url": "https://x.com/random_walker",
    "twitterUrl": "https://twitter.com/random_walker",
    "id": "10834752",
    "name": "Arvind Narayanan",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1650881612756942850/bZYjMyFU_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/10834752/1488663432",
    "description": "",
    "location": "Princeton, NJ",
    "followers": 126209,
    "following": 519,
    "status": "",
    "canDm": false,
    "canMediaTag": false,
    "createdAt": "Tue Dec 04 11:14:14 +0000 2007",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 23473,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 912,
    "statusesCount": 13041,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2026316087604687193"
    ],
    "profile_bio": {
      "description": "Princeton CS prof and Director @PrincetonCITP. \nCoauthor of \"AI Snake Oil\" and \"AI as Normal Technology\". https://t.co/ZwebetjZ4n\nViews mine.",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [
            {
              "display_url": "normaltech.ai",
              "expanded_url": "https://www.normaltech.ai/",
              "indices": [
                106,
                129
              ],
              "url": "https://t.co/ZwebetjZ4n"
            }
          ],
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                31,
                45
              ],
              "name": "",
              "screen_name": "PrincetonCITP"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "cs.princeton.edu/~arvindn/",
              "expanded_url": "https://www.cs.princeton.edu/~arvindn/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/px6fpS9QFq"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "timestamps": [],
    "urls": [],
    "user_mentions": [
      {
        "id_str": "2933754365",
        "indices": [
          3,
          19
        ],
        "name": "Justin Bullock",
        "screen_name": "JustinBullock14"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": {
    "type": "tweet",
    "id": "2026693253169336475",
    "url": "https://x.com/JustinBullock14/status/2026693253169336475",
    "twitterUrl": "https://twitter.com/JustinBullock14/status/2026693253169336475",
    "text": "Lots of important ideas here!\n\n“Evaluating 14 models on two complementary benchmarks, we found that nearly two years of rapid capability progress have produced only modest reliability gains…\n\nUnfortunately, AI agents are evaluated based on a single number, the average success rate at the task. That number has been going up quickly on many tasks over the last two years, which is why there’s so much excitement about deploying agents.\n\nSafety-critical engineering fields (aviation, nuclear, automotive) figured out decades ago that reliability is not the same as average performance. These fields independently converged on the above four dimensions: consistency, robustness, predictability, and safety (the frequency and severity of failures).”",
    "source": "Twitter for iPhone",
    "retweetCount": 5,
    "replyCount": 1,
    "likeCount": 8,
    "quoteCount": 0,
    "viewCount": 3231,
    "createdAt": "Wed Feb 25 16:18:20 +0000 2026",
    "lang": "en",
    "bookmarkCount": 4,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "2026693253169336475",
    "displayTextRange": [
      0,
      276
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "JustinBullock14",
      "url": "https://x.com/JustinBullock14",
      "twitterUrl": "https://twitter.com/JustinBullock14",
      "id": "2933754365",
      "name": "Justin Bullock",
      "isVerified": false,
      "isBlueVerified": true,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/2002837207569051648/37Y1twBK_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/2933754365/1749788710",
      "description": "",
      "location": "Earth",
      "followers": 1425,
      "following": 1592,
      "status": "",
      "canDm": false,
      "canMediaTag": true,
      "createdAt": "Sat Dec 20 15:27:34 +0000 2014",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {}
      },
      "fastFollowersCount": 0,
      "favouritesCount": 22565,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 343,
      "statusesCount": 6308,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "1999978711823860023"
      ],
      "profile_bio": {
        "description": "VP of Policy for @americans4ri; Senior Fellow with Convergence Analysis; Advocate of Love, Intelligence, & Freedom",
        "entities": {
          "description": {
            "hashtags": [],
            "symbols": [],
            "urls": [],
            "user_mentions": [
              {
                "id_str": "0",
                "indices": [
                  17,
                  30
                ],
                "name": "",
                "screen_name": "americans4ri"
              }
            ]
          },
          "url": {
            "urls": [
              {
                "display_url": "governingwithAI.com",
                "expanded_url": "http://governingwithAI.com",
                "indices": [
                  0,
                  23
                ],
                "url": "https://t.co/toGpnUfQ0s"
              }
            ]
          }
        }
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {},
    "card": null,
    "place": {},
    "entities": {
      "hashtags": [],
      "symbols": [],
      "urls": [],
      "user_mentions": []
    },
    "quoted_tweet": {
      "type": "tweet",
      "id": "2026316087604687193",
      "url": "https://x.com/random_walker/status/2026316087604687193",
      "twitterUrl": "https://twitter.com/random_walker/status/2026316087604687193",
      "text": "https://t.co/16ak7tW7Z7",
      "source": "Twitter for iPhone",
      "retweetCount": 40,
      "replyCount": 12,
      "likeCount": 188,
      "quoteCount": 15,
      "viewCount": 83066,
      "createdAt": "Tue Feb 24 15:19:37 +0000 2026",
      "lang": "zxx",
      "bookmarkCount": 249,
      "isReply": false,
      "inReplyToId": null,
      "conversationId": "2026316087604687193",
      "displayTextRange": [
        0,
        23
      ],
      "inReplyToUserId": null,
      "inReplyToUsername": null,
      "author": {
        "type": "user",
        "userName": "random_walker",
        "url": "https://x.com/random_walker",
        "twitterUrl": "https://twitter.com/random_walker",
        "id": "10834752",
        "name": "Arvind Narayanan",
        "isVerified": false,
        "isBlueVerified": true,
        "verifiedType": null,
        "profilePicture": "https://pbs.twimg.com/profile_images/1650881612756942850/bZYjMyFU_normal.jpg",
        "coverPicture": "https://pbs.twimg.com/profile_banners/10834752/1488663432",
        "description": "",
        "location": "Princeton, NJ",
        "followers": 126209,
        "following": 519,
        "status": "",
        "canDm": false,
        "canMediaTag": false,
        "createdAt": "Tue Dec 04 11:14:14 +0000 2007",
        "entities": {
          "description": {
            "urls": []
          },
          "url": {}
        },
        "fastFollowersCount": 0,
        "favouritesCount": 23473,
        "hasCustomTimelines": true,
        "isTranslator": false,
        "mediaCount": 912,
        "statusesCount": 13041,
        "withheldInCountries": [],
        "affiliatesHighlightedLabel": {},
        "possiblySensitive": false,
        "pinnedTweetIds": [
          "2026316087604687193"
        ],
        "profile_bio": {
          "description": "Princeton CS prof and Director @PrincetonCITP. \nCoauthor of \"AI Snake Oil\" and \"AI as Normal Technology\". https://t.co/ZwebetjZ4n\nViews mine.",
          "entities": {
            "description": {
              "hashtags": [],
              "symbols": [],
              "urls": [
                {
                  "display_url": "normaltech.ai",
                  "expanded_url": "https://www.normaltech.ai/",
                  "indices": [
                    106,
                    129
                  ],
                  "url": "https://t.co/ZwebetjZ4n"
                }
              ],
              "user_mentions": [
                {
                  "id_str": "0",
                  "indices": [
                    31,
                    45
                  ],
                  "name": "",
                  "screen_name": "PrincetonCITP"
                }
              ]
            },
            "url": {
              "urls": [
                {
                  "display_url": "cs.princeton.edu/~arvindn/",
                  "expanded_url": "https://www.cs.princeton.edu/~arvindn/",
                  "indices": [
                    0,
                    23
                  ],
                  "url": "https://t.co/px6fpS9QFq"
                }
              ]
            }
          }
        },
        "isAutomated": false,
        "automatedBy": null
      },
      "extendedEntities": {},
      "card": null,
      "place": {},
      "entities": {
        "hashtags": [],
        "symbols": [],
        "timestamps": [],
        "urls": [
          {
            "display_url": "x.com/i/article/2026…",
            "expanded_url": "http://x.com/i/article/2026312913116360704",
            "indices": [
              0,
              23
            ],
            "url": "https://t.co/16ak7tW7Z7"
          }
        ],
        "user_mentions": []
      },
      "quoted_tweet": null,
      "retweeted_tweet": null,
      "isLimitedReply": false,
      "article": {
        "title": "New Paper: Towards a science of AI agent reliability ",
        "preview_text": "Suppose you hear about a new AI agent for improving productivity — by making purchases, or writing code, or sending emails, or handling a customer on your behalf. Should you trust it? Can the agent do",
        "cover_media_img_url": "https://pbs.twimg.com/media/HB7pbHKWQAA-AXq.jpg"
      }
    },
    "retweeted_tweet": null,
    "isLimitedReply": false,
    "article": null
  },
  "isLimitedReply": false,
  "article": null
}
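
The response above nests three tweets: the retweet, the retweeted post under `retweeted_tweet`, and the post it quotes under `quoted_tweet`, which carries the linked `article`. The Python sketch below walks that chain on a trimmed copy of the response and measures the gap between the original post and the retweet. The helper names `tweet_chain` and `parse_created_at` are hypothetical, and the `createdAt` format string is inferred from the timestamps shown here rather than taken from any API documentation.

```python
from datetime import datetime

# Trimmed copy of the response above: only the fields this sketch reads.
response = {
    "id": "2026715877064823140",
    "createdAt": "Wed Feb 25 17:48:14 +0000 2026",
    "quoted_tweet": None,
    "article": None,
    "retweeted_tweet": {
        "id": "2026693253169336475",
        "createdAt": "Wed Feb 25 16:18:20 +0000 2026",
        "retweeted_tweet": None,
        "article": None,
        "quoted_tweet": {
            "id": "2026316087604687193",
            "createdAt": "Tue Feb 24 15:19:37 +0000 2026",
            "quoted_tweet": None,
            "retweeted_tweet": None,
            "article": {
                "title": "New Paper: Towards a science of AI agent reliability ",
            },
        },
    },
}

# Format inferred from the createdAt strings above (an assumption).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a createdAt string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def tweet_chain(tweet):
    """Yield (relation, tweet) pairs, following retweets first, then quotes."""
    relation = "root"
    while tweet:
        yield relation, tweet
        if tweet.get("retweeted_tweet"):
            relation, tweet = "retweet_of", tweet["retweeted_tweet"]
        elif tweet.get("quoted_tweet"):
            relation, tweet = "quote_of", tweet["quoted_tweet"]
        else:
            tweet = None


chain = list(tweet_chain(response))
relations = [(relation, tweet["id"]) for relation, tweet in chain]
# The article title in the response has a trailing space, so strip it.
article_titles = [
    tweet["article"]["title"].strip() for _, tweet in chain if tweet.get("article")
]
retweet_delay = parse_created_at(response["createdAt"]) - parse_created_at(
    response["retweeted_tweet"]["createdAt"]
)

for relation, tweet_id in relations:
    print(relation, tweet_id)
print(article_titles, retweet_delay)
```

Note that the top-level `text` field is truncated, so a consumer that needs the full wording would read `retweeted_tweet.text` instead.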