🐦 Twitter Post Details

@METR_Evals

How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks. https://t.co/woREKEWn5S

📊 Media Metadata

{
  "data": [
    {
      "media_url": "https://pbs.twimg.com/media/GdAthRLagAAhJt3.png",
      "type": "photo"
    }
  ],
  "score": 0.86,
  "scored_at": "2025-08-09T13:46:07.554495",
  "import_source": "network_archive_import",
  "links_checked": true,
  "checked_at": "2025-08-10T10:32:56.885205",
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1860061711849652378/media_0.png?",
      "filename": "media_0.png"
    }
  ],
  "reprocessed_at": "2025-08-12T15:26:45.907512",
  "reprocessed_reason": "missing_media_array"
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2023-09-26T20:39:57.000Z",
    "default_profile_image": false,
    "description": "METR is a research nonprofit that builds evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.",
    "fast_followers_count": 0,
    "favourites_count": 9,
    "followers_count": 2778,
    "friends_count": 1,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 59,
    "location": "Berkeley, CA",
    "media_count": 32,
    "name": "METR",
    "normal_followers_count": 2778,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1706770561903497216/1724202300",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1731827314101751808/JaakMU5F_normal.jpg",
    "screen_name": "METR_Evals",
    "statuses_count": 83,
    "translator_type": "none",
    "url": "https://t.co/wkntk3G9j6",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "1706770561903497216"
  },
  "id": "1860061711849652378",
  "conversation_id": "1860061711849652378",
  "full_text": "How close are current AI agents to automating AI R&amp;D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks. https://t.co/woREKEWn5S",
  "reply_count": 15,
  "retweet_count": 176,
  "favorite_count": 820,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GdAthRLagAAhJt3.png",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/METR_Evals/status/1860061711849652378",
  "created_at": "2024-11-22T20:44:05.000Z",
  "#sort_index": "1860061711849652378",
  "view_count": 411061,
  "quote_count": 78,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": false,
  "startUrl": "https://x.com/metr_evals/status/1860061711849652378"
}