🐦 Twitter Post Details

@charlespacker

Prior to GPT-5, Sonnet & Opus were the undisputed kings of AI coding. It turns out that GPT-5 is significantly better than Sonnet in one key way: the ability to recover from mistakes.

Today we're excited to release our latest research at @Letta_AI on Recovery-Bench, a new benchmark for measuring how well models can recover from errors and corrupted states.

Coding agents often get confused by past mistakes, and mistakes that accumulate over time can quickly poison the context window. In practice, it can often be better to "nuke" your agent's context window and start fresh once your agent has accumulated enough mistakes in its message history.

The inability of current models to course-correct from prior mistakes is a major barrier towards continual learning. Recovery-Bench builds on ideas from Terminal-Bench to create challenging environments where an agent needs to recover from a prior failed trajectory.

A surprising finding is that the best performing models overall are clearly not the best performing "recovery" models. Claude Sonnet 4 leads the pack in overall coding ability (on Terminal-Bench), but GPT-5 is a clear #1 on Recovery-Bench.

Recovering from failed states is a challenging unsolved task on the road towards self-improving perpetual agents. We're excited to contribute our research and benchmarking code to the open source community to push the frontier of continual learning & open AI.
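
The "nuke the context window" heuristic mentioned in the post can be illustrated with a minimal Python sketch. This is not code from Recovery-Bench or Letta; the `Message` type, the `is_error` flag, the `maybe_reset` helper, and the `max_errors` threshold are all hypothetical names chosen for illustration, and a real agent would need its own way of deciding what counts as a mistake.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str               # "system", "user", "assistant", or "tool"
    content: str
    is_error: bool = False  # hypothetical flag: marks a failed step


def count_errors(history: List[Message]) -> int:
    """Count messages in the history that were flagged as mistakes."""
    return sum(1 for m in history if m.is_error)


def maybe_reset(history: List[Message], max_errors: int = 3) -> List[Message]:
    """Return the history unchanged, or a fresh one if too many errors piled up.

    The fresh history keeps only the system messages and the first user
    message (assumed here to be the original task), discarding the
    failed trajectory entirely.
    """
    if count_errors(history) < max_errors:
        return history
    fresh = [m for m in history if m.role == "system"]
    first_user = next((m for m in history if m.role == "user"), None)
    if first_user is not None:
        fresh.append(first_user)
    return fresh
```

Under these assumptions, an agent loop would call `maybe_reset(history)` before each model call. The threshold is a blunt instrument: it throws away useful context along with the mistakes, which is the cost that a model with better recovery ability would avoid.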

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1960784370899542389/media_0.jpg?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1960784370899542389/media_0.jpg?",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2025-08-27T21:34:15.206874",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1960784370899542389",
  "url": "https://x.com/charlespacker/status/1960784370899542389",
  "twitterUrl": "https://twitter.com/charlespacker/status/1960784370899542389",
  "text": "Prior to GPT-5, Sonnet & Opus were the undisputed kings of AI coding. It turns out the GPT-5 is significantly better than Sonnet in one key way: the ability to recover from mistakes.\n\nToday we're excited to release our latest research at @Letta_AI on Recovery-Bench, a new benchmark for measuring how well model can recover from errors and corrupted states.\n\nCoding agents often get confused by past mistakes, and mistakes that accumulate over time can quickly poison the context window. In practice, it can often be better to \"nuke\" your agent's context window and start fresh once your agent has accumulated enough mistakes in its message history.\n\nThe inability of current models to course-correct from prior mistakes is a major barrier towards continual learning. Recovery-Bench builds on ideas from Terminal-Bench to create challenging environments where an agent needs to recover from a prior failed trajectory.\n\nA surprising finding is that the best performing models overall are clearly not the best performing \"recovery\" models. Claude Sonnet 4 leads the pack in overall coding ability (on Terminal-Bench), but GPT-5 is a clear #1 on Recovery-Bench.\n\nRecovering from failed states is a challenging unsolved task on the road towards self-improving perpetual agents. We're excited to contribute our research and benchmarking code to the open source community to push the frontier of continual learning & open AI.",
  "source": "Twitter for iPhone",
  "retweetCount": 10,
  "replyCount": 6,
  "likeCount": 35,
  "quoteCount": 1,
  "viewCount": 4784,
  "createdAt": "Wed Aug 27 19:19:58 +0000 2025",
  "lang": "en",
  "bookmarkCount": 8,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1960784370899542389",
  "displayTextRange": [
    0,
    277
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "charlespacker",
    "url": "https://x.com/charlespacker",
    "twitterUrl": "https://twitter.com/charlespacker",
    "id": "2385913832",
    "name": "Charles Packer",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1956206627440484352/V7FCNqxS_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/2385913832/1755230936",
    "description": "",
    "location": "SF",
    "followers": 2903,
    "following": 1001,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Wed Mar 12 22:50:41 +0000 2014",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 3793,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 112,
    "statusesCount": 797,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {
      "label": {
        "badge": {
          "url": "https://pbs.twimg.com/profile_images/1940424059990200321/59lbFsxt_bigger.jpg"
        },
        "description": "Letta",
        "url": {
          "url": "https://twitter.com/Letta_AI",
          "url_type": "DeepLink"
        },
        "user_label_type": "BusinessLabel",
        "user_label_display_type": "Badge"
      }
    },
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1914380650993569817"
    ],
    "profile_bio": {
      "description": "CEO at @Letta_AI // creator of @MemGPT // AI PhD @berkeley_ai @ucbrise @BerkeleySky",
      "entities": {
        "description": {
          "user_mentions": [
            {
              "id_str": "0",
              "indices": [
                7,
                16
              ],
              "name": "",
              "screen_name": "Letta_AI"
            },
            {
              "id_str": "0",
              "indices": [
                31,
                38
              ],
              "name": "",
              "screen_name": "MemGPT"
            },
            {
              "id_str": "0",
              "indices": [
                49,
                61
              ],
              "name": "",
              "screen_name": "berkeley_ai"
            },
            {
              "id_str": "0",
              "indices": [
                62,
                70
              ],
              "name": "",
              "screen_name": "ucbrise"
            },
            {
              "id_str": "0",
              "indices": [
                71,
                83
              ],
              "name": "",
              "screen_name": "BerkeleySky"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "charlespacker.com",
              "expanded_url": "http://charlespacker.com",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/6BmA594Xpn"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/R0NU4VJVRu",
        "expanded_url": "https://twitter.com/charlespacker/status/1960784370899542389/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "all": {
            "tags": [
              {
                "name": "Kevin Lin 林冠言",
                "screen_name": "nlpkevinl",
                "type": "user",
                "user_id": "904754498211643393"
              },
              {
                "name": "Shangyin Tan",
                "screen_name": "ShangyinT",
                "type": "user",
                "user_id": "1062544973294432257"
              },
              {
                "name": "Letta",
                "screen_name": "Letta_AI",
                "type": "user",
                "user_id": "1821252546469752832"
              }
            ]
          },
          "large": {},
          "orig": {}
        },
        "id_str": "1960783303063334912",
        "indices": [
          278,
          301
        ],
        "media_key": "3_1960783303063334912",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARs2GgaEm3AACgACGzYa/ySa4XUAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABGzYaBoSbcAAKAAIbNhr/JJrhdQAA",
            "media_key": "3_1960783303063334912"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/GzYaBoSbcAA7P6Y.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 552,
              "w": 986,
              "x": 0,
              "y": 34
            },
            {
              "h": 586,
              "w": 586,
              "x": 224,
              "y": 0
            },
            {
              "h": 586,
              "w": 514,
              "x": 260,
              "y": 0
            },
            {
              "h": 586,
              "w": 293,
              "x": 371,
              "y": 0
            },
            {
              "h": 586,
              "w": 986,
              "x": 0,
              "y": 0
            }
          ],
          "height": 586,
          "width": 986
        },
        "sizes": {
          "large": {
            "h": 586,
            "w": 986
          }
        },
        "type": "photo",
        "url": "https://t.co/R0NU4VJVRu"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "user_mentions": [
      {
        "id_str": "1821252546469752832",
        "indices": [
          238,
          247
        ],
        "name": "Letta",
        "screen_name": "Letta_AI"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}