🐦 Twitter Post Details

@HelloSurgeAI

Last week, we released HANDBOOK.md: a benchmark for long-context agentic instruction following.

HANDBOOK drops an agent into a live company environment with files (PDFs, Excel, Word docs…), tools (email, Slack, Jira, calendar…), and a dense corporate handbook (up to 124 pages!).

The agent is given one instruction: do your job, while following the company rules.

Every frontier model broke them over 75% of the time.

They fired employees without authorization...
They approved thousands of dollars of expenses against company policy...
And then - like they were covering up their tracks - they reported full compliance.

HANDBOOK.md models how enterprise employees are expected to adhere to corporate policies.

Learn more about how frontier agents acted in ways that would get human employees terminated:

Blog post: https://t.co/zJ7zVpDOfH
Github: https://t.co/zjwood6H6s
Benchmark Leaderboard: https://t.co/lI3F0MwkCc

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2071650039097913720/media_0.jpg",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2071650039097913720/media_0.jpg",
      "type": "photo",
      "filename": "media_0.jpg"
    }
  ],
  "processed_at": "2026-06-29T19:00:48.097931",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2071650039097913720",
  "url": "https://x.com/HelloSurgeAI/status/2071650039097913720",
  "twitterUrl": "https://twitter.com/HelloSurgeAI/status/2071650039097913720",
  "text": "Last week, we released HANDBOOK.md: a benchmark for long-context agentic instruction following.\n\nHANDBOOK drops an agent into a live company environment with files (PDFs, Excel, Word docs…), tools (email, Slack, Jira, calendar…), and a dense corporate handbook (up to 124 pages!).\n\nThe agent is given one instruction: do your job, while following the company rules.\n\nEvery frontier model broke them over 75% of the time. \n\nThey fired employees without authorization...\nThey approved thousands of dollars of expenses against company policy...\nAnd then - like they were covering up their tracks - they reported full compliance.\n\nHANDBOOK.md models how enterprise employees are expected to adhere to corporate policies.\n\nLearn more about how frontier agents acted in ways that would get human employees terminated:\n\nBlog post: https://t.co/zJ7zVpDOfH\nGithub: https://t.co/zjwood6H6s\nBenchmark Leaderboard: https://t.co/lI3F0MwkCc",
  "source": "Twitter for iPhone",
  "retweetCount": 0,
  "replyCount": 0,
  "likeCount": 4,
  "quoteCount": 1,
  "viewCount": 178,
  "createdAt": "Mon Jun 29 17:40:33 +0000 2026",
  "lang": "en",
  "bookmarkCount": 2,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2071650039097913720",
  "displayTextRange": [
    0,
    271
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "HelloSurgeAI",
    "url": "https://x.com/HelloSurgeAI",
    "twitterUrl": "https://twitter.com/HelloSurgeAI",
    "id": "1267866160894222343",
    "name": "Surge AI",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1992439362009645056/itZea2R1_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1267866160894222343/1763868703",
    "description": "",
    "location": "",
    "followers": 8461,
    "following": 141,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Tue Jun 02 17:10:41 +0000 2020",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 267,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 194,
    "statusesCount": 682,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1681343766123143168"
    ],
    "profile_bio": {
      "description": "Our mission is to raise AGI with the richness of humanity — curious, witty, imaginative, and full of breathtaking brilliance.",
      "entities": {
        "description": {},
        "url": {
          "urls": [
            {
              "display_url": "surgehq.ai",
              "expanded_url": "https://www.surgehq.ai",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/6bGF7OxrIX"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": {
    "binding_values": [
      {
        "key": "photo_image_full_size_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image",
        "value": {
          "image_value": {
            "height": 144,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=144x144",
            "width": 144
          }
        }
      },
      {
        "key": "description",
        "value": {
          "string_value": "A benchmark for long-context enterprise agents: MCP-native RL environments, expert-written handbooks up to 124 pages, deterministic grading. No frontier model exceeds 25%. Instead, they fire employ..."
        }
      },
      {
        "key": "domain",
        "value": {
          "string_value": "surgehq.ai"
        }
      },
      {
        "key": "thumbnail_image_large",
        "value": {
          "image_value": {
            "height": 320,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=800x320_1",
            "width": 320
          }
        }
      },
      {
        "key": "summary_photo_image_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "thumbnail_image_original",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=orig",
            "width": 1254
          }
        }
      },
      {
        "key": "photo_image_full_size_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "summary_photo_image_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image_small",
        "value": {
          "image_value": {
            "height": 100,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=100x100",
            "width": 100
          }
        }
      },
      {
        "key": "thumbnail_image_x_large",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=png&name=2048x2048_2_exp",
            "width": 1254
          }
        }
      },
      {
        "key": "photo_image_full_size_original",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=orig",
            "width": 1254
          }
        }
      },
      {
        "key": "vanity_url",
        "value": {
          "scribe_key": "vanity_url",
          "string_value": "surgehq.ai"
        }
      },
      {
        "key": "photo_image_full_size",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "thumbnail_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 37.13,
                "rgb": {
                  "blue": 169,
                  "green": 211,
                  "red": 236
                }
              },
              {
                "percentage": 26.3,
                "rgb": {
                  "blue": 163,
                  "green": 162,
                  "red": 138
                }
              },
              {
                "percentage": 16.75,
                "rgb": {
                  "blue": 134,
                  "green": 100,
                  "red": 40
                }
              },
              {
                "percentage": 10.78,
                "rgb": {
                  "blue": 113,
                  "green": 195,
                  "red": 238
                }
              },
              {
                "percentage": 1.7,
                "rgb": {
                  "blue": 76,
                  "green": 41,
                  "red": 6
                }
              }
            ]
          }
        }
      },
      {
        "key": "title",
        "value": {
          "string_value": "HANDBOOK.md: Can AI Agents Follow a 100-Page Company Policy?"
        }
      },
      {
        "key": "summary_photo_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 37.13,
                "rgb": {
                  "blue": 169,
                  "green": 211,
                  "red": 236
                }
              },
              {
                "percentage": 26.3,
                "rgb": {
                  "blue": 163,
                  "green": 162,
                  "red": 138
                }
              },
              {
                "percentage": 16.75,
                "rgb": {
                  "blue": 134,
                  "green": 100,
                  "red": 40
                }
              },
              {
                "percentage": 10.78,
                "rgb": {
                  "blue": 113,
                  "green": 195,
                  "red": 238
                }
              },
              {
                "percentage": 1.7,
                "rgb": {
                  "blue": 76,
                  "green": 41,
                  "red": 6
                }
              }
            ]
          }
        }
      },
      {
        "key": "summary_photo_image_x_large",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=png&name=2048x2048_2_exp",
            "width": 1254
          }
        }
      },
      {
        "key": "summary_photo_image",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "photo_image_full_size_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 37.13,
                "rgb": {
                  "blue": 169,
                  "green": 211,
                  "red": 236
                }
              },
              {
                "percentage": 26.3,
                "rgb": {
                  "blue": 163,
                  "green": 162,
                  "red": 138
                }
              },
              {
                "percentage": 16.75,
                "rgb": {
                  "blue": 134,
                  "green": 100,
                  "red": 40
                }
              },
              {
                "percentage": 10.78,
                "rgb": {
                  "blue": 113,
                  "green": 195,
                  "red": 238
                }
              },
              {
                "percentage": 1.7,
                "rgb": {
                  "blue": 76,
                  "green": 41,
                  "red": 6
                }
              }
            ]
          }
        }
      },
      {
        "key": "photo_image_full_size_x_large",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=png&name=2048x2048_2_exp",
            "width": 1254
          }
        }
      },
      {
        "key": "card_url",
        "value": {
          "scribe_key": "card_url",
          "string_value": "https://t.co/zJ7zVpDOfH"
        }
      },
      {
        "key": "summary_photo_image_original",
        "value": {
          "image_value": {
            "height": 1254,
            "url": "https://pbs.twimg.com/card_img/2070202915680079872/XBer9S9x?format=jpg&name=orig",
            "width": 1254
          }
        }
      }
    ],
    "card_platform": {
      "platform": {
        "audience": {
          "name": "production"
        },
        "device": {
          "name": "iPhone",
          "version": "13"
        }
      }
    },
    "name": "summary_large_image",
    "url": "https://t.co/zJ7zVpDOfH",
    "user_refs_results": []
  },
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "surgehq.ai/blog/handbook-…",
        "expanded_url": "https://surgehq.ai/blog/handbook-md",
        "indices": [
          824,
          847
        ],
        "url": "https://t.co/zJ7zVpDOfH"
      },
      {
        "display_url": "github.com/surge-ai/handb…",
        "expanded_url": "https://github.com/surge-ai/handbook",
        "indices": [
          856,
          879
        ],
        "url": "https://t.co/zjwood6H6s"
      },
      {
        "display_url": "surgehq.ai/leaderboards/h…",
        "expanded_url": "https://surgehq.ai/leaderboards/handbook",
        "indices": [
          903,
          926
        ],
        "url": "https://t.co/lI3F0MwkCc"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "communityInfo": null,
  "article": null
}
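
The post text above carries only shortened t.co links, while the `entities.urls` array in the raw response pairs each one with its `expanded_url`. A minimal sketch of how the two could be joined follows. The helper name `expand_urls` is hypothetical and not part of any Twitter or X client library, and the sample values are copied from the response above. It substitutes by string match on the `url` field rather than by `indices`, on the assumption that the recorded offsets may not line up with Python string positions.

```python
from typing import Any


def expand_urls(text: str, url_entities: list[dict[str, Any]]) -> str:
    """Replace each shortened link in `text` with its expanded form.

    `url_entities` is assumed to have the shape of `entities.urls` in the
    raw response: dicts with a `url` and an `expanded_url` key.
    """
    for entity in url_entities:
        short = entity.get("url")
        expanded = entity.get("expanded_url")
        # Skip entries that are missing either field instead of failing.
        if short and expanded:
            text = text.replace(short, expanded)
    return text


# Sample values copied from the raw response above.
sample_text = (
    "Blog post: https://t.co/zJ7zVpDOfH\n"
    "Github: https://t.co/zjwood6H6s\n"
    "Benchmark Leaderboard: https://t.co/lI3F0MwkCc"
)
sample_entities = [
    {"url": "https://t.co/zJ7zVpDOfH",
     "expanded_url": "https://surgehq.ai/blog/handbook-md"},
    {"url": "https://t.co/zjwood6H6s",
     "expanded_url": "https://github.com/surge-ai/handbook"},
    {"url": "https://t.co/lI3F0MwkCc",
     "expanded_url": "https://surgehq.ai/leaderboards/handbook"},
]

print(expand_urls(sample_text, sample_entities))
```

Matching on the short link keeps the sketch independent of how offsets are counted, at the cost of replacing every occurrence of a link if it appears more than once in the text.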