🐦 Twitter Post Details

@HelloSurgeAI

Everyone’s building $100M "agentic" models, so we built a simulated company to see if they could actually hold down a job.

Spoiler: they're all fired.

Welcome to EnterpriseBench -- CoreCraft edition. CoreCraft is a high-growth hardware startup (i.e., RL environment) with 23 tools, 2500 entities, and enough corporate red tape to make Harvey cry.

The best agent in the world (Opus 4.6! 👑) barely scored 30%.

The #2 model (GPT-5.2 🥈) gave up because a search returned 10 results and it couldn't figure out how to change the date filter.

Another one (Gemini 3 Flash, #9) literally made up a delivery date just to deny a customer's refund. Savage.

(The new Gemini 3.1 Pro? Still lagging behind, at 🥉)

The good news? We trained a model on this chaos and it got better at its job - even translating those skills to other benchmarks. (e.g., +7.4% on Tau2-Bench Retail)

Check out the full EnterpriseBench: CoreCraft leaderboard below, and read about our RL environment and research!

Blog post: https://t.co/GUaXJ8BeP0
Paper: https://t.co/1BMiTUdM66
Leaderboard: https://t.co/UbSx9gmbnX

📊 Media Metadata

{
  "score": 0.46,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.16000000000000003,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  },
  "scored_at": "2026-03-01T12:10:50.359787",
  "import_source": "api_import",
  "source_tagged_at": "2026-03-01T12:10:50.359806",
  "enriched": true,
  "enriched_at": "2026-03-01T12:10:50.359809"
}
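
In the metadata above, the six `score_components` values add up to the top-level `score` (0.46). Whether the scoring pipeline actually computes `score` as a plain sum is not stated anywhere in this record, so the following is only a sanity-check sketch under that assumption. The helper name `components_match_score` and the tolerance are illustrative, not part of any real API.

```python
import json
import math

# Subset of the "Media Metadata" record shown above, copied verbatim.
metadata = json.loads("""
{
  "score": 0.46,
  "score_components": {
    "author": 0.09,
    "engagement": 0.0,
    "quality": 0.16000000000000003,
    "source": 0.135,
    "nlp": 0.05,
    "recency": 0.025
  }
}
""")


def components_match_score(meta, tol=1e-9):
    """Return True if the components sum to the reported score.

    Assumption: the score is a plain sum of its components. The record
    does not document this; it is only what the numbers suggest.
    """
    total = sum(meta["score_components"].values())
    # Compare with a tolerance because the values are binary floats
    # (note the 0.16000000000000003 in the record itself).
    return math.isclose(total, meta["score"], abs_tol=tol)


total = sum(metadata["score_components"].values())
print(round(total, 2), components_match_score(metadata))
```

Comparing with a tolerance rather than `==` avoids false mismatches from floating-point rounding, which the `quality` value already exhibits.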

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2024924336931700825",
  "url": "https://x.com/HelloSurgeAI/status/2024924336931700825",
  "twitterUrl": "https://twitter.com/HelloSurgeAI/status/2024924336931700825",
  "text": "Everyone’s building $100M \"agentic\" models, so we built a simulated company to see if they could actually hold down a job.\n\nSpoiler: they're all fired.\n\nWelcome to EnterpriseBench -- CoreCraft edition. CoreCraft is a high-growth hardware startup (i.e., RL environment) with 23 tools, 2500 entities, and enough corporate red tape to make Harvey cry.\n\nThe best agent in the world (Opus 4.6! 👑) barely scored 30%.\n\nThe #2 model (GPT-5.2 🥈) gave up because a search returned 10 results and it couldn't figure out how to change the date filter.\n\nAnother one (Gemini 3 Flash, #9) literally made up a delivery date just to deny a customer's refund. Savage.\n\n(The new Gemini 3.1 Pro? Still lagging behind, at 🥉)\n\nThe good news? We trained a model on this chaos and it got better at its job - even translating those skills to other benchmarks. (e.g., +7.4% on Tau2-Bench Retail)\n\nCheck out the full EnterpriseBench: CoreCraft leaderboard below, and read about our RL environment and research!\n\nBlog post: https://t.co/GUaXJ8BeP0\nPaper: https://t.co/1BMiTUdM66\nLeaderboard: https://t.co/UbSx9gmbnX",
  "source": "Twitter for iPhone",
  "retweetCount": 1,
  "replyCount": 7,
  "likeCount": 33,
  "quoteCount": 0,
  "viewCount": 3790,
  "createdAt": "Fri Feb 20 19:09:17 +0000 2026",
  "lang": "en",
  "bookmarkCount": 28,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2024924336931700825",
  "displayTextRange": [
    0,
    276
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "HelloSurgeAI",
    "url": "https://x.com/HelloSurgeAI",
    "twitterUrl": "https://twitter.com/HelloSurgeAI",
    "id": "1267866160894222343",
    "name": "Surge AI",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1992439362009645056/itZea2R1_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1267866160894222343/1763868703",
    "description": "",
    "location": "",
    "followers": 8040,
    "following": 142,
    "status": "",
    "canDm": true,
    "canMediaTag": true,
    "createdAt": "Tue Jun 02 17:10:41 +0000 2020",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 257,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 192,
    "statusesCount": 664,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1681343766123143168"
    ],
    "profile_bio": {
      "description": "Our mission is to raise AGI with the richness of humanity — curious, witty, imaginative, and full of breathtaking brilliance.",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [],
          "user_mentions": []
        },
        "url": {
          "urls": [
            {
              "display_url": "surgehq.ai",
              "expanded_url": "https://www.surgehq.ai",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/6bGF7OxrIX"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {},
  "card": {
    "binding_values": [
      {
        "key": "photo_image_full_size_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image",
        "value": {
          "image_value": {
            "height": 150,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=280x150",
            "width": 225
          }
        }
      },
      {
        "key": "description",
        "value": {
          "string_value": "Stop testing models in tiny, self-contained environments. We built CoreCraft, a large-scale startup world, and deployed AI agents to solve real tasks. Our goal: to move agents beyond the cleanliness..."
        }
      },
      {
        "key": "domain",
        "value": {
          "string_value": "surgehq.ai"
        }
      },
      {
        "key": "thumbnail_image_large",
        "value": {
          "image_value": {
            "height": 320,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=800x320_1",
            "width": 480
          }
        }
      },
      {
        "key": "summary_photo_image_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "thumbnail_image_original",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=orig",
            "width": 1536
          }
        }
      },
      {
        "key": "photo_image_full_size_small",
        "value": {
          "image_value": {
            "height": 202,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=386x202",
            "width": 386
          }
        }
      },
      {
        "key": "summary_photo_image_large",
        "value": {
          "image_value": {
            "height": 419,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=800x419",
            "width": 800
          }
        }
      },
      {
        "key": "thumbnail_image_small",
        "value": {
          "image_value": {
            "height": 67,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=100x100",
            "width": 100
          }
        }
      },
      {
        "key": "thumbnail_image_x_large",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=png&name=2048x2048_2_exp",
            "width": 1536
          }
        }
      },
      {
        "key": "photo_image_full_size_original",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=orig",
            "width": 1536
          }
        }
      },
      {
        "key": "vanity_url",
        "value": {
          "scribe_key": "vanity_url",
          "string_value": "surgehq.ai"
        }
      },
      {
        "key": "photo_image_full_size",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "thumbnail_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 33.18,
                "rgb": {
                  "blue": 50,
                  "green": 33,
                  "red": 42
                }
              },
              {
                "percentage": 12.77,
                "rgb": {
                  "blue": 67,
                  "green": 70,
                  "red": 126
                }
              },
              {
                "percentage": 7.4,
                "rgb": {
                  "blue": 96,
                  "green": 149,
                  "red": 231
                }
              },
              {
                "percentage": 6.05,
                "rgb": {
                  "blue": 91,
                  "green": 44,
                  "red": 44
                }
              },
              {
                "percentage": 3.78,
                "rgb": {
                  "blue": 158,
                  "green": 218,
                  "red": 250
                }
              }
            ]
          }
        }
      },
      {
        "key": "title",
        "value": {
          "string_value": "EnterpriseBench: CoreCraft – Measuring AI Agents in Chaotic, Enterprise RL Environments"
        }
      },
      {
        "key": "summary_photo_image_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 33.18,
                "rgb": {
                  "blue": 50,
                  "green": 33,
                  "red": 42
                }
              },
              {
                "percentage": 12.77,
                "rgb": {
                  "blue": 67,
                  "green": 70,
                  "red": 126
                }
              },
              {
                "percentage": 7.4,
                "rgb": {
                  "blue": 96,
                  "green": 149,
                  "red": 231
                }
              },
              {
                "percentage": 6.05,
                "rgb": {
                  "blue": 91,
                  "green": 44,
                  "red": 44
                }
              },
              {
                "percentage": 3.78,
                "rgb": {
                  "blue": 158,
                  "green": 218,
                  "red": 250
                }
              }
            ]
          }
        }
      },
      {
        "key": "summary_photo_image_x_large",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=png&name=2048x2048_2_exp",
            "width": 1536
          }
        }
      },
      {
        "key": "summary_photo_image",
        "value": {
          "image_value": {
            "height": 314,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=600x314",
            "width": 600
          }
        }
      },
      {
        "key": "photo_image_full_size_color",
        "value": {
          "image_color_value": {
            "palette": [
              {
                "percentage": 33.18,
                "rgb": {
                  "blue": 50,
                  "green": 33,
                  "red": 42
                }
              },
              {
                "percentage": 12.77,
                "rgb": {
                  "blue": 67,
                  "green": 70,
                  "red": 126
                }
              },
              {
                "percentage": 7.4,
                "rgb": {
                  "blue": 96,
                  "green": 149,
                  "red": 231
                }
              },
              {
                "percentage": 6.05,
                "rgb": {
                  "blue": 91,
                  "green": 44,
                  "red": 44
                }
              },
              {
                "percentage": 3.78,
                "rgb": {
                  "blue": 158,
                  "green": 218,
                  "red": 250
                }
              }
            ]
          }
        }
      },
      {
        "key": "photo_image_full_size_x_large",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=png&name=2048x2048_2_exp",
            "width": 1536
          }
        }
      },
      {
        "key": "card_url",
        "value": {
          "scribe_key": "card_url",
          "string_value": "https://t.co/GUaXJ8BeP0"
        }
      },
      {
        "key": "summary_photo_image_original",
        "value": {
          "image_value": {
            "height": 1024,
            "url": "https://pbs.twimg.com/card_img/2027134585776332800/hD1H6yt9?format=jpg&name=orig",
            "width": 1536
          }
        }
      }
    ],
    "card_platform": {
      "platform": {
        "audience": {
          "name": "production"
        },
        "device": {
          "name": "iPhone",
          "version": "13"
        }
      }
    },
    "name": "summary_large_image",
    "url": "https://t.co/GUaXJ8BeP0",
    "user_refs_results": []
  },
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "surgehq.ai/blog/enterpris…",
        "expanded_url": "https://surgehq.ai/blog/enterprisebench-corecraft",
        "indices": [
          996,
          1019
        ],
        "url": "https://t.co/GUaXJ8BeP0"
      },
      {
        "display_url": "cdn.prod.website-files.com/68dc970bd6e945…",
        "expanded_url": "https://cdn.prod.website-files.com/68dc970bd6e945ea3fb0f426/69977eb3a4f3f7a9262d0809_EnterpriseBench_Corecraft.pdf",
        "indices": [
          1027,
          1050
        ],
        "url": "https://t.co/1BMiTUdM66"
      },
      {
        "display_url": "surgehq.ai/leaderboards/e…",
        "expanded_url": "https://surgehq.ai/leaderboards/enterprisebench-corecraft",
        "indices": [
          1064,
          1087
        ],
        "url": "https://t.co/UbSx9gmbnX"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}
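
The raw response pairs each shortened `t.co` link in `text` with its destination under `entities.urls`, and stores `createdAt` in a fixed textual timestamp format. As a minimal sketch of how a consumer might use those fields, the snippet below expands the links by string replacement and parses the timestamp. The helper names `expand_links` and `parse_created_at` are hypothetical, and the `tweet` dictionary is a trimmed copy of the record above. Replacement is used instead of the `indices` offsets because this record does not say which unit those offsets count.

```python
from datetime import datetime

# Trimmed copy of the raw API response above (only the fields used here).
tweet = {
    "text": (
        "Blog post: https://t.co/GUaXJ8BeP0\n"
        "Paper: https://t.co/1BMiTUdM66\n"
        "Leaderboard: https://t.co/UbSx9gmbnX"
    ),
    "createdAt": "Fri Feb 20 19:09:17 +0000 2026",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/GUaXJ8BeP0",
                "expanded_url": "https://surgehq.ai/blog/enterprisebench-corecraft",
            },
            {
                "url": "https://t.co/1BMiTUdM66",
                "expanded_url": "https://cdn.prod.website-files.com/68dc970bd6e945ea3fb0f426/69977eb3a4f3f7a9262d0809_EnterpriseBench_Corecraft.pdf",
            },
            {
                "url": "https://t.co/UbSx9gmbnX",
                "expanded_url": "https://surgehq.ai/leaderboards/enterprisebench-corecraft",
            },
        ]
    },
}


def expand_links(record):
    """Replace each shortened URL in the text with its expanded form."""
    text = record["text"]
    for entry in record["entities"]["urls"]:
        text = text.replace(entry["url"], entry["expanded_url"])
    return text


def parse_created_at(value):
    """Parse a timestamp such as 'Fri Feb 20 19:09:17 +0000 2026'.

    Assumes English day and month names (the default C locale).
    """
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


print(expand_links(tweet))
print(parse_created_at(tweet["createdAt"]).isoformat())
```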