@omarsar0
Can an AI agent run a startup for a year without going bankrupt? Turns out most can't.

A new benchmark from Collinear AI puts 12 models to the test. YC-Bench tasks agents with running a simulated startup over hundreds of turns. Agents hire employees, select contracts, and maintain profitability in a partially observable environment with adversarial clients and compounding consequences.

Only three models consistently end above the $200K starting capital. Claude Opus 4.6 leads with $1.27M in average final funds, followed by GLM-5 at $1.21M with 11x lower inference cost.

Scratchpad usage, the sole mechanism for persisting information across context truncation, is the strongest predictor of success. Failure to detect adversarial clients accounts for 47% of bankruptcies.

Long-horizon coherence, not raw intelligence, separates the winners from the bankrupt.

Paper: https://t.co/jVJLJReUsN

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX