@omarsar0
Test-time scaling: Best-of-3 and pass@3 markedly boost AFM (e.g., GAIA 69.9 and HLE 33.2), closing the gap with larger proprietary agent stacks.

Overall, Chain-of-Agents enables training single-agent foundation models that natively simulate multi-agent collaboration, combining multi-agent distillation with agentic RL to achieve state-of-the-art results.

Project + Code + Models: https://t.co/yeI0JTO6ok
Paper: https://t.co/vxzeM4x1dy
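
For anyone unfamiliar with the two test-time scaling setups mentioned above, here is a minimal sketch of how pass@k and Best-of-N are commonly computed. It is not taken from the AFM code: the function names, the toy data, and the length-based scorer are all hypothetical, and the post does not say which selector AFM uses for Best-of-3.

```python
from typing import Callable, Sequence


def pass_at_k_rate(attempts_per_task: Sequence[Sequence[bool]]) -> float:
    """Empirical pass@k with exactly k samples per task:
    a task counts as solved if ANY of its k attempts is correct."""
    solved = sum(any(attempts) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-N: sample N candidates, return the one the scorer ranks highest.
    Unlike pass@k, only this single selected answer is graded."""
    return max(candidates, key=score)


# Hypothetical toy data: 4 tasks, 3 sampled attempts each (True = correct).
toy_attempts = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]

# Accuracy of the first attempt only vs. any of the three attempts.
pass_at_1 = pass_at_k_rate([attempts[:1] for attempts in toy_attempts])
pass_at_3 = pass_at_k_rate(toy_attempts)

# Hypothetical scorer: prefer the longest answer (a stand-in for a real
# verifier or reward model, which the post does not describe).
chosen = best_of_n(["a", "bbb", "cc"], score=len)
```

The difference matters when reading the numbers: pass@3 credits a task if any of three samples is right, while Best-of-3 depends on how good the selector is at picking the right sample.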