@dair_ai
Multi-Agent Self-Evolution for LLM Reasoning

Most self-play methods for LLM reasoning lack explicit planning and quality control, which makes training unstable on complex multi-step tasks. New research introduces a cleaner closed-loop approach.

SAGE co-evolves four specialized agents from a single LLM backbone using only 500 seed examples: a Challenger generates increasingly harder tasks, a Planner structures step-by-step strategies, a Solver produces answers that are verified externally, and a Critic scores and filters both questions and plans to prevent curriculum drift.

Why does it matter? SAGE achieves consistent gains across model scales with minimal data. On Qwen-2.5-7B, it improves OOD performance by +4.2% while maintaining in-distribution accuracy, outperforming both the Absolute Zero Reasoning and Multi-Agent Evolve baselines across code and math benchmarks.

Paper: https://t.co/8Zn41OBIra

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
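The Challenger → Planner → Solver → Critic loop can be sketched in toy form. This is an illustrative sketch only, not the paper's implementation: all function names are assumptions, each role is a simple function over a toy arithmetic domain rather than an LLM, and the difficulty schedule is invented for demonstration.

```python
import random

# Hypothetical sketch of a four-role closed loop in the spirit of SAGE.
# In the paper, all four roles share a single LLM backbone; here each role
# is a toy function over "sum a list of integers" tasks.

def challenger(difficulty, rng):
    # Generate an increasingly harder task: a sum of `difficulty` integers.
    nums = [rng.randint(1, 9) for _ in range(difficulty)]
    return {"question": nums, "difficulty": difficulty}

def planner(task):
    # Structure a step-by-step strategy: one "add" step per operand.
    return [("add", n) for n in task["question"]]

def solver(plan):
    # Execute the plan to produce an answer.
    total = 0
    for op, n in plan:
        if op == "add":
            total += n
    return total

def verify(task, answer):
    # External verification: an executable ground-truth oracle.
    return answer == sum(task["question"])

def critic(task, plan):
    # Score and filter questions/plans; degenerate ones are dropped
    # so the curriculum does not drift.
    return task["difficulty"] >= 1 and len(plan) == len(task["question"])

def sage_loop(rounds=5, seed=0):
    rng = random.Random(seed)
    difficulty, curriculum = 1, []
    for _ in range(rounds):
        task = challenger(difficulty, rng)
        plan = planner(task)
        if not critic(task, plan):
            continue  # filtered out before reaching the Solver
        answer = solver(plan)
        if verify(task, answer):
            curriculum.append(task)
            difficulty += 1  # solved tasks push the Challenger harder
    return curriculum
```

Because the toy Solver is always correct, every round passes verification and difficulty ramps up monotonically; with a real model, failed verifications would hold difficulty back, which is the self-paced behavior the loop is meant to produce.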