@jaseweston
New test-time scaling method: https://t.co/yqWvOMZpwq
- Use RL to train an LLM solution aggregator
- Reasons, reviews, reconciles, and synthesizes a final solution -> Much better than existing techniques!
- Simple new method. Strong results across 4 math benchmarks.
🧵1/5 https://t.co/1Y3LaX8DyB