@dair_ai
NEW Research from Meta Superintelligence Labs and collaborators. The default approach to improving LLM reasoning today remains extending chain-of-thought sequences. Longer reasoning traces aren't always better. Longer traces conflate reasoning depth with sequence length and inherit long-context failure modes. This new research introduces Parallel-Distill-Refine (PDR), a framework that treats LLMs as improvement operators rather than single-pass reasoners. Instead of one long reasoning chain, PDR operates in phases: - Generate diverse drafts in parallel. - Distill them into a bounded textual workspace. - Refine conditioned on this workspace. - Repeat. Context length becomes controllable via degree of parallelism, no longer conflated with total tokens generated. The model accumulates wisdom across rounds through compact summaries rather than replaying full histories. On AIME 2024, PDR achieves 93.3% accuracy compared to 79.4% for standard long chain-of-thought at matched latency budgets. For o3-mini at 49k effective tokens, accuracy improves from 76.9% (Long CoT) to 86.7% (PDR), a 9.8 percentage point gain. PDR also achieves the same accuracy as sequential refinement with 2.57x smaller sequential budget by converting parallel compute into accuracy without lengthening per-call context. The researchers also trained an 8B model with operator-consistent RL to make training match the PDR inference interface. Mixing standard and operator RL yields an additional 5% improvement on both AIME benchmarks. Bounded memory iteration can substitute for long reasoning traces while holding latency fixed. Strategic parallelism and distillation is shown to beat brute-force sequence extension. Paper: https://t.co/EviERpmTu7 Learn to build effective AI Agents in our academy: https://t.co/zQXQt0PMbG