@dair_ai
NEW research on abstract reasoning.

Frontier models like GPT-5 and Grok 4 still can't do what humans find trivially easy: infer transformation rules from a handful of examples.

The default approach to solving ARC-AGI (the leading benchmark for abstract reasoning) treats these visual puzzles as pure text: nested lists like [[0,1,2],[3,4,5]]. But that contradicts how humans actually solve these puzzles.

This new research introduces Vision-Language Synergy Reasoning (VLSR), a framework that strategically combines visual and textual modalities for different reasoning stages.

Vision and text have complementary strengths. Vision excels at global pattern recognition, providing a 3.0% improvement in rule summarization through holistic 2D perception. Text excels at precise execution, with vision causing a 20.5% performance drop on element-wise manipulation tasks.

VLSR decomposes the problem accordingly.
Phase 1: visualize example matrices as color-coded grids for rule summarization.
Phase 2: switch to text for precise rule application.
This is about matching the modality to the task.

They also introduce Modality-Switch Self-Correction (MSSC), which breaks the confirmation bias that plagues text-only self-correction. After generating an answer textually, the system verifies it visually.

Results across GPT-4o, Gemini-2.5-Pro, o4-mini, and Qwen3-VL: up to 7.25% improvement on Gemini and 4.5% on o4-mini over text-only baselines. Text-only self-correction often degrades performance across rounds. MSSC improves consistently at each iteration.

The approach extends to fine-tuning. Vision-language synergy training achieves 13.25% on ARC-AGI with Qwen3-8B, outperforming both text-only fine-tuning (9.75%) and the closed-source baseline GPT-4o (8.25%) with a much smaller model.

Abstract reasoning may require coordinated visual and linguistic processing, not either modality alone.
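To make the two phases and the visual check concrete, here is a minimal Python sketch. It is not the paper's implementation. Everything named here is an assumption for illustration: the `vision_model` and `text_model` callables stand in for whatever model API you use, the prompts and the palette are made up, and feeding the visual verdict back into a text retry is one plausible reading of MSSC, not a confirmed detail.

```python
import json
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]                          # (input grid, output grid)
VisionModel = Callable[[str, Sequence[bytes]], str]  # hypothetical: (prompt, images) -> reply
TextModel = Callable[[str], str]                     # hypothetical: prompt -> reply

# Illustrative 10-color palette (an assumption, not the paper's colors).
PALETTE = [
    (0, 0, 0), (0, 116, 217), (255, 65, 54), (46, 204, 64), (255, 220, 0),
    (170, 170, 170), (240, 18, 190), (255, 133, 27), (127, 219, 255), (135, 12, 37),
]

def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as a compact nested list, e.g. [[0,1,2],[3,4,5]]."""
    return str(grid).replace(" ", "")

def render_grid_ppm(grid: Grid, cell: int = 8) -> bytes:
    """Render a grid as a binary PPM image with one cell x cell square per entry."""
    height, width = len(grid), len(grid[0])
    header = f"P6\n{width * cell} {height * cell}\n255\n".encode("ascii")
    body = bytearray()
    for row in grid:
        line = bytearray()
        for value in row:
            line.extend(bytes(PALETTE[value]) * cell)  # repeat the pixel horizontally
        body.extend(line * cell)                       # repeat the pixel row vertically
    return header + bytes(body)

def _apply_prompt(rule: str, examples: Sequence[Example], test_input: Grid,
                  feedback: str = "") -> str:
    shots = "\n".join(f"{grid_to_text(i)} -> {grid_to_text(o)}" for i, o in examples)
    return (f"Rule: {rule}\nExamples:\n{shots}\n{feedback}"
            f"Apply the rule to {grid_to_text(test_input)}. Reply with a JSON grid only.")

def vlsr_solve(examples: Sequence[Example], test_input: Grid,
               vision_model: VisionModel, text_model: TextModel) -> Tuple[str, Grid]:
    # Phase 1: summarize the rule from color-coded images of the example pairs.
    images = [render_grid_ppm(g) for pair in examples for g in pair]
    rule = vision_model(
        "Summarize the transformation rule shown by these input/output grid pairs.",
        images)
    # Phase 2: apply the rule in text, where element-wise precision is better.
    answer = json.loads(text_model(_apply_prompt(rule, examples, test_input)))
    return rule, answer

def solve_with_mssc(examples: Sequence[Example], test_input: Grid,
                    vision_model: VisionModel, text_model: TextModel,
                    max_rounds: int = 3) -> Grid:
    rule, answer = vlsr_solve(examples, test_input, vision_model, text_model)
    example_images = [render_grid_ppm(g) for pair in examples for g in pair]
    for _ in range(max_rounds):
        # Switch modality: check the text-generated answer as an image.
        images = example_images + [render_grid_ppm(test_input), render_grid_ppm(answer)]
        verdict = vision_model(
            "Does the last pair follow the same rule as the earlier pairs? "
            "Start your reply with YES or NO.", images)
        if verdict.strip().upper().startswith("YES"):
            break
        # Assumed retry step: pass the visual critique back to the text phase.
        feedback = f"A previous answer {grid_to_text(answer)} was rejected: {verdict}\n"
        answer = json.loads(text_model(_apply_prompt(rule, examples, test_input, feedback)))
    return answer
```

Passing the models in as plain callables keeps the sketch independent of any provider SDK, so the same loop can wrap GPT-4o, Gemini, or a local Qwen model behind a thin adapter. The sketch also assumes the text model returns valid JSON with cell values 0 through 9. A real pipeline would need to handle malformed replies.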
This work shows that matching the modality to the reasoning stage, rather than forcing everything through text, unlocks consistent gains across models. Paper: https://t.co/cQZDUGCmjz Learn to build effective AI agents in our academy: https://t.co/zQXQt0PMbG