@dair_ai
Improving RAG with Forward and Backward Lookup

This is a clever use of small and large language models.

Traditional RAG systems compute similarity between the query and context chunks, retrieve the highest-scoring chunks, and then generate. But complex queries often lack sufficient signal to retrieve what's actually needed. It would help if the RAG system could peek at its own potential future generations.

The paper introduces FB-RAG (Forward-Backward RAG), a training-free framework that does exactly that. A lightweight LLM generates multiple candidate answers with reasoning, and the chunks most relevant to those attempted answers are scored highest for retrieval. Even when the smaller model fails to answer correctly, its reasoning attempts contain enough relevant language to identify the right context chunks for a more powerful model.

The approach works in three stages. Stage I uses an off-the-shelf retriever to narrow the context, optimizing for recall. Stage II runs a lightweight LLM (8B parameters) on the reduced context, samples multiple reasoning-plus-answer outputs, and scores each original chunk by how well it matches any sampled output. Stage III feeds only the highest-scoring chunks to a powerful generator (70B parameters) for the final answer.

The results are consistent across 9 datasets from LongBench and ∞Bench. On EN-QA, FB-RAG matches the leading baseline with over 48% latency reduction, or achieves an 8% performance improvement with a 10% latency reduction. The approach outperforms OP-RAG, Self-Route, and vanilla RAG across QA, summarization, and multiple-choice tasks.

A surprising finding: forward-looking alone outperforms combining the forward and backward components. Setting the backward weight to zero consistently beats averaging, indicating that once you have LLM-generated reasoning, the original query adds no useful signal. Even a 3B model for the forward lookup shows visible improvements.
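The staged scoring described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `sample_answers()` is a placeholder for the lightweight LLM, and `overlap_score()` is a crude lexical Jaccard scorer standing in for whatever similarity function the paper actually uses.

```python
# Minimal sketch of FB-RAG's forward/backward chunk scoring (Stages II-III).
# Assumptions: sample_answers() is a placeholder for the small LLM, and
# overlap_score() is a toy lexical scorer, not the paper's scoring function.

def sample_answers(query: str, n_samples: int = 3) -> list[str]:
    # Placeholder for the lightweight (e.g. 8B) model emitting
    # reasoning + answer samples for the query.
    return [f"reasoning attempt {i} about {query}" for i in range(n_samples)]

def overlap_score(chunk: str, text: str) -> float:
    # Jaccard overlap of word sets; a stand-in similarity measure.
    a, b = set(chunk.lower().split()), set(text.lower().split())
    return len(a & b) / (len(a | b) or 1)

def chunk_score(chunk: str, query: str, samples: list[str],
                backward_weight: float = 0.0) -> float:
    # Forward score: best match against ANY sampled output.
    # Backward score: match against the original query.
    # The paper finds backward_weight = 0 (forward-only) works best.
    fwd = max(overlap_score(chunk, s) for s in samples)
    bwd = overlap_score(chunk, query)
    return backward_weight * bwd + (1 - backward_weight) * fwd

def fb_retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Stage II: sample outputs and rank chunks against them.
    samples = sample_answers(query)
    ranked = sorted(chunks,
                    key=lambda c: chunk_score(c, query, samples),
                    reverse=True)
    # Stage III: only the top-scoring chunks reach the large generator.
    return ranked[:top_k]
```

Even with this toy scorer, a chunk that shares vocabulary with the sampled reasoning outranks an unrelated one, which is the core idea: the samples don't need to be correct, only to use the right language.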
The reasoning doesn't need to be correct; it just needs to contain relevant language that points toward the right chunks. Smaller models can systematically improve larger ones without fine-tuning or reinforcement learning. The lightweight forward pass filters context more precisely than query-based retrieval, reducing both noise and latency for the final generation step.

Paper: https://t.co/RiRyx0A6tC

Learn to build RAG systems and AI Agents in our academy: https://t.co/zQXQt0PMbG