@arankomatsuzaki
Google presents Inference Scaling for Long-Context Retrieval Augmented Generation

- Finds that increasing inference computation leads to nearly linear gains in RAG performance when optimally allocated
- Scaling inference compute on long-context LLMs achieves up to 58.9% gains on benchmarks

https://t.co/dpulK3a20k
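The core idea behind "optimally allocated" is spending a fixed inference-compute budget (effective context length) across RAG knobs such as the number of retrieved documents and in-context demonstrations. Below is a minimal sketch of that idea; the token costs, option grids, and ranking heuristic are all assumptions for illustration, and the placeholder ranking stands in for the paper's fitted computation-allocation model, not its actual method:

```python
# Hypothetical sketch: allocate an inference-compute budget across RAG knobs.
# None of these names or constants come from the paper; they only illustrate
# the budget-constrained search the tweet's "optimally allocated" refers to.

from itertools import product

DOC_TOKENS = 512   # assumed average tokens per retrieved document
DEMO_TOKENS = 256  # assumed average tokens per in-context demonstration


def tokens_used(n_docs: int, n_demos: int) -> int:
    """Estimate the effective context length of one configuration."""
    return n_docs * DOC_TOKENS + n_demos * DEMO_TOKENS


def best_configuration(budget: int, doc_options, demo_options):
    """Enumerate (n_docs, n_demos) pairs that fit the budget and pick one.

    Placeholder ranking: prefer the configuration that uses the most of
    the budget. A real system would rank by predicted task performance.
    """
    feasible = [
        (n_docs, n_demos)
        for n_docs, n_demos in product(doc_options, demo_options)
        if tokens_used(n_docs, n_demos) <= budget
    ]
    return max(feasible, key=lambda cfg: tokens_used(*cfg), default=None)


if __name__ == "__main__":
    # As the budget grows, the feasible configuration space (and thus the
    # amount of retrieved evidence and demonstrations) grows with it.
    for budget in (4_000, 16_000, 64_000):
        cfg = best_configuration(
            budget,
            doc_options=range(0, 129, 8),
            demo_options=range(0, 33, 4),
        )
        print(f"budget={budget:>6} tokens -> (docs, demos) = {cfg}")
```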