@dair_ai
RT @omarsar0: NEW research from Sakana AI.

Long contexts get expensive as every token in the input contributes to quadratic attention costs, higher latency, and more memory.

This new research introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into LoRA adapters in a SINGLE forward pass.

In other words, it can instantly internalize contexts. Instead of re-reading the full context at every inference call, the model internalizes the document into compact adapter weights. No iterative fine-tuning is needed, and no repeated context consumption.

Cool to see all the interesting new approaches to deal with long contexts like RLM, LCM, and now Doc-to-LoRA.

The results: Near-perfect accuracy on needle-in-a-haystack tasks at sequence lengths exceeding the target model's native context window by over 4x. It also outperforms standard context distillation while significantly reducing peak memory consumption and update latency on real-world QA datasets.

Why it matters: As agents and LLM applications deal with increasingly long documents, turning context into compact adapters on the fly could drastically reduce serving costs and enable rapid knowledge updates.

Paper: https://t.co/Fh1IeLrSpm

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
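
To make the "document in, adapter out, one forward pass" idea concrete, here is a toy sketch in plain Python. It is not the paper's architecture: `ToyHypernetwork`, `mean_pool`, `apply_lora`, and all dimensions are hypothetical names and values chosen for illustration, and the hypernetwork weights are random rather than meta-learned. It only shows the shape of the computation: a document of any length is pooled, mapped to low-rank factors `A` and `B`, and added to a frozen weight as `W + B @ A`.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mean_pool(doc_embeddings):
    """Average token embeddings into a single document vector."""
    n = len(doc_embeddings)
    return [sum(col) / n for col in zip(*doc_embeddings)]


class ToyHypernetwork:
    """Hypothetical stand-in: one linear map from a pooled document
    vector to the flattened LoRA factors (untrained, random weights)."""

    def __init__(self, embed_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        n_params = rank * (d_out + d_in)
        self.proj = [[rng.gauss(0, 0.1) for _ in range(n_params)]
                     for _ in range(embed_dim)]

    def __call__(self, doc_embeddings):
        # Single forward pass: pool the document, project to adapter parameters.
        pooled = [mean_pool(doc_embeddings)]      # 1 x embed_dim
        flat = matmul(pooled, self.proj)[0]       # length rank * (d_out + d_in)
        split = self.d_out * self.rank
        B = [flat[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        rest = flat[split:]
        A = [rest[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        return A, B                               # A: rank x d_in, B: d_out x rank


def apply_lora(W, A, B, scale=1.0):
    """Return W + scale * (B @ A) without modifying the frozen weight W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


rng = random.Random(1)
embed_dim, d_out, d_in, rank = 8, 6, 5, 2

# A "long document": 1000 token embeddings (random placeholders, not real text).
doc = [[rng.gauss(0, 1) for _ in range(embed_dim)] for _ in range(1000)]
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

hyper = ToyHypernetwork(embed_dim, d_out, d_in, rank)
A, B = hyper(doc)
W_adapted = apply_lora(W, A, B)

# Adapter size depends on rank and layer shape, not on document length.
adapter_params = rank * (d_out + d_in)
```

The design point the sketch illustrates is that `adapter_params` stays fixed whether the document has 10 tokens or 1000, so later queries run against `W_adapted` without the document in the prompt.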