@jmin__cho
Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-domain benchmark + strong results on 3 benchmarks)!

⚡️Key Highlights:

➡️ M3DocRAG flexibly accommodates various settings:
- closed & open-domain document contexts (from a single-page doc to a corpus of many long docs)
- single & multi-hop questions
- diverse elements (text, table, image, etc.)

➡️ M3DocVQA is a new open-domain DocVQA benchmark where models should answer multi-hop questions (across multiple pages and documents) over 3K+ PDFs (w/ 40K+ pages)

➡️ Strong results on 3 benchmarks (M3DocVQA/MMLongBench-Doc/MP-DocVQA), including SoTA results on MP-DocVQA

🧵👇