@jerryjliu0
Document understanding is a huge use case for VLMs, but historically there's been no single "good" benchmark to measure progress here (unlike SWE-bench for coding).

This past week I did a deep dive into OlmOCR-Bench, a recent document OCR benchmark that is a big step in the right direction.
✅ It covers 1,400+ PDFs containing formulas, tables, tiny text, and more.
✅ It uses binary, verifiable unit tests that are super cheap to run.

That said, there's still room for improvement:
🟡 Many types of data still need to be covered: complex tables, chart understanding, form rendering, handwriting, foreign languages, and more.
🟡 The binary unit tests are still quite coarse and sometimes rely on brittle exact matching.

Check out my blog: https://t.co/1tXTcoTIx2

FWIW we do quite well on this benchmark and recently upgraded our default modes too: https://t.co/XYZmx5TFz8
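To make the "binary, verifiable unit test" idea concrete, here's a minimal sketch (not OlmOCR-Bench's actual harness; the helper names and sample output are made up for illustration): each test is a cheap pass/fail check against the model's extracted text, so scoring needs no LLM judge.

```python
# Hypothetical sketch of binary OCR unit tests, in the spirit of OlmOCR-Bench.
# Each test returns True/False; the benchmark score is the pass fraction.

def text_present(ocr_output: str, expected: str) -> bool:
    """Pass if an expected snippet appears in the OCR output (exact substring match).
    This is the kind of check that can be brittle to minor formatting differences."""
    return expected in ocr_output

def text_order(ocr_output: str, first: str, then: str) -> bool:
    """Pass if `first` appears before `then` -- a cheap reading-order check."""
    i, j = ocr_output.find(first), ocr_output.find(then)
    return i != -1 and j != -1 and i < j

# Toy OCR output for a one-page financial summary (fabricated example).
ocr_output = "Q3 Revenue\nTotal: $4.2M\nNet income: $1.1M"

results = [
    text_present(ocr_output, "Total: $4.2M"),
    text_order(ocr_output, "Revenue", "Net income"),
]
score = sum(results) / len(results)  # fraction of binary tests passed
```

The upside is obvious: each check is deterministic and costs microseconds. The downside is what the 🟡 points call out: exact substring matching fails on harmless variations like `$4.2 M` vs `$4.2M`.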