@jerryjliu0
Document understanding is a huge use case for VLMs, but historically there's been no single "good" benchmark to measure progress here (unlike SWE-bench for coding).

This past week I did a deep dive into OlmOCR-Bench, a recent document OCR benchmark that is a big step in the right direction.
✅ It covers 1,400+ PDFs containing formulas, tables, tiny text, and more.
✅ It uses binary, verifiable unit tests that are super cheap to run.

That said, there's still room for improvement:
🟡 Many types of data still need to be covered: complex tables, chart understanding, form rendering, handwriting, foreign languages, and more.
🟡 The binary unit tests are still quite coarse and sometimes rely on brittle exact matching.

Check out my blog: https://t.co/1tXTcoTIx2

FWIW we do quite well on this benchmark and recently upgraded our default modes too: https://t.co/XYZmx5TFz8
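To make the "binary, verifiable unit test" idea concrete, here's a minimal sketch (not OlmOCR-Bench's actual harness; the helper names and sample output are made up for illustration): each test is a cheap pass/fail check against the model's extracted text, so scoring needs no LLM judge.

```python
# Hypothetical sketch of binary OCR unit tests, in the spirit of OlmOCR-Bench.
# Each test returns True/False; the benchmark score is the pass fraction.

def text_present(ocr_output: str, expected: str) -> bool:
    """Pass if an expected snippet appears in the OCR output (exact substring match).
    This is the kind of check that can be brittle to minor formatting differences."""
    return expected in ocr_output

def text_order(ocr_output: str, first: str, then: str) -> bool:
    """Pass if `first` appears before `then` -- a cheap reading-order check."""
    i, j = ocr_output.find(first), ocr_output.find(then)
    return i != -1 and j != -1 and i < j

# Toy OCR output for a one-page financial summary (fabricated example).
ocr_output = "Q3 Revenue\nTotal: $4.2M\nNet income: $1.1M"

results = [
    text_present(ocr_output, "Total: $4.2M"),
    text_order(ocr_output, "Revenue", "Net income"),
]
score = sum(results) / len(results)  # fraction of binary tests passed
```

The upside is obvious: each check is deterministic and costs microseconds. The downside is what the 🟡 points call out: exact substring matching fails on harmless variations like `$4.2 M` vs `$4.2M`.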