@jerryjliu0
Document OCR benchmarks are still an open problem. Existing benchmarks are either too narrowly focused on a specific document type (e.g. FinTabNet, ChartQA), or on documents that aren't reflective of real-world tasks (e.g. OmniDocBench, OlmOCR-bench over academic papers).

ParseBench is a step towards solving this problem.
* It aims to comprehensively cover real-world document distributions within the enterprise.
* It evaluates across 5 different dimensions (tables, charts, content faithfulness, formatting, grounding).
* It uses metrics that optimize for agent semantic understanding rather than structural similarity.

We released this yesterday, and there's a TON of content:
1. Whitepaper
2. HF dataset
3. Github repo
4. Blog
5. Video

And today, we're excited to feature https://t.co/FYbk3s6M2w, the home page for ParseBench 💫 come check it out!

Take a look at some of our other materials if you're interested:
Blog: https://t.co/57OHkx0pQW
Paper: https://t.co/Ho2oH2xEAM