🐦 Twitter Post Details

@llama_index

PDFs are the bane of every AI agent's existence: here's why parsing them is so much harder than you think 📄

Every developer building document agents eventually hits the same wall: PDFs weren't designed to be machine-readable. They're drawing instructions from 1982, not structured data.

📝 PDF text isn't stored as characters: it's glyph shapes positioned at coordinates with no semantic meaning
📊 Tables don't exist as objects: they're just lines and text that happen to look tabular when rendered
🔄 Reading order is pure guesswork: content streams have zero relationship to visual flow
🤖 Seventy years of OCR evolution led us to combine text extraction with vision models for optimal results

We built LlamaParse using this hybrid approach: fast text extraction for standard content, vision models for complex layouts. It's how we're solving document processing at scale.

Read the full breakdown of why PDFs are so challenging and how we're tackling it: https://t.co/K8bQmgq7xN
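To make the reading-order point concrete, here is a minimal Python sketch of the kind of guesswork a text extractor has to do. It is an illustration only, not LlamaParse's method: the `Fragment` type, the `reconstruct_lines` helper, the sample coordinates, and the 2-point tolerance are all assumptions made up for this example. It takes text fragments in arbitrary stream order and rebuilds visual lines by grouping on the baseline and sorting left to right.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A run of text placed at a page position (hypothetical type for this sketch)."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points; PDF's origin is bottom-left, so larger y is higher


def reconstruct_lines(fragments, y_tolerance=2.0):
    """Group fragments into visual lines, top to bottom, then left to right."""
    lines = []  # each entry: [y of the first fragment on the line, fragments]
    # Visit fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f.y):
        # Same line if the baseline is close to the current line's first fragment.
        if lines and abs(lines[-1][0] - frag.y) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return [
        " ".join(f.text for f in sorted(members, key=lambda f: f.x))
        for _, members in lines
    ]


# Stream order deliberately differs from visual order (made-up data).
stream = [
    Fragment("Total", 72, 640),
    Fragment("42.00", 300, 680.5),
    Fragment("Widget", 72, 680),
    Fragment("Price", 300, 720),
    Fragment("Item", 72, 720),
    Fragment("42.00", 300, 640),
]

naive = " ".join(f.text for f in stream)
rebuilt = reconstruct_lines(stream)
print(naive)
print(rebuilt)
```

This heuristic only works for a single column of simple lines. Multi-column pages, rotated text, and tables with wrapped cells defeat it, which is the failure mode the post describes and the reason a fallback for complex layouts is needed.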

📊 Media Metadata

{
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2029995922529386760/media_0.jpg?",
      "filename": "media_0.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/2029995922529386760/media_1.png?",
      "filename": "media_1.png"
    }
  ],
  "processed_at": "2026-03-07T14:16:45.403790",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "2029995922529386760",
  "url": "https://x.com/llama_index/status/2029995922529386760",
  "twitterUrl": "https://twitter.com/llama_index/status/2029995922529386760",
  "text": "PDFs are the bane of every AI agent's existence: here's why parsing them is so much harder than you think 📄\n\nEvery developer building document agents eventually hits the same wall: PDFs weren't designed to be machine-readable. They're drawing instructions from 1982, not structured data.\n\n📝 PDF text isn't stored as characters: it's glyph shapes positioned at coordinates with no semantic meaning\n📊 Tables don't exist as objects: they're just lines and text that happen to look tabular when rendered\n🔄 Reading order is pure guesswork — content streams have zero relationship to visual flow\n🤖 Seventy years of OCR evolution led us to combine text extraction with vision models for optimal results\n\nWe built LlamaParse using this hybrid approach: fast text extraction for standard content, vision models for complex layouts. It's how we're solving document processing at scale.\n\nRead the full breakdown of why PDFs are so challenging and how we're tackling it: https://t.co/K8bQmgq7xN",
  "source": "Twitter for iPhone",
  "retweetCount": 12,
  "replyCount": 5,
  "likeCount": 96,
  "quoteCount": 3,
  "viewCount": 71796,
  "createdAt": "Fri Mar 06 19:01:58 +0000 2026",
  "lang": "en",
  "bookmarkCount": 120,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "2029995922529386760",
  "displayTextRange": [
    0,
    270
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "llama_index",
    "url": "https://x.com/llama_index",
    "twitterUrl": "https://twitter.com/llama_index",
    "id": "1604278358296055808",
    "name": "LlamaIndex 🦙",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": "Business",
    "profilePicture": "https://pbs.twimg.com/profile_images/1967920417760251904/0ytfduMQ_normal.png",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1604278358296055808/1770092126",
    "description": "",
    "location": "",
    "followers": 109426,
    "following": 29,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Sun Dec 18 00:52:44 +0000 2022",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 1499,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 1832,
    "statusesCount": 3754,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "2029767312195117278"
    ],
    "profile_bio": {
      "description": "AI Agents for document OCR + workflows\n\nLlamaParse: https://t.co/yQGTiRSfFL\nDocs: https://t.co/us6GCS14vD",
      "entities": {
        "description": {
          "hashtags": [],
          "symbols": [],
          "urls": [
            {
              "display_url": "cloud.llamaindex.ai",
              "expanded_url": "https://cloud.llamaindex.ai/",
              "indices": [
                52,
                75
              ],
              "url": "https://t.co/yQGTiRSfFL"
            },
            {
              "display_url": "developers.llamaindex.ai/python/cloud/",
              "expanded_url": "https://developers.llamaindex.ai/python/cloud/",
              "indices": [
                82,
                105
              ],
              "url": "https://t.co/us6GCS14vD"
            }
          ],
          "user_mentions": []
        },
        "url": {
          "urls": [
            {
              "display_url": "llamaindex.ai",
              "expanded_url": "https://www.llamaindex.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/epzefqPT9Z"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "display_url": "pic.twitter.com/4O4C8hQ7Ml",
        "expanded_url": "https://twitter.com/llama_index/status/2029995922529386760/photo/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {
            "faces": [
              {
                "h": 127,
                "w": 127,
                "x": 1275,
                "y": 501
              }
            ]
          },
          "orig": {
            "faces": [
              {
                "h": 128,
                "w": 128,
                "x": 1279,
                "y": 503
              }
            ]
          }
        },
        "id_str": "2029995918804811776",
        "indices": [
          271,
          294
        ],
        "media_key": "3_2029995918804811776",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARwr/ohj2oAACgACHCv+iUHbAQgAAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABHCv+iGPagAAKAAIcK/6JQdsBCAAA",
            "media_key": "3_2029995918804811776"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/HCv-iGPagAAJ-qL.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 1150,
              "w": 2053,
              "x": 0,
              "y": 0
            },
            {
              "h": 1159,
              "w": 1159,
              "x": 894,
              "y": 0
            },
            {
              "h": 1159,
              "w": 1017,
              "x": 977,
              "y": 0
            },
            {
              "h": 1159,
              "w": 580,
              "x": 1195,
              "y": 0
            },
            {
              "h": 1159,
              "w": 2053,
              "x": 0,
              "y": 0
            }
          ],
          "height": 1159,
          "width": 2053
        },
        "sizes": {
          "large": {
            "h": 1156,
            "w": 2048
          }
        },
        "type": "photo",
        "url": "https://t.co/4O4C8hQ7Ml"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "hashtags": [],
    "symbols": [],
    "urls": [
      {
        "display_url": "llamaindex.ai/blog/why-readi…",
        "expanded_url": "https://www.llamaindex.ai/blog/why-reading-pdfs-is-hard?utm_source=socials&utm_medium=li_social",
        "indices": [
          959,
          982
        ],
        "url": "https://t.co/K8bQmgq7xN"
      }
    ],
    "user_mentions": []
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}