🐦 Twitter Post Details

@llama_index

Stop losing 80% of your data when extracting from long documents with repeating entities like catalogs, tables, and lists.

Our new Table Row extraction target in LlamaExtract solves the core problem: instead of trying to extract everything at once (where LLMs get overwhelmed), we intelligently segment documents and extract entity by entity.

🎯 Long multi-page insurance directory? Extract all 380 hospitals (vs. only 40 with document-level extraction)
📋 Handle both formal tables and semi-structured content like product catalogs automatically
🔍 Define your schema for a single entity - we return the complete list with exhaustive coverage
⚡ Leverage LLM flexibility while achieving template-based reliability through smart segmentation

This approach identifies repeating patterns, segments documents at natural boundaries, then applies your schema to focused chunks. Works with tables, lists, catalogs, or any document with distinguishable repeating entities.

Read the full technical breakdown with code examples here: https://t.co/Zrghh3ZZ7N
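
The segment-then-extract idea in the post can be illustrated with a minimal Python sketch. This is not LlamaExtract's implementation or API: the `Hospital` schema, the blank-line boundary rule in `segment`, and the `extract_one` parser (a stand-in for a per-chunk LLM call) are all hypothetical, chosen only to show how a single-entity schema applied to small chunks yields a complete list.

```python
import re
from dataclasses import dataclass


@dataclass
class Hospital:
    """Hypothetical schema describing ONE entity, not the whole document."""
    name: str
    city: str


def segment(document: str) -> list[str]:
    """Split the document into chunks at blank lines.

    Blank lines are an assumed stand-in for the 'natural boundaries'
    mentioned in the post; a real system would detect them differently.
    """
    return [c.strip() for c in re.split(r"\n\s*\n", document) if c.strip()]


def extract_one(chunk: str) -> Hospital:
    """Apply the single-entity schema to one focused chunk.

    Placeholder for a per-chunk LLM call: it only parses 'Key: value' lines.
    """
    fields = {}
    for line in chunk.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return Hospital(name=fields.get("name", ""), city=fields.get("city", ""))


def extract_all(document: str) -> list[Hospital]:
    """Extract entity by entity instead of the whole document at once."""
    return [extract_one(chunk) for chunk in segment(document)]


if __name__ == "__main__":
    directory = "Name: A\nCity: X\n\nName: B\nCity: Y\n\n\nName: C\nCity: Z\n"
    for hospital in extract_all(directory):
        print(hospital)
```

Because each call sees only one entity's worth of text, the number of results tracks the number of segments rather than how much a single request can return, which is the property the post attributes to this approach.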

📊 Media Metadata

{
  "media": [
    {
      "type": "video",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1993730035266826302/media_0.mp4?",
      "filename": "media_0.mp4"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1993730035266826302/media_1.jpg?",
      "filename": "media_1.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1993730035266826302/media_2.png?",
      "filename": "media_2.png"
    }
  ],
  "processed_at": "2025-11-27T20:21:23.646480",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1993730035266826302",
  "url": "https://x.com/llama_index/status/1993730035266826302",
  "twitterUrl": "https://twitter.com/llama_index/status/1993730035266826302",
  "text": "Stop losing 80% of your data when extracting from long documents with repeating entities like catalogs, tables, and lists.\n\nOur new Table Row extraction target in LlamaExtract solves the core problem: instead of trying to extract everything at once (where LLMs get overwhelmed), we intelligently segment documents and extract entity by entity.\n\n🎯 Long multi-page insurance directory? Extract all 380 hospitals (vs. only 40 with document-level extraction)\n📋 Handle both formal tables and semi-structured content like product catalogs automatically\n🔍 Define your schema for a single entity - we return the complete list with exhaustive coverage\n⚡ Leverage LLM flexibility while achieving template-based reliability through smart segmentation\n\nThis approach identifies repeating patterns, segments documents at natural boundaries, then applies your schema to focused chunks. Works with tables, lists, catalogs, or any document with distinguishable repeating entities.\n\nRead the full technical breakdown with code examples here: https://t.co/Zrghh3ZZ7N",
  "source": "Twitter for iPhone",
  "retweetCount": 4,
  "replyCount": 1,
  "likeCount": 14,
  "quoteCount": 0,
  "viewCount": 1827,
  "createdAt": "Wed Nov 26 17:14:16 +0000 2025",
  "lang": "en",
  "bookmarkCount": 5,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1993730035266826302",
  "displayTextRange": [
    0,
    279
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "llama_index",
    "url": "https://x.com/llama_index",
    "twitterUrl": "https://twitter.com/llama_index",
    "id": "1604278358296055808",
    "name": "LlamaIndex 🦙",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1967920417760251904/0ytfduMQ_normal.png",
    "coverPicture": "https://pbs.twimg.com/profile_banners/1604278358296055808/1758023766",
    "description": "",
    "location": "",
    "followers": 104427,
    "following": 28,
    "status": "",
    "canDm": false,
    "canMediaTag": true,
    "createdAt": "Sun Dec 18 00:52:44 +0000 2022",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 1453,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 1778,
    "statusesCount": 3636,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1986810928713855235"
    ],
    "profile_bio": {
      "description": "AI Agents for document OCR + workflows\n\nGithub: https://t.co/HC19j7veGE\nDocs: https://t.co/QInqg2yMCJ\nLlamaCloud: https://t.co/yQGTiRSfFL",
      "entities": {
        "description": {
          "urls": [
            {
              "display_url": "github.com/run-llama/llam…",
              "expanded_url": "http://github.com/run-llama/llama_index",
              "indices": [
                48,
                71
              ],
              "url": "https://t.co/HC19j7veGE"
            },
            {
              "display_url": "docs.llamaindex.ai",
              "expanded_url": "http://docs.llamaindex.ai",
              "indices": [
                78,
                101
              ],
              "url": "https://t.co/QInqg2yMCJ"
            },
            {
              "display_url": "cloud.llamaindex.ai",
              "expanded_url": "https://cloud.llamaindex.ai/",
              "indices": [
                114,
                137
              ],
              "url": "https://t.co/yQGTiRSfFL"
            }
          ]
        },
        "url": {
          "urls": [
            {
              "display_url": "llamaindex.ai",
              "expanded_url": "https://www.llamaindex.ai/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/epzefqPT9Z"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "additional_media_info": {
          "monetizable": false
        },
        "display_url": "pic.twitter.com/xq0boCUIjm",
        "expanded_url": "https://twitter.com/llama_index/status/1993730035266826302/video/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "id_str": "1993729979352485888",
        "indices": [
          280,
          303
        ],
        "media_key": "13_1993729979352485888",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARurJtpdWhAAAAA=",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAAECgABG6sm2l1aEAAAAA==",
            "media_key": "13_1993729979352485888"
          }
        },
        "media_url_https": "https://pbs.twimg.com/amplify_video_thumb/1993729979352485888/img/vVvR-QmckTCGs8Cc.jpg",
        "original_info": {
          "focus_rects": [],
          "height": 720,
          "width": 1184
        },
        "sizes": {
          "large": {
            "h": 720,
            "w": 1184
          }
        },
        "type": "video",
        "url": "https://t.co/xq0boCUIjm",
        "video_info": {
          "aspect_ratio": [
            74,
            45
          ],
          "duration_millis": 26233,
          "variants": [
            {
              "content_type": "application/x-mpegURL",
              "url": "https://video.twimg.com/amplify_video/1993729979352485888/pl/tF4lEzuz4jzPCNiv.m3u8?tag=14"
            },
            {
              "bitrate": 288000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/1993729979352485888/vid/avc1/444x270/0D3ceO1zDAV82P5b.mp4?tag=14"
            },
            {
              "bitrate": 832000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/1993729979352485888/vid/avc1/592x360/XWEbAmL-h3kCMEZZ.mp4?tag=14"
            },
            {
              "bitrate": 2176000,
              "content_type": "video/mp4",
              "url": "https://video.twimg.com/amplify_video/1993729979352485888/vid/avc1/1184x720/KZIqQR2X_s6QJIOl.mp4?tag=14"
            }
          ]
        }
      },
      {
        "display_url": "pic.twitter.com/xq0boCUIjm",
        "expanded_url": "https://twitter.com/llama_index/status/1993730035266826302/video/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "features": {
          "large": {},
          "orig": {}
        },
        "id_str": "1993730032771178496",
        "indices": [
          280,
          303
        ],
        "media_key": "3_1993730032771178496",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwAAQoAARurJubNWqAACgACG6sm52IbMD4AAA==",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAABCgABG6sm5s1aoAAKAAIbqybnYhswPgAA",
            "media_key": "3_1993730032771178496"
          }
        },
        "media_url_https": "https://pbs.twimg.com/media/G6sm5s1aoAAnu9e.jpg",
        "original_info": {
          "focus_rects": [
            {
              "h": 672,
              "w": 1200,
              "x": 0,
              "y": 0
            },
            {
              "h": 676,
              "w": 676,
              "x": 52,
              "y": 0
            },
            {
              "h": 676,
              "w": 593,
              "x": 94,
              "y": 0
            },
            {
              "h": 676,
              "w": 338,
              "x": 221,
              "y": 0
            },
            {
              "h": 676,
              "w": 1200,
              "x": 0,
              "y": 0
            }
          ],
          "height": 676,
          "width": 1200
        },
        "sizes": {
          "large": {
            "h": 676,
            "w": 1200
          }
        },
        "type": "photo",
        "url": "https://t.co/xq0boCUIjm"
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "urls": [
      {
        "display_url": "llamaindex.ai/blog/extractin…",
        "expanded_url": "https://www.llamaindex.ai/blog/extracting-repeating-entities-from-documents?utm_source=socials&utm_medium=li_social",
        "indices": [
          1025,
          1048
        ],
        "url": "https://t.co/Zrghh3ZZ7N"
      }
    ]
  },
  "quoted_tweet": null,
  "retweeted_tweet": null,
  "isLimitedReply": false,
  "article": null
}