🐦 Twitter Post Details

@jerryjliu0

A core retrieval idea that will lead to better results for your LLM QA system is decoupling embedding representations from raw text chunks (s/o @md_rumpf for inspiration). ✂️

There are actually different ways to take advantage of this idea - and we’ll show how all of these are possible with @llama_index 👇

1️⃣ Embed a summary -> link to the documents associated with the text. ✅ This can help retrieve relevant documents at a high level before retrieving chunks, vs. retrieving chunks directly (which might come from irrelevant documents).

2️⃣ Embed a sentence -> link to a window around the sentence. ✅ This allows for finer-grained retrieval of relevant context (embedding giant chunks leads to “lost in the middle” problems), while still ensuring enough context for LLM synthesis.

Guides 📗:

1️⃣ is possible with our recursive retriever, or our out-of-the-box document summary index (see the sketch below):
Recursive Retriever: https://t.co/HmF2Dib6ho
Document Summary Index: https://t.co/HjheQ8tV3N

2️⃣ is possible with our SentenceWindow parser + metadata replacement (see the sketch below):
Sentence Window: https://t.co/3SN5Xt6vrT
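
For pattern 1️⃣, the document summary index is the out-of-the-box route. Below is a minimal sketch, assuming a 2023-era `llama_index` install (import paths have since moved to `llama_index.core`) and an OpenAI API key in the environment; the `./data` directory and the query string are placeholders:

```python
from llama_index import SimpleDirectoryReader
from llama_index.indices.document_summary import DocumentSummaryIndex

# Each file under ./data becomes its own Document, so one summary
# maps to one source document. (./data is a placeholder path.)
documents = SimpleDirectoryReader("./data").load_data()

# Building the index asks the LLM to summarize each document and embeds
# the summary; the document's underlying chunks stay linked to it.
index = DocumentSummaryIndex.from_documents(documents)

# A query first matches against the per-document summaries to pick
# relevant documents, then synthesizes over their underlying chunks.
query_engine = index.as_query_engine()
print(query_engine.query("What are the main retrieval techniques discussed?"))
```

The recursive retriever in the first linked guide achieves the same decoupling more manually, by wiring summary nodes to document-level retrievers.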
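
For pattern 2️⃣, here is a sketch of the sentence-window flow under the same assumptions; the window size and metadata keys shown follow the library's documented defaults for this parser, and the path and query are again placeholders:

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor

# Parse into single-sentence nodes; each node stores the surrounding
# +/- 3 sentences in its metadata under the "window" key.
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
nodes = node_parser.get_nodes_from_documents(documents)

# Only the lone sentence is embedded, keeping retrieval fine-grained.
index = VectorStoreIndex(nodes)

# At query time, swap each retrieved sentence for its stored window so
# the LLM sees enough surrounding context during synthesis.
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
print(query_engine.query("How does sentence-window retrieval work?"))
```

This is the decoupling in miniature: the embedded representation (a single sentence) is deliberately smaller than the text handed to the LLM (the full window).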

🔧 Raw API Response

{
  "user": {
    "created_at": "2011-09-07T22:54:31.000Z",
    "default_profile_image": false,
    "description": "co-founder/CEO @llama_index\n\nEx-ML @robusthq,  AI research @Uber_ATG and ML Eng @Quora. Princeton '17",
    "fast_followers_count": 0,
    "favourites_count": 3386,
    "followers_count": 19874,
    "friends_count": 1103,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 506,
    "location": "",
    "media_count": 509,
    "name": "Jerry Liu",
    "normal_followers_count": 19874,
    "possibly_sensitive": false,
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1283610285031460864/1Q4zYhtb_normal.jpg",
    "screen_name": "jerryjliu0",
    "statuses_count": 2306,
    "translator_type": "none",
    "url": "https://t.co/S7FkTSefQ0",
    "verified": false,
    "withheld_in_countries": [],
    "id_str": "369777416"
  },
  "id": "1693290236363919840",
  "conversation_id": "1693290236363919840",
  "full_text": "A core retrieval idea that will lead to better results for your LLM QA system is decoupling embedding representations from raw text chunks (s/o @md_rumpf for inspiration). ✂️\n\nThere’s actually different ways to take advantage of this idea - and we’ll show how all of these are possible with @llama_index 👇\n\n1️⃣ Embed a summary -> link to more documents associated with the text. ✅ This can help retrieve relevant documents at a high-level before retrieving chunks, vs. retrieving chunks directly (that might be in irrelevant documents).\n\n2️⃣ Embed a sentence -> link to a window around the sentence. ✅ This allows for finer-grained retrieval of relevant context (embedding giant chunks leads to “lost in the middle” problems), but also ensures enough context for LLM synthesis.\n\nGuides 📗:\n\n1️⃣ is possible with our recursive retriever, or our out of the box document summary index:\nRecursive Retriever: https://t.co/HmF2Dib6ho\nDocument Summary Index: https://t.co/HjheQ8tV3N\n\n2️⃣ is possible with our SentenceWindow parser + Metadata Sentence Window: \nhttps://t.co/3SN5Xt6vrT",
  "reply_count": 8,
  "retweet_count": 83,
  "favorite_count": 393,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [
    {
      "id_str": "1609093208637677569",
      "name": "Max Rumpf",
      "screen_name": "md_rumpf",
      "profile": "https://twitter.com/md_rumpf"
    }
  ],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/F3_GPuFa4AAPL00.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/jerryjliu0/status/1693290236363919840",
  "created_at": "2023-08-20T15:53:46.000Z",
  "#sort_index": "1693290236363919840",
  "view_count": 70448,
  "quote_count": 2,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/jerryjliu0/status/1693290236363919840"
}