🐦 Twitter Post Details


@ravithejads

Multi-Modal RAG with ColPali as a re-ranker using @llama_index

💡 What is ColPali?

ColPali is a model based on Vision Language Models (VLMs). It is an extension of PaliGemma-3B that generates ColBERT-style multi-vector representations for both text and images, and it efficiently indexes documents using their visual features.

🤔 But how can ColPali be used as a re-ranker in a Multi-Modal RAG setup?

Using LlamaIndex abstractions, the process is simple and involves five steps:

1️⃣ Extract text and images from the data sources.

2️⃣ Build a Multi-Modal index for both text and images using @cohere Multi-Modal Embeddings.

3️⃣ Retrieve relevant text and images simultaneously for the given query using a Multi-Modal Retriever.

4️⃣ Re-rank the text nodes using the @cohere re-ranker and the image nodes using ColPali.

5️⃣ Generate responses by passing the re-ranked text and image nodes to the GPT-4o Multi-Modal LLM.

👉 Check out the cookbook: https://t.co/RuTAbPy2QS
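To make step 4 more concrete, here is a minimal sketch of ColBERT-style late-interaction ("MaxSim") scoring, the kind of multi-vector comparison the post attributes to ColPali. It is not the cookbook's code and does not call LlamaIndex, Cohere, or ColPali. The function names (`maxsim_score`, `rerank`), the file names, and the 2-D toy embeddings are all hypothetical, chosen only for illustration. A real model would emit one high-dimensional vector per query token and per image patch.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, take the similarity of
    its best-matching document vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(
    query_vecs: Sequence[Vector],
    candidates: Sequence[Tuple[str, Sequence[Vector]]],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Score each (name, multi-vector embedding) candidate and keep the top_n."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data: two query-token vectors and three retrieved image nodes,
# each represented by two patch vectors (values invented for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    ("page_a.png", [[0.9, 0.1], [0.2, 0.8]]),
    ("page_b.png", [[0.5, 0.5], [0.4, 0.4]]),
    ("page_c.png", [[0.1, 0.2], [0.0, 0.1]]),
]

ranked = rerank(query, pages, top_n=2)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy run, `page_a.png` ranks first because each query vector finds a strongly matching patch vector, which is the intuition behind re-ranking retrieved image nodes with a multi-vector model rather than a single pooled embedding.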


🔧 Raw API Response

{
  "user": {
    "created_at": "2010-12-01T14:44:19.000Z",
    "default_profile_image": false,
    "description": "AI Engineer and Developer Advocate at @llama_index (LlamaIndex)\n\nFocused on RAG, agents, and fine-tuning LLMs.",
    "fast_followers_count": 0,
    "favourites_count": 6023,
    "followers_count": 4674,
    "friends_count": 719,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 88,
    "location": "Bangalore, India",
    "media_count": 134,
    "name": "Ravi Theja",
    "normal_followers_count": 4674,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/221757413/1657726537",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1653094130065498126/IggetaTH_normal.jpg",
    "screen_name": "ravithejads",
    "statuses_count": 1194,
    "translator_type": "none",
    "url": "https://t.co/5p2DXaFHxy",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "221757413"
  },
  "id": "1859093416820248701",
  "conversation_id": "1859093416820248701",
  "full_text": "Multi-Modal RAG with ColPali as a re-ranker using @llama_index \n\n💡 What is ColPali?\n\nColPali is a model based on Vision Language Models (VLMs). It is an extension of PaliGemma-3B, ColPali generates ColBERT-style multi-vector representations for both text and images. It efficiently indexes documents using their visual features.\n\n🤔 But how can ColPali be used as a re-ranker in a Multi-Modal RAG setup?\n\nUsing LlamaIndex abstractions, the process is simple and involves five steps:\n\n1️⃣ Extract text and images from the data sources.\n\n2️⃣ Build a Multi-Modal index for both text and images using @cohere Multi-Modal Embeddings.\n\n3️⃣ Retrieve relevant text and images simultaneously using a Multi-Modal Retriever for the given query.\n\n4️⃣ Re-rank text nodes using @cohere re-ranker and image nodes using ColPali.\n\n5️⃣ Generate responses by using the re-ranked text and image nodes with the GPT-4o Multi-Modal LLM.\n\n👉check out the cookbook: https://t.co/RuTAbPy2QS",
  "reply_count": 2,
  "retweet_count": 48,
  "favorite_count": 190,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [
    {
      "id_str": "1604278358296055808",
      "name": "LlamaIndex 🦙",
      "screen_name": "llama_index",
      "profile": "https://twitter.com/llama_index"
    }
  ],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GczTZg7WMAA4kTm.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/ravithejads/status/1859093416820248701",
  "created_at": "2024-11-20T04:36:25.000Z",
  "#sort_index": "1859093416820248701",
  "view_count": 17344,
  "quote_count": 1,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/ravithejads/status/1859093416820248701"
}