@Yihe__Deng
Large Vision Language Models are prone to object hallucinations — how can we address this issue cost-efficiently? Introducing MARINE: a training-free, API-free framework for tackling object hallucinations. Joint work with an amazing team @linxizhao4 @WeitongZhang and @QuanquanGu! arXiv: https://t.co/Lg3NUIaNaw

By incorporating a pre-trained object grounding vision encoder, MARINE enriches the visual context of LVLMs and controls text generation via classifier-free guidance (CFG) specifically designed for the multi-modal setting. MARINE corrects hallucinations without extra fine-tuning or access to advanced LLMs.

MARINE is compatible with any vision model; in our study, we showcase its effectiveness using the DEtection TRansformer (DETR) as the object grounding vision encoder.

Tested on six widely recognized LVLMs with MSCOCO, MARINE outperforms current methods in reducing hallucinations, as verified by the commonly used CHAIR and POPE metrics.

Our ablation studies shed light on how varying guidance strengths affect MARINE's performance and generations, with concrete examples showing how the guidance tweaks the LVLMs' output logits.

Check the details! [1/N]
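For readers curious about the mechanism: a minimal sketch of how classifier-free guidance can steer next-token logits toward an object-grounded prediction. This is an illustrative toy in NumPy, not the paper's implementation; the function name, the toy logits, and the specific interpolation form `(1 + gamma) * cond - gamma * uncond` are my assumptions about a standard CFG formulation.

```python
import numpy as np

def cfg_logits(cond_logits, uncond_logits, gamma):
    # Classifier-free guidance (one common formulation):
    # push the distribution toward the grounded (conditional) logits
    # and away from the ungrounded ones. gamma=0 recovers the
    # conditional logits unchanged.
    return (1 + gamma) * cond_logits - gamma * uncond_logits

# Toy vocabulary of 5 tokens (hypothetical values for illustration)
cond = np.array([2.0, 0.5, 0.1, -1.0, 0.3])    # with object-grounding features
uncond = np.array([1.0, 1.5, 0.1, -0.5, 0.3])  # plain visual context only

guided = cfg_logits(cond, uncond, gamma=0.5)

# Convert to a sampling distribution with a numerically stable softmax
probs = np.exp(guided - guided.max())
probs /= probs.sum()
```

Raising `gamma` amplifies the gap between the grounded and ungrounded predictions, which is the knob the ablation studies vary.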