๐Ÿฆ Twitter Post Details

Viewing enriched Twitter post

@Yihe__Deng

Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations.

Joint work with an amazing team @linxizhao4 @WeitongZhang and @QuanquanGu!

arXiv: https://t.co/Lg3NUIaNaw

Incorporating a pre-trained object grounding vision encoder, MARINE enriches the visual context of LVLMs and controls the text generation via classifier-free guidance (CFG) specifically designed for the multi-modal setting. MARINE corrects hallucinations without extra fine-tuning or accessing advanced LLMs

🤖 Compatible with any vision model, we showcase its effectiveness using DEtection TRansformer (DETR) as the object grounding vision encoder in our study.

📊 Tested on six widely-recognized LVLMs with MSCOCO, MARINE outperforms current methods in reducing hallucinations, verified by the commonly used CHAIR and POPE metrics.

🧪 Our ablation studies shed light on how varying guidance strengths affect MARINE's performance and generations. We provide concrete examples demonstrating how this guidance tweaks the LVLMs' output logits.

🔍 Check the detail [1/N]
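The tweet describes steering generation by blending guided and unguided next-token logits via classifier-free guidance. A minimal sketch of that blending step, assuming the standard CFG formula (the function name, list-based logits, and exact form are illustrative assumptions, not the paper's actual implementation):

```python
def cfg_logits(logits_uncond, logits_cond, guidance_strength):
    """Blend unconditional and visually grounded (conditional) logits.

    guidance_strength = 0 recovers the unconditional logits;
    guidance_strength = 1 returns the conditional logits;
    larger values push generation further toward the grounded
    visual context supplied by the object grounding encoder.
    """
    return [
        u + guidance_strength * (c - u)
        for u, c in zip(logits_uncond, logits_cond)
    ]


# Toy two-token vocabulary: guidance interpolates (or extrapolates)
# between the two logit vectors before sampling.
blended = cfg_logits([0.0, 1.0], [2.0, -1.0], guidance_strength=0.5)
```

The ablation on "varying guidance strengths" mentioned in the thread corresponds to sweeping `guidance_strength` in a formula of this shape.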

🔧 Raw API Response

{
  "user": {
    "created_at": "2021-11-21T00:55:36.000Z",
    "default_profile_image": false,
    "description": "ML PhD student @UCLA under @QuanquanGu | Prev. Applied Scientist Intern @AWS | LLM, Multi-modal, Deep learning theory",
    "fast_followers_count": 0,
    "favourites_count": 694,
    "followers_count": 1468,
    "friends_count": 1161,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 12,
    "location": "",
    "media_count": 19,
    "name": "Yihe Deng",
    "normal_followers_count": 1468,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1462223072203722756/1690857445",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1669048383418556416/VfgvIKS-_normal.jpg",
    "screen_name": "Yihe__Deng",
    "statuses_count": 128,
    "translator_type": "none",
    "url": "https://t.co/dBR44pKvkX",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "1462223072203722756"
  },
  "id": "1757909873491345563",
  "conversation_id": "1757909873491345563",
  "full_text": "Large Vision Language Models are prone to object hallucinations – how to cost-efficiently address this issue? 🚀 Introducing MARINE: a training-free, API-free framework to tackle object hallucinations.\n\nJoint work with an amazing team @linxizhao4 @WeitongZhang and @QuanquanGu!\n\narXiv: https://t.co/Lg3NUIaNaw\n\nIncorporating a pre-trained object grounding vision encoder, MARINE enriches the visual context of LVLMs and controls the text generation via classifier-free guidance (CFG) specifically designed for the multi-modal setting. MARINE corrects hallucinations without extra fine-tuning or accessing advanced LLMs\n\n🤖 Compatible with any vision model, we showcase its effectiveness using DEtection TRansformer (DETR) as the object grounding vision encoder in our study.\n\n📊 Tested on six widely-recognized LVLMs with MSCOCO, MARINE outperforms current methods in reducing hallucinations, verified by the commonly used CHAIR and POPE metrics.\n\n🧪 Our ablation studies shed light on how varying guidance strengths affect MARINE's performance and generations. We provide concrete examples demonstrating how this guidance tweaks the LVLMs' output logits.\n\n🔍 Check the detail [1/N]",
  "reply_count": 4,
  "retweet_count": 45,
  "favorite_count": 166,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [
    {
      "id_str": "1570083475423920128",
      "name": "Linxi Zhao",
      "screen_name": "linxizhao4",
      "profile": "https://twitter.com/linxizhao4"
    },
    {
      "id_str": "1250290990373457927",
      "name": "Weitong ZHANG",
      "screen_name": "WeitongZhang",
      "profile": "https://twitter.com/WeitongZhang"
    },
    {
      "id_str": "901303999529312256",
      "name": "Quanquan Gu",
      "screen_name": "QuanquanGu",
      "profile": "https://twitter.com/QuanquanGu"
    }
  ],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GGVZNJGaMAEp9g9.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/Yihe__Deng/status/1757909873491345563",
  "created_at": "2024-02-14T23:29:08.000Z",
  "#sort_index": "1757909873491345563",
  "view_count": 20284,
  "quote_count": 1,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://twitter.com/yihe__deng/status/1757909873491345563"
}
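A quick sketch of extracting the headline engagement numbers from a response shaped like the one above. Field names match the JSON shown; the `raw` string here is a trimmed stand-in for the full payload, not the complete response:

```python
import json

# Trimmed stand-in for the raw API response shown above.
raw = json.loads("""
{
  "user": {"screen_name": "Yihe__Deng", "followers_count": 1468},
  "reply_count": 4,
  "retweet_count": 45,
  "favorite_count": 166,
  "view_count": 20284
}
""")

# Collect the engagement metrics into a flat summary dict.
engagement = {
    "author": raw["user"]["screen_name"],
    "replies": raw["reply_count"],
    "retweets": raw["retweet_count"],
    "likes": raw["favorite_count"],
    "views": raw["view_count"],
}
```

Note that likes live under the legacy key `favorite_count` rather than a `like_count` field.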