🐦 Twitter Post Details

Viewing enriched Twitter post

@omarsar0

Lots of papers on LLM hallucinations recently.

Here are a few AI papers that caught my attention this week:

(Bookmark to read later)

Geometry of Concepts in LLMs
Examines the geometric structure of concept representations in sparse autoencoders (SAEs) at three scales: 1) atomic-level parallelogram patterns between related concepts (e.g., man:woman::king:queen), 2) brain-like functional "lobes" for different types of knowledge such as math/code, and 3) galaxy-level eigenvalue distributions showing specialized structure in the middle model layers. https://t.co/laQ2JxgF1j

-----

Distinguishing Ignorance from Error in LLM Hallucinations
A method to distinguish between two types of LLM hallucinations: when models lack knowledge (HK-) versus when they hallucinate despite having the correct knowledge (HK+). The authors build model-specific datasets with their proposed approach and show that these are more effective than generic datasets for detecting HK+ hallucinations. https://t.co/NHCA0Ny6WQ

-----

SimpleQA
A challenging benchmark of 4,326 short factual questions adversarially collected against GPT-4 responses. Reports that frontier models like GPT-4o and Claude achieve less than 50% accuracy. Finds a positive correlation between a model's stated confidence and its accuracy, signaling that models have some notion of confidence, while noting there is still room to improve the calibration of LLMs' stated confidence. https://t.co/Q4qUxAsDMk

-----

The Role of Prompting and External Tools in Hallucination Rates of LLMs
Tests different prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Finds that simpler prompting techniques outperform more complex methods, and reports that LLM agents exhibit higher hallucination rates due to the added complexity of tool usage. https://t.co/CuTS2eMqSe

-----

Automating Agentic Workflow Generation
A framework (AFlow) for automating the generation of agentic workflows. It reformulates workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. It efficiently explores the search space using a variant of MCTS, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Experiments across six benchmark datasets demonstrate AFlow’s effectiveness, showing a 5.7% improvement over manually designed methods and a 19.5% improvement over existing automated approaches. AFlow also enables smaller models to outperform GPT-4o on specific tasks at just 4.55% of its inference cost. https://t.co/jkFzGTfKka

-----

MrT5
A more efficient variant of byte-level language models that uses a dynamic token deletion mechanism (via a learned delete gate) to shorten sequence lengths by up to 80% while maintaining model performance. This enables faster inference and better handling of multilingual text without traditional tokenization. MrT5 maintains competitive accuracy with ByT5 on downstream tasks such as XNLI and character-level manipulation while improving inference runtimes. https://t.co/jyzLjWxLyU

-----

More awesome papers in our Top ML Papers of the Week tomorrow @dair_ai
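The "atomic-level parallelogram" structure mentioned for the Geometry of Concepts paper is the familiar vector-analogy pattern. A minimal sketch with toy 2-D vectors (these coordinates are invented for illustration; real SAE features live in much higher dimensions):

```python
# Toy 2-D "concept" vectors chosen so the analogy closes exactly.
man, woman = (1.0, 0.0), (1.0, 1.0)
king, queen = (3.0, 0.0), (3.0, 1.0)

def vsub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def vadd(a, b):
    return tuple(x + y for x, y in zip(a, b))

# man:woman :: king:queen means the difference vectors are equal,
# i.e. the four points form a parallelogram.
assert vsub(woman, man) == vsub(queen, king)

# Analogy completion: king - man + woman lands on queen.
print(vadd(vsub(king, man), woman))  # (3.0, 1.0)
```

The paper's finding is that such parallelograms show up among SAE feature directions, not just word embeddings.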
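The SimpleQA calibration result can be illustrated by binning graded answers by stated confidence and comparing each bin's mean confidence to its empirical accuracy. The data below is synthetic, purely to show the bookkeeping, not SimpleQA's actual numbers:

```python
# Synthetic (stated_confidence, was_correct) pairs standing in for graded answers.
answers = [
    (0.95, True), (0.9, True), (0.9, False), (0.85, True),
    (0.6, True), (0.55, False), (0.5, False), (0.45, True),
    (0.2, False), (0.15, False), (0.1, False), (0.1, True),
]

def calibration_bins(answers, n_bins=4):
    """Group answers into equal-width confidence bins; report
    (mean stated confidence, empirical accuracy, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in answers:
        i = min(int(conf * n_bins), n_bins - 1)
        bins[i].append((conf, correct))
    report = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            report.append((round(mean_conf, 2), round(accuracy, 2), len(b)))
    return report

for mean_conf, acc, n in calibration_bins(answers):
    print(f"stated ~{mean_conf}: accuracy {acc} (n={n})")
```

A perfectly calibrated model would have accuracy roughly equal to stated confidence in every bin; SimpleQA reports a positive but imperfect relationship.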
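MrT5's sequence shortening can be pictured as a per-position gate that drops low-scoring byte tokens before the deeper, more expensive layers run. This hypothetical sketch shows only the shortening bookkeeping (the gate scores are hand-written stand-ins for a trained gate's output; in the actual model, earlier layers first merge information into the surviving positions):

```python
def apply_delete_gate(tokens, gate_scores, threshold=0.5):
    """Keep only positions whose gate score clears the threshold."""
    return [t for t, s in zip(tokens, gate_scores) if s >= threshold]

byte_tokens = list(b"hello world")  # 11 byte-level tokens
gate_scores = [0.9, 0.2, 0.1, 0.1, 0.8, 0.3, 0.9, 0.7, 0.2, 0.1, 0.6]

shortened = apply_delete_gate(byte_tokens, gate_scores)
print(len(byte_tokens), "->", len(shortened))  # 11 -> 5
```

Every downstream layer then processes the shortened sequence, which is where the inference-time savings come from.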

Media 1
Media 2
Media 3
Media 4

📊 Media Metadata

{
  "data": [
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GbY5_ATW0AA-568.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    },
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GbY6Rx2X0AA5fzJ.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    },
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GbY6l87WEAAhTpV.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    },
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GbY6uXgWsAA1RnD.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    }
  ],
  "score": 1.0,
  "scored_at": "2025-08-09T13:46:07.551307",
  "import_source": "network_archive_import",
  "links_checked": true,
  "checked_at": "2025-08-10T10:31:55.785118",
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1852733583036682710/media_0.jpg?",
      "filename": "media_0.jpg",
      "original_url": "https://pbs.twimg.com/media/GbY5_ATW0AA-568.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1852733583036682710/media_1.jpg?",
      "filename": "media_1.jpg",
      "original_url": "https://pbs.twimg.com/media/GbY6Rx2X0AA5fzJ.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1852733583036682710/media_2.jpg?",
      "filename": "media_2.jpg",
      "original_url": "https://pbs.twimg.com/media/GbY6l87WEAAhTpV.jpg"
    },
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1852733583036682710/media_3.jpg?",
      "filename": "media_3.jpg",
      "original_url": "https://pbs.twimg.com/media/GbY6uXgWsAA1RnD.jpg"
    }
  ],
  "storage_migrated": true
}

🔧 Raw API Response

{
  "user": {
    "created_at": "2015-09-04T12:59:26.000Z",
    "default_profile_image": false,
    "description": "Building with AI Agents @dair_ai • Prev: Meta AI, Elastic, Galactica LLM, PhD • I also teach how to build with LLMs, RAG & AI Agents ⬇️",
    "fast_followers_count": 0,
    "favourites_count": 27933,
    "followers_count": 216712,
    "friends_count": 532,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3688,
    "location": "",
    "media_count": 2656,
    "name": "elvis",
    "normal_followers_count": 216712,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3448284313/1565974901",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/939313677647282181/vZjFWtAn_normal.jpg",
    "screen_name": "omarsar0",
    "statuses_count": 12439,
    "translator_type": "regular",
    "url": "https://t.co/JBU5beHQNs",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "3448284313"
  },
  "id": "1852733583036682710",
  "conversation_id": "1852733583036682710",
  "full_text": "Lots of papers on LLM hallucinations recently.\n\nHere are a few AI papers that caught my attention this week:\n\n(Bookmark to read later)\n\nGeometry of Concepts in LLMs\nExamines the geometric structure of concept representations in sparse autoencoders (SAEs) at three scales: 1) atomic-level parallelogram patterns between related concepts (e.g., man:woman::king:queen), 2) brain-like functional \"lobes\" for different types of knowledge like math/code, 3) and galaxy-level eigenvalue distributions showing a specialized structure in middle model layers. https://t.co/laQ2JxgF1j\n\n-----\n\nDistinguishing Ignorance from Error in LLM Hallucinations\nA method to distinguish between two types of LLM hallucinations: when models lack knowledge (HK-) versus when they hallucinate despite having correct knowledge (HK+). They build model-specific datasets using their proposed approach and show that model-specific datasets are more effective for detecting HK+ hallucinations compared to generic datasets. https://t.co/NHCA0Ny6WQ\n\n-----\n\nSimpleQA \nA challenging benchmark of 4,326 short factual questions adversarially collected against GPT-4 responses. Reports that frontier models like GPT-4o and Claude achieve less than 50% accuracy. Finds that there is a positive calibration between the model stated confidence and accuracy, signaling that they have some notion of confidence. Claims that there is still room to improve the calibration of LLMs in terms of stated confidence. https://t.co/Q4qUxAsDMk\n\n-----\n\nThe Role of Prompting and External Tools in Hallucination Rates of LLMs\nTests different prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Finds that simpler prompting techniques outperform more complex methods. It reports that LLM agents exhibit higher hallucination rates due to the added complexity of tool usage. 
https://t.co/CuTS2eMqSe\n\n-----\n\nAutomating Agentic Workflow Generation\nA novel framework for automating the generation of agentic workflows. It reformulates workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. It efficiently explores the search space using a variant of MCTS, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Experiments across six benchmark datasets demonstrate AFlow’s effectiveness, showing a 5.7% improvement over manually designed methods and a 19.5% improvement over existing automated approaches. AFlow also enables smaller models to outperform GPT-4o on specific tasks at just 4.55% of its inference cost. https://t.co/jkFzGTfKka\n\n-----\n\nMrT5\nProposes a more efficient variant of byte-level language models that uses a dynamic token deletion mechanism (via a learned delete gate) to shorten sequence lengths by up to 80% while maintaining model performance. This enables faster inference and better handling of multilingual text without traditional tokenization. MrT5 maintains competitive accuracy with ByT5 on downstream tasks such as XNLI and character-level manipulations while improving inference runtimes. https://t.co/jyzLjWxLyU\n\n-----\n\nMore awesome papers in our Top ML Papers of the Week tomorrow @dair_ai",
  "reply_count": 11,
  "retweet_count": 155,
  "favorite_count": 607,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GbY5_ATW0AA-568.jpg",
      "type": "photo"
    },
    {
      "media_url": "https://pbs.twimg.com/media/GbY6Rx2X0AA5fzJ.jpg",
      "type": "photo"
    },
    {
      "media_url": "https://pbs.twimg.com/media/GbY6l87WEAAhTpV.jpg",
      "type": "photo"
    },
    {
      "media_url": "https://pbs.twimg.com/media/GbY6uXgWsAA1RnD.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/omarsar0/status/1852733583036682710",
  "created_at": "2024-11-02T15:24:42.000Z",
  "#sort_index": "1852733583036682710",
  "view_count": 48020,
  "quote_count": 4,
  "is_quote_tweet": false,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": true,
  "startUrl": "https://x.com/omarsar0/status/1852733583036682710"
}