@omarsar0
Differential Transformer proposes a differential attention mechanism that amplifies attention to relevant context while canceling noise. It outperforms the standard Transformer when scaling up model size and training tokens. The authors claim that because the architecture gets less "distracted" by irrelevant context, it does well in applications such as long-context modeling, key information retrieval, hallucination mitigation, and in-context learning, while also reducing activation outliers.
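The core idea, per the paper, is to compute two separate softmax attention maps and take their difference, scaled by a learnable scalar λ, so that attention noise common to both maps cancels out (analogous to a differential amplifier). Below is a minimal single-head sketch, assuming a PyTorch setting; the function name `differential_attention` and the parameter names `Wq`, `Wk`, `Wv`, `lam` are illustrative, not the authors' actual code (which also reparameterizes λ and applies per-head normalization):

```python
import torch
import torch.nn.functional as F

def differential_attention(x, Wq, Wk, Wv, lam):
    """Single-head differential attention sketch.

    x        : (batch, seq, d_model) input activations
    Wq, Wk   : (d_model, 2 * d_head) projections; queries and keys
               are split into two groups
    Wv       : (d_model, d_head) value projection
    lam      : learnable scalar weighting the second attention map
    """
    q1, q2 = (x @ Wq).chunk(2, dim=-1)   # two query groups
    k1, k2 = (x @ Wk).chunk(2, dim=-1)   # two key groups
    v = x @ Wv
    scale = q1.shape[-1] ** -0.5
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) * scale, dim=-1)  # attention map 1
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) * scale, dim=-1)  # attention map 2
    # Subtracting the second map cancels noise shared by both,
    # sharpening attention on the relevant context.
    return (a1 - lam * a2) @ v
```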