🐦 Twitter Post Details

@Tim_Dettmers

This is the most important paper in a long time . It shows with strong evidence we are reaching the limits of quantization. The paper says this: the more tokens you train on, the more precision you need. This has broad implications for the entire field and the future of GPUs🧵 https://t.co/S2kD2Zf6ur

Media 1
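
The headline claim, that the more tokens you train on the more precision you need, refers to the post-training quantization result in the quoted paper (arxiv.org/pdf/2411.04330): the loss penalty from quantizing an already-trained model grows with the amount of pretraining data relative to model size. The snippet below is only a rough sketch of that kind of relationship; the function name, the exact functional form, and every constant are illustrative, not taken from the paper.

# Illustrative sketch only, not the paper's code or fitted constants.
# The quoted paper reports that the loss penalty from post-training
# quantization grows with the tokens-per-parameter ratio D/N and shrinks
# as the post-training bit width increases; the form below mimics that shape.
import math

def ptq_degradation(n_params: float, n_tokens: float, post_bits: float,
                    c: float = 0.05, gamma_d: float = 0.5,
                    gamma_n: float = 0.5, gamma_p: float = 2.0) -> float:
    """Extra loss from quantizing weights to `post_bits` after training (made-up constants)."""
    return c * (n_tokens ** gamma_d / n_params ** gamma_n) * math.exp(-post_bits / gamma_p)

# The tweet's point: at fixed model size, training on more tokens makes
# low-precision post-training quantization hurt more.
for d in (2e10, 2e11, 2e12):
    print(f"D={d:.0e}  int4 penalty={ptq_degradation(1e9, d, 4):.4f}")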

📊 Media Metadata

{
  "data": [
    {
      "id": "",
      "type": "photo",
      "url": null,
      "media_url": "https://pbs.twimg.com/media/GcMBdeIWYAAytnX.jpg",
      "media_url_https": null,
      "display_url": null,
      "expanded_url": null
    }
  ],
  "score": 0.91,
  "scored_at": "2025-08-09T13:46:07.552899",
  "import_source": "network_archive_import",
  "links_checked": true,
  "checked_at": "2025-08-10T10:32:00.792566",
  "media": [
    {
      "type": "photo",
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1856338240099221674/media_0.jpg?",
      "filename": "media_0.jpg",
      "original_url": "https://pbs.twimg.com/media/GcMBdeIWYAAytnX.jpg"
    }
  ],
  "storage_migrated": true
}
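
To display or re-fetch the attachment from this enrichment record, a reasonable policy is to prefer the migrated storage copy in media[].url (since storage_migrated is true) and fall back to original_url on the Twitter CDN. Below is a minimal sketch assuming exactly this JSON shape; the helper name and the filename are made up.

import json

def pick_media_urls(metadata: dict) -> list[str]:
    """Prefer the migrated storage copy; fall back to the original Twitter CDN URL."""
    urls = []
    for item in metadata.get("media", []):
        migrated = item.get("url") if metadata.get("storage_migrated") else None
        urls.append(migrated or item.get("original_url"))
    return urls

with open("media_metadata.json") as f:  # hypothetical filename holding the block above
    print(pick_media_urls(json.load(f)))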

🔧 Raw API Response

{
  "user": {
    "created_at": "2012-10-10T18:18:30.000Z",
    "default_profile_image": false,
    "description": "Creator of bitsandbytes.Research Scientist @allen_ai and incoming professor @CarnegieMellon.  I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.",
    "fast_followers_count": 0,
    "favourites_count": 3745,
    "followers_count": 34356,
    "friends_count": 923,
    "has_custom_timelines": false,
    "is_translator": false,
    "listed_count": 586,
    "location": "Seattle, WA",
    "media_count": 131,
    "name": "Tim Dettmers",
    "normal_followers_count": 34356,
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/872274950/1536159170",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/1465151884310683652/v4G_57En_normal.jpg",
    "screen_name": "Tim_Dettmers",
    "statuses_count": 3401,
    "translator_type": "none",
    "url": "https://t.co/H5HgSvHWXO",
    "verified": true,
    "withheld_in_countries": [],
    "id_str": "872274950"
  },
  "id": "1856338240099221674",
  "conversation_id": "1856338240099221674",
  "full_text": "This is the most important paper in a long time . It shows with strong evidence we are reaching the limits of quantization. The paper says this: the more tokens you train on, the more precision you need. This has broad implications for the entire field and the future of GPUs🧵 https://t.co/S2kD2Zf6ur",
  "reply_count": 64,
  "retweet_count": 504,
  "favorite_count": 2979,
  "hashtags": [],
  "symbols": [],
  "user_mentions": [],
  "urls": [],
  "media": [
    {
      "media_url": "https://pbs.twimg.com/media/GcMBdeIWYAAytnX.jpg",
      "type": "photo"
    }
  ],
  "url": "https://twitter.com/Tim_Dettmers/status/1856338240099221674",
  "created_at": "2024-11-12T14:08:20.000Z",
  "#sort_index": "1856338240099221674",
  "view_count": 667811,
  "quote_count": 64,
  "is_quote_tweet": true,
  "is_retweet": false,
  "is_pinned": false,
  "is_truncated": false,
  "quoted_tweet": {
    "user": {
      "created_at": "2022-07-06T17:31:04.000Z",
      "default_profile_image": false,
      "description": "math at harvard",
      "fast_followers_count": 0,
      "favourites_count": 36,
      "followers_count": 769,
      "friends_count": 73,
      "has_custom_timelines": false,
      "is_translator": false,
      "listed_count": 15,
      "location": "boston & abu dhabi",
      "media_count": 6,
      "name": "Tanishq Kumar @ NeurIPS 2024",
      "normal_followers_count": 769,
      "possibly_sensitive": false,
      "profile_banner_url": "https://pbs.twimg.com/profile_banners/1544735653828624384/1731344846",
      "profile_image_url_https": "https://pbs.twimg.com/profile_images/1856018320752807936/9785QNNR_normal.jpg",
      "screen_name": "tanishqkumar07",
      "statuses_count": 20,
      "translator_type": "none",
      "url": "https://t.co/LD18pD2k9Y",
      "verified": true,
      "withheld_in_countries": [],
      "id_str": "1544735653828624384"
    },
    "id": "1856045600355352753",
    "conversation_id": "1856045600355352753",
    "full_text": "[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre and post-training https://t.co/QLmNOV39Wk. TLDR;\n\n- Models become harder to post-train quantize as they are overtrained on lots of data, so that eventually more pretraining data can be actively harmful if quantizing post-training! \n- The effects of putting weights, activations, or attention in varying precisions during pretraining are consistent and predictable, and fitting a scaling law suggests that pretraining at high (BF16) and next-generation (FP4) precisions may both be suboptimal design choices!\n\nJoint work with @ZackAnkner @bfspector  @blake__bordelon  @Muennighoff  @mansiege  @CPehlevan  @HazyResearch @AdtRaghunathan.",
    "reply_count": 21,
    "retweet_count": 155,
    "favorite_count": 842,
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [
      {
        "url": "https://t.co/8FC4g2fSRb",
        "expanded_url": "https://arxiv.org/pdf/2411.04330",
        "display_url": "arxiv.org/pdf/2411.04330"
      }
    ],
    "media": [
      {
        "media_url": "https://pbs.twimg.com/media/GcH1RBoWwAAQp1q.jpg",
        "type": "photo"
      }
    ],
    "url": "https://twitter.com/tanishqkumar07/status/1856045600355352753",
    "created_at": "2024-11-11T18:45:29.000Z",
    "#sort_index": "1856338240099221800",
    "view_count": 726174,
    "quote_count": 39,
    "is_quote_tweet": false,
    "is_retweet": false,
    "is_pinned": false,
    "is_truncated": true
  },
  "startUrl": "https://x.com/tim_dettmers/status/1856338240099221674"
}
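
To work with the raw record above programmatically, here is a small summarizer sketch that assumes exactly the field names shown (user.screen_name, full_text, view_count, favorite_count, retweet_count, reply_count, quoted_tweet); the dataclass, function, and file name are illustrative.

import json
from dataclasses import dataclass

@dataclass
class PostSummary:
    author: str
    text: str
    views: int
    likes: int
    retweets: int
    replies: int
    quoted_author: str | None

def summarize(raw: dict) -> PostSummary:
    """Flatten the raw API record into the handful of fields most views of the post need."""
    quoted = raw.get("quoted_tweet") or {}
    return PostSummary(
        author=raw["user"]["screen_name"],
        text=raw["full_text"],
        views=raw.get("view_count", 0),
        likes=raw.get("favorite_count", 0),
        retweets=raw.get("retweet_count", 0),
        replies=raw.get("reply_count", 0),
        quoted_author=(quoted.get("user") or {}).get("screen_name"),
    )

with open("raw_post.json") as f:  # hypothetical filename holding the block above
    print(summarize(json.load(f)))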