@danielhanchen
Bug fixes & analysis for Qwen 2.5:
1. Pad_token should NOT be <|endoftext|> - infinite generations
2. Base model's <|im_start|> <|im_end|> are untrained
3. PCA on embeddings shows a BPE hierarchy
4. YaRN-extended 128K context GGUFs (native context is 32K)
5. Fixed versions + 128K GGUFs: https://t.co/gHMS1CeFLF

Details:

1. Pad token bug - for finetuning, never set pad_token = EOS - this results in infinite generations, since the loss on pad tokens is masked out, so the model never learns to emit EOS. The base model also ships with a chat template - remove it. The @UnslothAI versions fix both (sketch below).

2. Untrained token issues. Do NOT use the Qwen 2.5 chat template with the base model - <|im_start|> and <|im_end|> are untrained there, since the norm of those embeddings (and of the pad token's) is close to 0. The Instruct versions have them trained (norm check below).

3. PCA on the embeddings of Base and Instruct shows a BPE hierarchy. Less frequent tokens are obvious since they're ordered by ID. PCA also shows <|im_x|> moving away from being untrained. The same phenomenon shows up for Llama & more models (PCA sketch below).

4. Uploaded native 128K YaRN-extended GGUFs for Coder 0.5B all the way up to 32B to https://t.co/ZoGyVKiLFX. Use the 128K version for long contexts; use the native 32K-context version for general chats (YaRN config below).

Also, Unsloth can finetune 14B in a free Colab! (Finetuning sketch below.)
Conversational style finetuning: https://t.co/NcPZiB0Wj9
Kaggle 14B notebook: https://t.co/3Jr44emNqM
Unsloth can also finetune the 72B variants on a 48GB card!
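
Pad-token sketch (point 1): a minimal check/fix using the Hugging Face tokenizer API. The replacement token <|vision_pad|> is an assumption - any reserved token that is never generated and is distinct from EOS works; verify it exists in your tokenizer.

```python
from transformers import AutoTokenizer

# Substitute the checkpoint you actually finetune.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# The bug: pad_token == eos_token. Padded positions are masked out of the loss
# during finetuning, so the model never learns to emit EOS -> infinite generations.
if tokenizer.pad_token == tokenizer.eos_token:
    # Assumption: <|vision_pad|> is an unused reserved token in the Qwen 2.5 vocab.
    tokenizer.pad_token = "<|vision_pad|>"

print(tokenizer.pad_token, tokenizer.pad_token_id)
print(tokenizer.eos_token, tokenizer.eos_token_id)
```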
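
Norm check (point 2): you can see the untrained chat-template tokens directly from the embedding row norms. A sketch, assuming the base checkpoint name below; the Instruct checkpoint gives non-trivial norms for the same tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # base model; swap in -Instruct to compare
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

embed = model.get_input_embeddings().weight.detach()  # [vocab_size, hidden_dim]

for tok in ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]:
    tok_id = tokenizer.convert_tokens_to_ids(tok)
    print(tok, tok_id, embed[tok_id].norm().item())
# Untrained tokens show a norm close to 0, which is why forcing them into the
# prompt via the chat template hurts the base model.
```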
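
PCA sketch (point 3): the embedding projection is easy to reproduce with scikit-learn. The model id is an assumption; colouring by token ID is what makes the BPE frequency ordering visible.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.float32)
embed = model.get_input_embeddings().weight.detach().numpy()

# Project the whole embedding matrix onto its first two principal components.
coords = PCA(n_components=2).fit_transform(embed)

# Low token IDs are frequent BPE merges, high IDs are rare tokens; untrained
# special tokens sit apart from the main cloud.
plt.scatter(coords[:, 0], coords[:, 1], c=np.arange(len(coords)), s=1, cmap="viridis")
plt.colorbar(label="token id")
plt.title("PCA of Qwen 2.5 input embeddings")
plt.show()
```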
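
YaRN config (point 4): if you use transformers instead of the prebuilt 128K GGUFs, the Qwen 2.5 model cards describe a rope_scaling block for extending 32K to 128K. A sketch under that assumption; treat the exact factor and max_position_embeddings values as per-model-card settings to double-check.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumption: pick your size
config = AutoConfig.from_pretrained(model_id)

# YaRN rope scaling: 32768 * 4.0 = 131072 (~128K) positions.
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Note that this YaRN scaling is applied statically regardless of input length, which can cost some quality on short inputs - hence the advice above to keep the native 32K version for general chats.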
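
Finetuning sketch: a minimal Unsloth setup in the spirit of the Colab/Kaggle notebooks linked above. The model name and LoRA hyperparameters are assumptions, not the notebooks' exact settings.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-14B-bnb-4bit",  # assumption: 4-bit Unsloth upload
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit loading is what lets 14B fit in a free Colab GPU
)

# Attach LoRA adapters; illustrative hyperparameters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, train as usual (e.g. TRL's SFTTrainer) on a conversational dataset.
```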