@rasbt
There's currently a lot of talk about Mistral, but have you seen the new QA-LoRA paper?

- LoRA (low-rank adaptation) is awesome because it fine-tunes a base LLM by training only a small set of low-rank adapter weights.
- QLoRA is awesome because it lowers memory requirements even further by quantizing the base model weights.
- QA-LoRA is even more awesome: it takes QLoRA a step further and also quantizes the LoRA (adapter) weights, avoiding the costly conversion of the quantized base model weights back into 16-bit when the adapter weights are added.

This concept is summarized in the annotated figure below.

A little nitpick: Table 2 shows that QA-LoRA is about 2x faster than QLoRA for fine-tuning. However, QA-LoRA used far fewer adapter parameters, so I think a fairer speed comparison would use the same number of adapter parameters for both methods.
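
For intuition, here is a minimal PyTorch sketch of the plain-LoRA idea (not the paper's implementation; class name and hyperparameters are illustrative): the base weight stays frozen (and, in QLoRA/QA-LoRA, is additionally stored in quantized form), while only the small low-rank matrices A and B are trained. QA-LoRA's extra step, group-wise quantization so the adapter can be folded back into the quantized base weights without a 16-bit detour, is not shown here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: frozen base weight plus a trainable low-rank update B @ A."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False           # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: update starts at 0
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W^T + scaling * x A^T B^T  -- only A and B receive gradients
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 12,288 vs. 589,824 in the frozen base weight
```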