@opengvlab
Thanks to AK @_akhaliq for the post. 🔥 Excited to introduce OmniQuant, an advanced open-source algorithm for compressing large language models!

📜 Paper: https://t.co/i62FV71HAg
🔗 Code: https://t.co/oPxDdySKtU

💡 Key Features:
🚀 Omnidirectional Calibration: Makes both weights and activations easier to quantize by differentiably optimizing quantization parameters block by block (see the calibration sketch below).
🛠 Diverse Precisions: Supports both weight-only quantization (W4A16/W3A16/W2A16) and weight-activation quantization (W6A6, W4A4); the second sketch below unpacks the WxAy notation.
⚡ Efficient: Quantizes the LLaMA-2 family (7B to 70B) in just 1 to 16 hours using only 128 calibration samples.
🤖 LLM Models: Works with diverse model families, including OPT, WizardLM @WizardLM_AI, LLaMA, LLaMA-2, and LLaMA-2-chat.
🔑 Deployment: Offers out-of-the-box deployment cases for GPUs and mobile phones.
🏃 Coming Soon: Multi-modal models and CodeLlama quantization!
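For the curious, here is a minimal sketch of what block-wise differentiable calibration can look like: freeze a block's weights, make a small set of quantization parameters learnable (here, a single clipping threshold), and minimize the output error between the quantized block and its full-precision counterpart on calibration data. Everything below is an illustrative stand-in, not the OmniQuant repo's actual API:

```python
import torch
import torch.nn.functional as F

# Full-precision "block" (stand-in for a transformer block) with frozen weights.
torch.manual_seed(0)
w_fp = torch.randn(256, 256)
block_fp = lambda x: x @ w_fp.t()

# Learnable clipping threshold: the only parameter the calibration tunes.
clip = torch.nn.Parameter(w_fp.abs().max().clone())

def block_q(x, n_bits=4):
    """Quantized block: weights clipped to [-clip, clip], fake-quantized
    to n_bits. Rounding uses a straight-through estimator so gradients
    can reach `clip` despite the non-differentiable round()."""
    scale = clip / (2 ** (n_bits - 1) - 1)
    w_c = torch.maximum(torch.minimum(w_fp, clip), -clip)
    w_q = w_c + ((w_c / scale).round() * scale - w_c).detach()  # STE
    return x @ w_q.t()

# Block-wise calibration: match the quantized block's outputs to the
# full-precision ones on a small calibration set (random inputs here).
calib = [torch.randn(32, 256) for _ in range(8)]
opt = torch.optim.AdamW([clip], lr=1e-2)
for step in range(100):
    for x in calib:
        loss = F.mse_loss(block_q(x), block_fp(x))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The point of doing this per block, rather than fine-tuning the whole model, is that each optimization problem stays tiny: only a handful of scalars are trained per block, which is why a few GPU-hours and 128 samples suffice.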
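And to unpack the WxAy shorthand: W4A16 means weights stored in 4 bits with activations left in 16-bit floats, while W4A4 quantizes both to 4 bits. A minimal per-channel weight fake-quantizer, again an illustrative sketch rather than the repo's code:

```python
import torch

def fake_quantize_weight(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Per-output-channel asymmetric fake quantization of a weight matrix.

    Illustrates the 'W4' in W4A16: weights are rounded to 2**n_bits
    integer levels, then dequantized back to float so the rest of the
    network (the activations) can stay in 16-bit precision.
    """
    qmax = 2 ** n_bits - 1
    # Per-channel min/max over each output row (shape: [out_features, 1]).
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    # Quantize to integers in [0, qmax], then dequantize.
    w_int = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
    return (w_int - zero_point) * scale

# Usage: quantize one linear layer's weight to 4 bits (W4),
# leaving activations in half precision (A16).
layer = torch.nn.Linear(4096, 4096).half()
with torch.no_grad():
    layer.weight.copy_(fake_quantize_weight(layer.weight.float(), n_bits=4).half())
```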