@vllm_project
🚀 vLLM v0.17.0 is here! 699 commits from 272 contributors (48 new!) This is a big one. Highlights:
- FlashAttention 4 integration
- Qwen3.5 model family with GDN (Gated Delta Networks)
- Model Runner V2 maturation: Pipeline Parallel, Decode Context Parallel, Eagle3 + CUDA graphs
- New --performance-mode flag: balanced / interactivity / throughput
- Weight Offloading V2 with prefetching
- Elastic Expert Parallelism Milestone 2
- Quantized LoRA adapters (QLoRA) now loadable directly
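A minimal sketch of how the new --performance-mode flag might be passed at server launch. The flag and its three values come from the release notes above; the model name and the `vllm serve` entry point are illustrative assumptions, not part of this announcement:

```shell
# Launch an OpenAI-compatible server tuned for batch throughput.
# Model name is illustrative; substitute your own.
vllm serve Qwen/Qwen2.5-7B-Instruct --performance-mode throughput

# Or prioritize per-request latency for interactive chat:
# vllm serve Qwen/Qwen2.5-7B-Instruct --performance-mode interactivity
```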