@vllm_project
📢 vLLM v0.13.0 is now available. Huge thanks to the community!

Highlights:
• Engine core: compile_ranges for selective kernel compilation, PrefixLM support for FlexAttention + TritonAttention, and CUDA graphs for 3D Triton attention
• xxHash option for prefix caching
• Chunked prefill for all pooling tasks
• Model Runner V2 updates: min-p sampling, logits NaN detection
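Min-p sampling, one of the Model Runner V2 additions, filters the vocabulary relative to the most likely token: anything below min_p times the top probability is dropped before sampling. A minimal conceptual sketch of that filtering rule (this illustrates the idea only, not vLLM's actual kernel):

```python
def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Zero out tokens below min_p * max(probs), then renormalize.

    Conceptual sketch of the min-p rule; not vLLM's implementation.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With min_p=0.2 the cutoff is 0.2 * 0.5 = 0.1, so the 0.05 token is dropped.
print(min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2))
```

In vLLM itself, min-p is exposed as the `min_p` field on `SamplingParams`.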