@vllm_project
Diffusion serving is expensive: dozens of timesteps per image, and much of the compute between adjacent steps is redundant.

⚡ vLLM-Omni now supports diffusion cache acceleration backends (TeaCache + Cache-DiT) that reuse intermediate Transformer computations across steps, with no retraining and minimal quality impact!

🚀 Benchmarks (NVIDIA H200, Qwen-Image, 1024x1024): TeaCache 1.91x, Cache-DiT 1.85x. For Qwen-Image-Edit, Cache-DiT hits 2.38x!

Blog: https://t.co/TiC0WhbgQp
Docs: https://t.co/0qatboeIe3

#vLLM #vLLMOmni #DiffusionModels #AIInference
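For a sense of what "reusing intermediate Transformer computations" means, here is a minimal Python sketch of the TeaCache idea: accumulate the relative change in the timestep signal across steps and only rerun the diffusion Transformer once that change crosses a threshold, otherwise replaying the cached residual. `TeaCacheRunner`, `step`, and `rel_l1_thresh` are hypothetical names for illustration, not the vLLM-Omni API, and the indicator is simplified (real TeaCache keys on the timestep-embedding-modulated input and rescales the distance with a fitted polynomial).

```python
import torch

class TeaCacheRunner:
    """Sketch of TeaCache-style residual caching (hypothetical helper,
    not the vLLM-Omni API). The real method also rescales the distance
    with a fitted polynomial, which is omitted here."""

    def __init__(self, transformer, rel_l1_thresh: float = 0.25):
        self.transformer = transformer   # callable: (x, t_emb) -> x_out
        self.thresh = rel_l1_thresh      # larger -> more steps skipped
        self.prev_indicator = None       # indicator from the last step
        self.cached_residual = None      # (output - input) of last full pass
        self.accum = 0.0                 # accumulated relative L1 distance

    @torch.no_grad()
    def step(self, x, t_emb):
        # Simplification: the raw timestep embedding stands in for the
        # timestep-embedding-modulated input that TeaCache actually uses.
        indicator = t_emb
        if self.prev_indicator is not None:
            self.accum += ((indicator - self.prev_indicator).abs().mean() /
                           self.prev_indicator.abs().mean()).item()
        if self.cached_residual is None or self.accum >= self.thresh:
            # Inputs changed enough: pay for a full forward pass, re-cache.
            out = self.transformer(x, t_emb)
            self.cached_residual = out - x
            self.accum = 0.0
        else:
            # Inputs barely changed: replay the cached residual, skip the model.
            out = x + self.cached_residual
        self.prev_indicator = indicator
        return out

# Toy usage with a stand-in denoiser so the sketch runs end to end.
if __name__ == "__main__":
    denoiser = lambda x, t: x - 0.05 * torch.tanh(x) * t.mean()
    runner = TeaCacheRunner(denoiser, rel_l1_thresh=0.3)
    x = torch.randn(1, 16)
    for i in range(30):                       # 30 "denoising" timesteps
        t_emb = torch.full((1, 16), 1.0 - i / 30)
        x = runner.step(x, t_emb)
```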