@vllm_project
Thanks to @AI21Labs for tracking down a silent uint32 overflow in vLLM's Mamba-1 CUDA kernel and contributing the fix. Root cause: `uint32_t` stride × `cache_index` overflows silently at scale. Fix merged in #35275. The debugging story is worth a read. 🔗 https://t.co/S4XBnEn1uv
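The following is a minimal sketch of this class of bug. It is not the kernel code or the actual patch from #35275. It is plain host-side C++, and the function names `slot_offset_u32` / `slot_offset_u64` are hypothetical. When both operands are 32-bit unsigned, the product is computed modulo 2^32. Unsigned wrap-around is well-defined in C and C++, so nothing traps by default. Widening one operand to 64 bits before the multiply keeps the full product.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: offset of one cache slot, computed in 32-bit math.
// uint32_t * uint32_t is evaluated modulo 2^32, so once
// stride * cache_index >= 2^32 the result silently wraps.
uint32_t slot_offset_u32(uint32_t stride, uint32_t cache_index) {
    return stride * cache_index;
}

// Hypothetical fix pattern: cast one operand to 64 bits *before* multiplying,
// so the whole product is computed in 64-bit arithmetic.
// (Casting the result afterwards would be too late: the wrap already happened.)
uint64_t slot_offset_u64(uint32_t stride, uint32_t cache_index) {
    return static_cast<uint64_t>(stride) * cache_index;
}
```

With a stride of 2^20 and `cache_index = 4096`, the 32-bit version returns 0 instead of 2^32. That wrong offset silently points back to the start of the buffer rather than faulting, which is why this kind of bug only shows up at scale.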