@RedHat_AI
Speculative decoding for Gemma 4 31B (EAGLE-3) A 2B draft model predicts tokens ahead; the 31B verifier validates them. Same output, faster inference. Early release. vLLM main branch support is in progress (PR #39450). Reasoning support coming soon. https://t.co/PoK8zbA7li