@Modular
Gemma 4 is live on Modular Cloud, day zero, with the fastest performance on both NVIDIA and AMD. Our MAX inference framework delivers 15% higher throughput than vLLM on B200, and we’re the only inference provider shipping @googlegemma 4 on a framework we built ourselves.

Two multimodal models are live now: Gemma 4 31B (dense, 256K context) and Gemma 4 26B A4B (MoE, only 4B parameters active per pass). Both SOTA on Modular Cloud: https://t.co/moAyXaHm0m

Modular Cloud runs on MAX, our inference framework that unifies GPU kernels, graph compilation, and high-performance serving in a single hardware-agnostic stack. New weights to SOTA deployment in days, across two hardware platforms: https://t.co/aaaOhlKLsL