@PyTorch
Cut latency by up to 1.68x with NVFP4 and MXFP8 using Diffusers and TorchAO on Blackwell, across a suite of models 🔥. Squeeze out maximum performance with recipes for selective quantization and regional compilation. 🔗 Read our latest blog post from @vkuzo (@Meta) and @RisingSayak (@HuggingFace): https://t.co/QRHwAiOSc5 #PyTorch #TorchAO #MXFP8 #NVFP4 #OpenSourceAI