@PyTorch
Normalization methods (LayerNorm/RMSNorm) are foundational in modern deep learning models. We evaluate and improve torch.compile performance for LayerNorm/RMSNorm on NVIDIA H100 and B200 to reach near-SOTA performance on a kernel-by-kernel basis, whilst providing automatic fusion capabilities with torch.compile for peak e2e performance.

🔗 Read our latest blog from Shunting Zhang, Paul Zhang, Markus Höhnerbach, Elias Ellison, Jason Ansel, and Natalia Gimelshein: https://t.co/ie5UZag3qx

Today at PyTorch Conference EU: Lightning Talk: Faster Than SOTA Kernels in Torch.compile With Subgraph Fusions and Custom Op Autotuning - Elias Ellison & Paul Zhang, Meta, 15:40 - 15:50

#PyTorch #torchcompile #OpenSourceAI #PyTorchCon