@PyTorch
Helion's autotuner has been a powerful tool for optimizing ML kernels, but it has come with a challenge: autotuning sessions that could take 10+ minutes, sometimes even hours. The PyTorch team at Meta set out to solve this bottleneck using machine learning itself.

🖇️ Read how they did it: https://t.co/N8lYGeXuv9

Spoiler alert: Using Likelihood-Free Bayesian Optimization Pattern Search, they achieved a 36.5% reduction in autotuning time for NVIDIA B200 kernels while improving kernel latency by 2.6%. For AMD MI350 kernels, they saw a 25.9% reduction in autotuning time with 1.7% better latency. Some kernels showed even more dramatic improvements: up to 50% faster autotuning and latency gains of more than 15%.

✍️ Ethan Che, Oguz Ulgen, Maximilian Balandat, Jongsok Choi, Jason Ansel (Meta)

#PyTorch #Helion #MachineLearning #BayesianOptimization #OpenSourceAI #Performance