@PyTorch
We’re excited to share Generalized Dot-Product Attention (GDPA), a production-driven attention kernel designed specifically for large-scale recommendation systems (RecSys). Proposed in our recent paper, GDPA replaces softmax with a flexible activation tailored to real-world RecSys traffic patterns, and has been deployed in GEM, Meta’s largest recommendation model.

🔗 Read our latest blog: https://t.co/YxePbndHlP

By redesigning attention around production characteristics rather than benchmark assumptions, GDPA achieves a 2× forward speedup (1,145 BF16 TFLOPs, ~97% tensor core utilization), a 1.6× backward speedup, and up to a 3.5× forward speedup vs. FA4 under short-K/V settings on NVIDIA B200.

This work demonstrates how real production traffic can fundamentally reshape kernel design.

✍ Jiaqi Xu, Han Xu, Junqing Zhou, Devashish Shankar, Xiaoyi (Leo) Liu, Shuqi Yang

#PyTorch #OpenSourceAI #GDPA #GEM
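To make the "flexible activation in place of softmax" idea concrete, here is a minimal NumPy sketch. The ReLU default below is an illustrative stand-in chosen by us, not necessarily the activation GDPA uses in production; see the paper and blog for the actual design.

```python
import numpy as np

def generalized_attention(q, k, v, activation=lambda s: np.maximum(s, 0.0)):
    """Dot-product attention with a pluggable elementwise activation
    replacing the softmax row-normalization.

    The ReLU default is a hypothetical example; GDPA's real activation
    choices are described in the paper/blog linked above."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ np.swapaxes(k, -2, -1)) * scale  # (..., seq_q, seq_k)
    weights = activation(scores)                   # elementwise, no normalization
    return weights @ v

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention for comparison:
    # softmax normalizes each query's weights over the key axis.
    scale = q.shape[-1] ** -0.5
    scores = (q @ np.swapaxes(k, -2, -1)) * scale
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8, 16))  # (batch, heads, seq_len, head_dim)
k = rng.standard_normal((2, 4, 8, 16))
v = rng.standard_normal((2, 4, 8, 16))

out = generalized_attention(q, k, v)
print(out.shape)  # (2, 4, 8, 16)
```

Because the generalized form skips the row-wise normalization, each score can be activated independently, which is part of what makes the kernel cheaper than a softmax-based one.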