@PyTorch
NVIDIA Blackwell introduces Cluster Launch Control to enable dynamic scheduling, allowing kernels to launch as many threadblocks as needed while improving load balance through hardware-driven work distribution. Our latest blog shows how to write TLX kernels that use CLC in practice and introduces TLX as the first DSL in the industry with dedicated support for Cluster Launch Control. By Daohang Shi, Hongtao Yu, and Manman Ren https://t.co/gBWPcW50z5 #PyTorch #GPU #AIInfrastructure #OpenSource