@Modular
TileTensor is Mojo's new tensor type for GPU kernels. The short version: fully static layouts = 8-byte runtime footprint = less register pressure. We saw 5% throughput gains on @AMD MI300X multi-head attention (MHA) just from the type change. Our Part 1 blog post covers the design and how it compares to CuTe. Part 2 will cover the Mojo internals that made it possible. https://t.co/0iaHqi0fda