@vtabbott_
A thread🧵previewing my paper with @GioeleZardini, covering how to use diagrams to represent algorithms, generate performance models, and derive execution strategies like FlashAttention ~ We use wires to represent axes, dashed lines to separate tuple segments / parallel functions, weaving to map functions, and horizontal placement for composition. This lets us represent FlashAttention with the diagram below. But how do we go from a representation of a mathematical function to an algorithm executed on GPU cores?