@omarsar0
The Sparse Frontier explores the efficiency-accuracy trade-offs of sparse attention in Transformer LLMs.

Findings:
- Efficiency trade-off: For very long sequences, larger models with highly sparse attention outperform smaller, dense ones.
- Phase sensitivity: Decoding can… https://t.co/agTfnfTUx9
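For context on what "sparse attention" means here, below is a minimal sketch of one common sparsification pattern: top-k attention, where each query attends only to its k highest-scoring keys. This is an illustrative variant under my own assumptions (function name, shapes, and the top-k strategy are mine), not the specific method benchmarked in the paper.

```python
# Hypothetical sketch of top-k sparse attention (not the paper's exact method).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    """For each query, attend only to the top_k highest-scoring keys.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    scale = q.size(-1) ** -0.5
    # Full score matrix: (batch, heads, Lq, Lk).
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    # Find the k-th largest score per query and mask everything below it
    # to -inf, so softmax assigns those positions zero weight.
    topk_vals, _ = scores.topk(top_k, dim=-1)
    sparse_scores = scores.masked_fill(scores < topk_vals[..., -1:], float("-inf"))
    weights = F.softmax(sparse_scores, dim=-1)
    return torch.matmul(weights, v)

# Example: 1 batch, 2 heads, sequence length 16, head_dim 8, keep 4 keys per query.
q = torch.randn(1, 2, 16, 8)
k = torch.randn(1, 2, 16, 8)
v = torch.randn(1, 2, 16, 8)
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # torch.Size([1, 2, 16, 8])
```

The trade-off the paper studies follows directly from this pattern: a smaller `top_k` cuts compute and memory for long sequences, but drops context that a dense model would have attended to.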