@omarsar0
Hardware-aware design insight Through grid search at fixed KV cache size, they show generation speed tracks KV cache more than parameter count. Different head/dimension settings hold throughput roughly constant while improving accuracy. https://t.co/3DYXMubK1o