@arankomatsuzaki
Serving Large Language Models on Huawei CloudMatrix384

- Integrates 384 Ascend 910C NPUs, interconnected via an ultra-high-bandwidth, low-latency UB network, optimized for large-scale MoE and distributed KV cache access
- DeepSeek-R1 on CloudMatrix-Infer hits 2k tokens/s decode… https://t.co/zZuEAdu7Gn