@charles_irl
Dozens of teams have asked my advice on running LLMs. How fast is @deepseek_ai V3 with vLLM on 8 GPUs? What's the max throughput of @Alibaba_Qwen 2.5 Coder with SGLang on one H100? Running & sharing benchmarks ad hoc was too slow, so we built a tiny app: the LLM Engine Advisor https://t.co/FJP0oVUEye
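
For a sense of what "running a benchmark ad hoc" looks like, here is a minimal throughput sketch, not the Advisor's actual methodology: it assumes a vLLM or SGLang server already running with an OpenAI-compatible completions endpoint at localhost:8000, and the URL, model name, prompt, and concurrency level are all placeholder assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed local vLLM/SGLang server
MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"          # hypothetical example model
CONCURRENCY = 32                                   # arbitrary load level
PROMPT = "Write a function that reverses a string."

def one_request(_: int) -> int:
    """Send one completion request; return the number of generated tokens."""
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 256},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    token_counts = list(pool.map(one_request, range(CONCURRENCY)))
elapsed = time.perf_counter() - start

print(f"{sum(token_counts)} tokens in {elapsed:.1f}s "
      f"-> {sum(token_counts) / elapsed:.0f} tok/s at concurrency {CONCURRENCY}")
```

Even a toy like this shows why sharing one-off numbers is painful: the result moves with concurrency, max_tokens, prompt length, and hardware, which is exactly the parameter space a standing benchmark app can sweep once and publish.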