@stevibe
Introducing HermesAgent-20, a new Bench Pack for BenchLocal. 20 scenarios extracted straight from the Hermes Agent source code, run against a REAL Hermes instance. The actual workload you'd put your model through. Why I built BenchLocal in the first place: most benchmarks are too abstract. We use local LLMs for practical work, and finding the right model for YOUR task efficiently is the single most important thing, especially when you're constrained to what fits on your machine. BenchLocal is a framework: providers, models, side-by-side comparison, all in one UI. Bench Packs are the unit of testing: ToolCall-15 and BugFind-15 shipped first, and when I launched the BenchLocal 0.1.0, added StructOutput, ReasonMath, InstructFollow, DataExtract. Now, HermesAgent-20 is the newest. Bench Packs install like VS Code extensions. The SDK is open, write your own, share it, grow the ecosystem. Here's the goal: a community-built, practical evaluation layer for the local LLM space. Early numbers on HermesAgent-20: > GLM 5.1 — 85 > Gemma4 31B — 83 > Qwen3.5 27B — 79 > MiniMax M2.7 — 76 Upgrade to the latest BenchLocal to install HermesAgent-20 (SDK update required).