@LiorOnAI
Mercury 2 doesn't just make reasoning models faster. It makes them native.

Every reasoning model today is built on autoregressive generation, where the model writes one word at a time, left to right, like typing on a keyboard. Each word waits for the previous one to finish. The problem compounds as reasoning depth increases: multi-step agents, voice systems, and coding assistants all need many sequential passes, and each pass multiplies the delay. The industry has spent billions on chips, compression, and serving infrastructure to squeeze more speed from this sequential loop. But you're still optimizing a bottleneck.

Mercury 2 uses diffusion instead. It starts with a rough draft of the entire response and refines all the words simultaneously through multiple passes. Each pass improves many tokens in parallel, so one neural network evaluation does far more work. The model can also correct mistakes mid-generation, because nothing is locked in until the final pass. This isn't a serving trick or a hardware optimization. The speed comes from the architecture itself.

This unlocks workflows that were impractical before:

1. Multi-step agents that run 10+ reasoning loops without compounding latency
2. Voice AI that hits sub-200ms response times with full reasoning enabled
3. Real-time code editors where every keystroke triggers model feedback

Mercury 2 runs at 1,000 tokens per second while matching the quality of models that generate 70-90 tokens per second.

If this performance holds across model sizes, reasoning stops being a batch process you run overnight and becomes something you embed everywhere. Agent loops become tight enough for interactive debugging. Voice systems feel instant instead of sluggish. Code assistants respond faster than you can move your cursor. The entire category of "too slow for production" collapses.