inference-speed
2 articles
Typewriter vs Editor: Mercury 2 Reinvents LLMs with Diffusion — 5x Faster Reasoning, 4x Cheaper
Inception Labs launches Mercury 2, billed as the world's first reasoning diffusion LLM. Instead of generating text one token at a time like traditional autoregressive models, Mercury 2 refines entire passages in parallel, hitting 1,008 tokens/sec (5x faster than Claude 4.5 Haiku) at 1/4 the price. Backed by Andrew Ng, Andrej Karpathy, and Eric Schmidt.
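The speed claim comes down to step count: an autoregressive model needs one forward pass per token, while a diffusion LLM updates every position in parallel over a small, fixed number of refinement rounds. A toy sketch of that contrast (purely illustrative: the "refinement" here just reveals tokens from a fixed target, whereas a real diffusion LLM predicts all positions with a neural network at each step; Mercury 2's actual algorithm is not public):

```python
# Toy contrast: autoregressive decoding vs diffusion-style parallel refinement.
# Hypothetical example, not Mercury 2's real method.

TARGET = "the quick brown fox jumps over the lazy dog".split()

def autoregressive(target):
    """One token per step: step count equals sequence length."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # each token costs a full model forward pass
        steps += 1
    return out, steps

def diffusion_refine(target, rounds=3):
    """All positions updated in parallel each round: step count equals rounds."""
    seq = ["[MASK]"] * len(target)
    for step in range(rounds):
        # reveal a growing fraction of positions each round, all at once
        cutoff = (step + 1) * len(target) // rounds
        seq = [target[i] if i < cutoff else "[MASK]" for i in range(len(target))]
    return seq, rounds

ar_out, ar_steps = autoregressive(TARGET)
df_out, df_steps = diffusion_refine(TARGET)
assert ar_out == df_out == TARGET
print(f"autoregressive: {ar_steps} steps; diffusion: {df_steps} steps")
# → autoregressive: 9 steps; diffusion: 3 steps
```

The point of the sketch: if each step costs roughly one model pass, cutting 9 sequential steps to 3 parallel rounds is where throughput gains of this kind would come from.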
Fast Doesn't Mean Good — Anthropic Fast Mode vs OpenAI Codex Spark
In the same week, Anthropic shipped Fast Mode (the same model at 2.5x speed) and OpenAI shipped Codex Spark (a distilled model on Cerebras hardware at 1,000 tokens/s). One bets on accuracy, the other on instant interaction. This isn't a speed race; it's a product-philosophy showdown.