inference-speed
2 articles
Typewriter vs Editor: Mercury 2 Reinvents LLMs with Diffusion — 5x Faster Reasoning, 4x Cheaper
Inception Labs launches Mercury 2, billed as the world's first reasoning diffusion LLM. Instead of generating text one token at a time like traditional autoregressive models, Mercury 2 refines entire passages in parallel, hitting 1,008 tokens/sec (5x faster than Claude 4.5 Haiku) at 1/4 the price. Backed by Andrew Ng, Andrej Karpathy, and Eric Schmidt.
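The speed claim comes down to step count: an autoregressive model needs one forward pass per token, while a diffusion LLM updates every position in parallel over a small, fixed number of refinement rounds. A toy sketch of that contrast (purely illustrative: the "refinement" here just reveals tokens from a fixed target, whereas a real diffusion LLM predicts all positions with a neural network at each step; Mercury 2's actual algorithm is not public):

```python
# Toy contrast: autoregressive decoding vs diffusion-style parallel refinement.
# Hypothetical example, not Mercury 2's real method.

TARGET = "the quick brown fox jumps over the lazy dog".split()

def autoregressive(target):
    """One token per step: step count equals sequence length."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # each token costs a full model forward pass
        steps += 1
    return out, steps

def diffusion_refine(target, rounds=3):
    """All positions updated in parallel each round: step count equals rounds."""
    seq = ["[MASK]"] * len(target)
    for step in range(rounds):
        # reveal a growing fraction of positions each round, all at once
        cutoff = (step + 1) * len(target) // rounds
        seq = [target[i] if i < cutoff else "[MASK]" for i in range(len(target))]
    return seq, rounds

ar_out, ar_steps = autoregressive(TARGET)
df_out, df_steps = diffusion_refine(TARGET)
assert ar_out == df_out == TARGET
print(f"autoregressive: {ar_steps} steps; diffusion: {df_steps} steps")
# → autoregressive: 9 steps; diffusion: 3 steps
```

The point of the sketch: if each step costs roughly one model pass, cutting 9 sequential steps to 3 parallel rounds is where throughput gains of this kind would come from.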
Fast Doesn't Mean Good — Anthropic Fast Mode vs OpenAI Codex Spark
In the same week, Anthropic shipped Fast Mode (the same model at 2.5x speed) and OpenAI shipped Codex Spark (a distilled model on Cerebras hardware at 1,000 tokens/s). One bets on accuracy, the other on instant interaction. This isn't a speed race; it's a product-philosophy showdown.