inference
4 articles
Artificial Analysis Launches AA-AgentPerf: The Hardware Benchmark Built for the Agent Era
Artificial Analysis launches AA-AgentPerf, a hardware benchmark that uses real coding agent trajectories instead of synthetic queries. It permits production optimizations, measures efficiency per accelerator, per kW, and per dollar, and scales from single cards to full racks.
GTC 2026: Nvidia's Inference Empire Keeps Expanding — Groq IP Deal, LPU Decoded, CPO Roadmap
SemiAnalysis's deep dive on GTC 2026 covers Nvidia's $20B Groq IP deal to acquire LPU tech, plus updates on AFD, CPO, Kyber/Oberon, Vera ETL256, and CMX/STX. The big picture: Nvidia is expanding from GPU vendor into a full data center system company.
NVIDIA Nemotron 3 Super: A 120B Open-Source Model That Only Uses 12B at a Time
NVIDIA released Nemotron 3 Super, a 120B-parameter open-source reasoning model with only 12B active parameters. It combines Mamba and Transformer layers in a hybrid MoE architecture, scores 36 on the Intelligence Index, and runs at a blistering 484 tok/s.
OpenAI × Cerebras: Codex-Spark Codes 15x Faster — But What's the Catch?
OpenAI released GPT-5.3-Codex-Spark, its first model on Cerebras chips. It is very fast (>1000 tokens/sec, 80% lower latency), but the model is smaller, has no auto-tests, and is Pro-only. This marks OpenAI's first production deployment on non-Nvidia hardware, redrawing the AI compute landscape.