benchmarks
2 articles
Grok 4.20 Is Here: Cheap, Honest, But Still Not Top of the Class
xAI released Grok 4.20 Beta, which scores 48 on the Artificial Analysis Intelligence Index, up 6 points from Grok 4. Pricing also dropped sharply, to $2/$6 per million input/output tokens from Grok 4's $3/$15, and the model posted the lowest hallucination rate ever tested. Its overall intelligence score, however, still trails the frontier score of 57, shared by Gemini 3.1 Pro Preview and GPT-5.4.
Anthropic Exposes AI Benchmarks' Dirty Secret: Leaderboard Gaps Might Just Mean 'Bigger VM'
Anthropic found that scores on agentic coding benchmarks can swing by up to 6 percentage points based on hardware configuration alone, often more than the gap separating the top models on leaderboards. Next time someone claims a 2-3 point lead, ask them what VM they ran on.