epoch-ai
4 articles
Epoch AI Re-Ran SWE-bench Verified: Better Scores May Mean Better Evaluation Setup, Not Just Better Models
Epoch AI's SWE-bench Verified v2.x aligns model scores with developer reports. Key lesson: benchmark outcomes are heavily influenced by scaffold/tooling quality, environment reliability, and evaluation settings, not just base model capability.
Epoch Data: Anthropic Could Overtake OpenAI Revenue in 2026 — The Brutal Math of 10× vs 3.4× Growth
Epoch AI: Anthropic's revenue growth (~10x/year) has outpaced OpenAI's (~3.4x/year) since crossing $1B. Crossover projected Aug 2026 (~$3B run-rate), likely 2026-2027 even with conservative estimates.
AI Inference Costs Drop 5-10x Every Year — Epoch AI Has the Receipts to Prove It
Epoch AI researcher Jean-Stanislas Denain challenges Toby Ord's pessimism with data: the cost of a given AI capability drops 5-10x annually. At the 10x end of that range, a $1M task today could cost $100K next year and $10K the year after. High inference cost is real but temporary.
An Epoch AI Researcher Tested It: How Close Is AI to Taking My Job?
Anson Ho tested AI on real job tasks (web apps, articles, content). AI excels on benchmarks but struggles with real work. Forecast: 2026 safe, 2028-2029 a turning point.