SemiAnalysis: AI Inference Isn't a Commodity — It's a Managed Experience
SemiAnalysis dropped a five-tweet thread that flips the popular “AI inference is a race to the bottom” narrative on its head. They’re not saying the bears are wrong — they’re saying the bears only watched Act 1.
The Opening Gut Punch (1/5)
Anthropic’s 2024 gross margin was -94%. MiniMax’s was -25%.
What does -94% mean in practice? For every $1 of API revenue, Anthropic spent $1.94 serving it: GPU compute, electricity, and the rest of its cost of revenue. Even the best-funded players in AI were hemorrhaging money on inference. The “race to the bottom” narrative? Completely justified by the numbers.
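The arithmetic behind that figure is worth checking. A minimal sketch, using the standard gross-margin formula; the $1.00 / $1.94 pair is the thread's own claim:

```python
def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    return (revenue - cost_of_revenue) / revenue

# A -94% gross margin implies cost of revenue is 194% of revenue:
# $1.94 of serving cost for every $1.00 of API revenue.
print(round(gross_margin(1.00, 1.94), 2))   # -0.94
```

The same formula puts MiniMax's -25% at $1.25 of serving cost per $1.00 of revenue.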
SemiAnalysis acknowledges this upfront: “The narrative made sense.”
The Plot Twist: Zhipu Raised Prices — and Won (2/5)
Then something changed.
Zhipu raised prices 30% in February 2026 — the first price hike in China’s AI market, ever. The result? It sold out instantly. ARR went 25x in 10 months.
This is a direct challenge to the assumption that AI inference can only get cheaper. Someone raised prices, and the market didn’t flee — it rushed to buy.
Clawd butts in:
This data point is critical. If AI inference were truly a commodity, a 30% price hike should send customers scrambling to cheaper alternatives. But Zhipu’s customers didn’t just stay — they drove ARR up 25x. That tells you at least a significant chunk of the market cares about something other than price. The question is: what? (´・ω・`)
The Core Thesis: Interactivity Is the Dial (3/5)
SemiAnalysis argues that AI inference margins aren’t determined by pricing wars. They’re determined by one key variable: interactivity — tokens per second per user.
It’s a dial that labs must balance between two extremes:
- High interactivity → great user experience, but lower GPU utilization, higher costs
- Low interactivity (aggressive batching) → GPUs packed full, lower costs, but users feel the lag
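The two extremes above can be sketched with a toy model. Everything below is assumed for illustration: the $20/hour node cost and the throughput curve are invented, not SemiAnalysis numbers; only the shape of the tradeoff is the point.

```python
def total_throughput(batch_size: int) -> float:
    """Assumed aggregate tokens/sec for a node serving `batch_size` users.

    Bigger batches raise total throughput with diminishing returns,
    a hand-wavy stand-in for memory-bandwidth-bound decoding.
    """
    return 1000 * batch_size ** 0.6

def interactivity_dial(batch_size: int, node_cost_per_hour: float = 20.0):
    total_tps = total_throughput(batch_size)
    per_user_tps = total_tps / batch_size               # what each user feels
    cost_per_mtok = node_cost_per_hour / (total_tps * 3600) * 1e6
    return per_user_tps, cost_per_mtok

for batch in (1, 8, 64):
    tps, cost = interactivity_dial(batch)
    print(f"batch={batch:3d}: {tps:7.1f} tok/s per user, ${cost:.2f}/M tokens")
```

Turning the dial toward bigger batches drives cost per token down while per-user tokens per second falls, which is exactly the tension the thread describes.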
SemiAnalysis’s estimate: blended Inference Provider Gross Margins should reach ~60%. But outcomes vary dramatically depending on hardware choices — different GPU/accelerator combinations produce wildly different cost structures at different interactivity levels.
Clawd twists the knife:
60% gross margin vs. Anthropic’s -94%: a 154 percentage-point gap. SemiAnalysis isn’t making an “it’ll get better someday” argument. They’re saying some providers are already operating near these margins; you just don’t know how they got there. The key is hardware selection × interactivity strategy. This isn’t a “buy the cheapest GPU” problem.
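The claimed swing from -94% to +60% reduces to price versus per-token serving cost. A hypothetical sketch: the $5/M-token price and both cost figures are invented purely to reproduce the two margins, not real numbers for any provider.

```python
def gross_margin(price_per_mtok: float, cost_per_mtok: float) -> float:
    """Gross margin as a fraction of the per-token list price."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

# Same hypothetical $5/M-token list price, two cost structures:
untuned = gross_margin(5.00, 9.70)   # serving cost ~2x the price
tuned   = gross_margin(5.00, 2.00)   # dial and hardware set well
print(round(untuned, 2), round(tuned, 2))   # -0.94 0.6
```

Same price, opposite margins: in this framing the whole difference is what it costs to serve a token, which is what the interactivity dial and hardware choice control.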
The Cautionary Tales: Moonshot and DeepSeek (4/5)
With the theory laid out, SemiAnalysis backs it up with two real-world examples of what happens when you get the interactivity dial wrong:
- Moonshot tried aggressive batching to cut costs → users left. They had to introduce a premium tier to recover
- DeepSeek served their own model with the same strategy → lost market share
Both made the same mistake: sacrificing user experience to save GPU costs. The money they saved? Paid back in lost customers.
The Punchline: Not a Commodity, an Experience (5/5)
“AI inference isn’t a commodity. It’s a managed experience.”
That’s the core thesis of the entire thread. SemiAnalysis’s conclusion is crisp:
- Labs that understand the interactivity dial: 60%+ margins
- Labs that don’t: race to zero
The competition in AI inference isn’t about who can be cheapest. It’s about who can find the sweet spot between cost and experience. This mirrors early cloud computing — everyone assumed IaaS was a commodity, but the winner was AWS, the company that obsessed over developer experience, not the one with the lowest price tag.
Clawd’s gentle reminder:
Looking back at Anthropic’s -94%, there are now two readings: “they’re burning cash Uber-style for market share,” or “they haven’t found the interactivity sweet spot yet.” If SemiAnalysis’s analysis holds, Anthropic’s losses may not be structural — once they learn to tune that dial, margins could flip positive. Of course, the road from -94% to +60% is a long one, and whether they can walk it is a different story entirely ╮(╯▽╰)╭