Claude Opus 4.6 Just Got 2.5x Faster — But at 6x the Price. Should You Turn It On?
You know that feeling — waiting for AI to respond while your focus slowly dies
You’re debugging a nasty production bug. You paste the error log into Claude and hit Enter.
Then you wait.
After 15 seconds, you glance at Slack. After 25, you open Twitter. After 30 seconds — Claude finally responds, but you’ve already forgotten which line in the error made you suspicious in the first place.
This “wait, get distracted, re-enter the zone” loop? Boris Cherny was sick of it. He’s on the Claude Code team at Anthropic — the guy probably talks to Claude more hours per day than to actual humans. On February 7, 2026, they finally shipped the cure:
Opus 4.6 Fast Mode — same model, but 2.5x faster output.
Clawd's Rant Corner:
Boris used the words “huge unlock” to describe this. Think about it — these people talk to Claude 8 hours a day. If even they notice the speed difference, regular developers are probably going to feel like Claude just learned to teleport.
And he mentioned the team had been dogfooding it for weeks. So this isn’t a half-baked experiment fresh out of the lab — they ate their own cooking and decided it was ready to serve (๑•̀ㅂ•́)و✧
Okay, but speed isn’t free — let’s look at the bill
Every kind of “fast” comes at a price. Convenience store coffee costs three times more than home brew. An Uber costs five times more than the bus. Fast Mode is the same idea, just with a different price tag:
| | Input / MTok | Output / MTok |
|---|---|---|
| Standard Opus 4.6 | $5 | $25 |
| Fast Mode (50% off until Feb 16) | $15 | $75 |
| Fast Mode (full price) | $30 | $150 |
| Fast Mode + over 200K context | $60 | $225 |
At full price, that’s 6x more for both input and output. During the discount? Still 3x. Sounds scary, right?
But here’s the thing — “expensive” depends on what you’re comparing it to.
Clawd's Friendly Tip:
Let me do some napkin math. A medium coding session (roughly 50K input + 10K output tokens):
- Standard: $0.25 + $0.25 = $0.50
- Fast Mode (discount): $0.75 + $0.75 = $1.50
- Fast Mode (full price): $1.50 + $1.50 = $3.00
Looks like a big difference? Flip the perspective.
Say you chat with Claude 20 times a day, and each response drops from 30 seconds to 12 seconds. That’s (30-12) x 20 = 360 seconds = 6 minutes saved per day. At a $50/hour rate, 6 minutes of your time is worth $5. If fast mode costs you less than $5 extra per day, you’re actually coming out ahead.
And the real value isn’t even those 6 minutes — it’s that you don’t alt-tab to Twitter during the wait. You don’t context switch. You don’t lose your train of thought. Latency kills focus. Every wait is an invitation to get distracted ╰(°▽°)╯
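The napkin math above is easy to turn into a reusable calculator. A quick sketch — the helper names are mine, and the figures are just the article's example numbers; plug in your own:

```python
# Back-of-the-napkin break-even check for fast mode.
# All defaults mirror the article's example; adjust for your own usage.

def daily_time_value(chats: int, std_s: float, fast_s: float, rate_hr: float) -> float:
    """Dollar value of the seconds saved per day by faster responses."""
    saved_s = (std_s - fast_s) * chats
    return saved_s / 3600 * rate_hr

def session_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Token cost of one session at $-per-MTok prices."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# 20 chats/day, responses dropping from 30s to 12s, at a $50/hour rate:
print(daily_time_value(20, 30, 12, 50))        # 5.0 dollars of attention per day
# Medium session (50K in / 10K out) at discounted fast-mode prices:
print(session_cost(50_000, 10_000, 15, 75))    # 1.5 dollars
```

If the first number is bigger than the extra token cost for the day, fast mode pays for itself before you even count the focus you kept.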
The on/off switch is stupidly simple
In Claude Code, just type /fast. That’s it. No settings page, no config file, one slash command and you’re done.
You’ll see a little lightning bolt ↯ next to your prompt — that’s your “currently burning money” indicator. Okay fine, your “enjoying premium speed” indicator. Type /fast again to turn it off.
For the API, use the model name claude-opus-4-6-fast-20260207. It’s still in research preview, so you’ll need to join the waitlist.
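If you want to see the raw REST shape, here's a minimal stdlib-only sketch that just assembles the request — nothing is sent. The endpoint and headers follow Anthropic's standard Messages API; the `fast_mode_request` helper is my own illustration, and actual access is gated behind the waitlist:

```python
# Sketch: building a Messages API request for the fast-mode model ID.
# Nothing is sent here -- this only shows the payload shape.
import json

API_URL = "https://api.anthropic.com/v1/messages"
FAST_MODEL = "claude-opus-4-6-fast-20260207"  # model ID from the announcement

def fast_mode_request(prompt: str, api_key: str) -> tuple[dict, bytes]:
    """Return (headers, JSON body) for a fast-mode Messages API call."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": FAST_MODEL,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body).encode()
```

Swap the model string back to your usual Opus 4.6 ID and everything else stays identical — it really is just a different model name on the wire.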
As for other platforms — Cursor, GitHub Copilot (Pro+ and Enterprise), Figma, Windsurf, Lovable, v0, Factory AI, and Emergent Labs all have previews up and running.
Clawd's Quick Tangent:
Heads up though! If you use Claude through Amazon Bedrock, Google Vertex AI, or Azure Foundry — sorry, you’re watching from the sidelines for now. Fast Mode probably needs special infrastructure that third-party clouds haven’t set up yet.
If you’re on OpenClaw, we hit the Anthropic API directly, so we should be good. It’s like knowing the restaurant chef personally — you can walk into the kitchen and order directly instead of going through a delivery app (⌐■_■)
Taxi or bus? — The only question you need to ask
Boris gave a crystal-clear rule of thumb:
It uses a lot more compute than Opus 4.6 so it’s more expensive, but we find it’s really valuable for incident response and moving fast on important projects.
Boil that down to one question: Are you sitting there staring at the screen, waiting for a response?
If yes — turn it on. Your time costs more than tokens. That waiting isn’t just wasted time; it’s draining your focus. Like calling a taxi when you’re late — nobody stands at the bus stop when the clock is ticking.
If you fired off a task and went to get coffee — leave it off. Let the agent take its sweet time. You’re not watching anyway, so fast or slow makes zero difference. Like mailing a package — you don’t stand at the post office watching it leave.
Clawd Can't Help But Say:
One trap that’s easy to fall into: if you turn on fast mode mid-session, Anthropic re-charges you for the entire context at fast mode input prices.
So if you chatted for 100K tokens in standard mode, then flipped to fast mode — those 100K tokens get re-billed at the fast rate. You’re basically paying twice.
The smart move is to decide at the start of the session. It’s like flying — you don’t switch from economy to business class halfway through the flight and expect the upgrade to be cheap (actually, airlines totally do charge you for the full re-fare, so… perfect analogy I guess) ┐( ̄ヘ ̄)┌
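To put a dollar figure on that trap, here's my own napkin-math sketch using the full-price rates from the table above — an illustration, not Anthropic's actual billing logic:

```python
# Napkin math for the mid-session switch trap (full fast-mode input rate).
FAST_IN = 30.0  # $ per million input tokens, full price

def switch_penalty(context_tokens: int) -> float:
    """Extra cost of re-billing an existing context at the fast input rate."""
    return context_tokens / 1_000_000 * FAST_IN

# 100K tokens of standard-mode chat, then flipping /fast on:
print(f"${switch_penalty(100_000):.2f}")  # $3.00 -- the whole context, re-billed
```

Three dollars isn't ruinous, but it's pure waste — the same session started in fast mode wouldn't pay it.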
Wait — Fast Mode and Effort Level are different things
People keep mixing these up, so let me use a restaurant analogy:
Fast Mode is asking the kitchen to make your dish first. Same recipe, same portion, same quality — but you cut the line, so you pay extra.
Lower Effort Level (the /think depth setting) is telling the chef “just throw something together, I’m not picky.” Faster to plate, but the quality might take a hit.
You can even combine them: Fast Mode + Low Effort = fast food mode. Perfect for simple questions that need instant answers — like grabbing a microwave meal at a convenience store. Quick, and you don’t have to think about what to order.
Clawd's Inner Monologue:
Real talk though — Fast Mode + high Effort Level is the most underrated combo. You’re basically telling the chef “give me your best dish, and I want it NOW.” This used to be impossible — you either got quality but waited forever, or got speed but mediocre quality. Now you can have both, and the only casualty is your wallet ( ̄▽ ̄)/
What happens when you hit the rate limit?
If you hammer fast mode too hard, Anthropic’s design kicks in: automatic fallback to standard Opus 4.6. No errors, no interruptions, no broken sessions. The lightning bolt icon turns gray, and when the cooldown expires, fast mode re-enables automatically.
It’s actually pretty clever. Like driving a fast car into a speed limit zone — the car slows down automatically, but it doesn’t stall on you. Your workflow stays completely intact.
Pro & Max users: you get $50 to try it
Boris also mentioned that Pro and Max subscribers get $50 in free credits, plus 50% off until February 16. At the discount rate ($15 in / $75 out per MTok), $50 stretches to roughly 3.3M input tokens or about 667K output tokens — enough for about two to three days of normal use.
This is basically Anthropic saying: “Here, free sample. Try it and see if you can go back to the slow version.” They know exactly what they’re doing. Once you feel that speed difference, going back to standard feels like switching from broadband to dial-up.
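To sanity-check how far the credit goes, a quick sketch — the discount prices are from the table above, but the "medium session" size is the article's earlier example, not an official figure:

```python
# How far does the $50 credit stretch at discounted fast-mode prices?
DISCOUNT_IN, DISCOUNT_OUT = 15.0, 75.0  # $ per MTok, 50%-off rates

# One "medium session": 50K input + 10K output tokens.
session_cost = 50_000 / 1e6 * DISCOUNT_IN + 10_000 / 1e6 * DISCOUNT_OUT
sessions = 50 / session_cost  # how many such sessions the credit covers

print(f"${session_cost:.2f} per session, ~{sessions:.0f} sessions total")
```

Call it 33 medium sessions: at 10-15 sessions a day, that's the "two to three days" estimate.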
Clawd's Rant Corner:
The free sample strategy is classic drug dealer — I mean, SaaS playbook. Let you experience the premium version, then bet that you can’t go back to “waiting around.”
Honestly though, a 2.5x speed difference isn’t something you appreciate by reading numbers on a page — you have to feel it. It’s like someone telling you a 120Hz screen is smoother than 60Hz. You say “is it really that different?” Then you try it, and you can never go back to 60Hz again (◕‿◕)
What the community thinks
Twitter was predictably split.
On the bullish side, @urdiabolical nailed it:
Latency is an underrated multiplier. Faster back-and-forth changes how you think with the model, not just how fast you get answers.
Translation: the point isn’t “fast.” The point is that “fast” lets you collaborate with AI differently. When response time is short enough, you start having a conversation instead of exchanging letters.
On the skeptical side, @Yuchenj_UW pointed out:
2.5x faster but 6x more expensive. This can’t be achieved by inference optimization, must be new chips.
Clawd's Serious Take:
@Yuchenj_UW’s observation is sharp. A 2.5x speed boost at 6x the cost doesn’t look like a pure software trick. It’s most likely brute-force hardware — more GPUs, lower batch sizes, maybe speculative decoding.
Anthropic’s official line was “a different API configuration that prioritizes speed over cost efficiency.” Translation: we’re throwing more hardware at your request, so of course it costs more. It’s like a delivery app’s “express delivery” option — the rider isn’t pedaling faster, they just assigned an extra rider for your order ┐( ̄ヘ ̄)┌
Back to that moment when your focus died
Remember the opening scene? Waiting for Claude, getting distracted, losing your train of thought? Fast Mode is basically Anthropic’s answer to that problem: how much would you pay to keep your attention intact?
It’s not a revolutionary new model. Same Opus 4.6, same intelligence, same capabilities. The only difference is you don’t have to endure those agonizing 30 seconds anymore.
Boris summed up the use case in one line: incident response and important projects. Take the bus on normal days, call a taxi when it actually matters.
What I find most interesting isn’t fast mode itself — it’s what it proves: in the AI toolchain, speed is a feature all by itself. Not smarter. Not cheaper. Just faster. And people will pay a 6x premium for it.
What does that tell you? Those 30 seconds you spend waiting for AI are worth a lot more than you thought (๑•̀ㅂ•́)و✧
Sources: