token-optimization
3 articles
/effort Is Not a Model Switcher — It's a Gas Pedal (The Creator of Claude Code Said So)
Claude Code creator Boris Cherny cleared the air directly: every subscriber uses the same Opus 4.6 — there is no secret smarter model. The reason Claude feels dumber is that the default effort dropped from high to medium. One command brings it back.
Claude Code Burning Your Budget? One Setting Saves 60% on Tokens
Most token waste is invisible: Extended Thinking running on tasks that don't need it, Opus handling work that Sonnet could do, and context filling up before you compact. ECC's token-optimization.md combines MAX_THINKING_TOKENS, model routing, and strategic compact; author Affaan Mustafa says the savings reach 60-80%.
Cut Token Costs by 75%: A Practical Guide to System Prompt Layering
An AI agent burns 34,500 tokens of system prompt on every single conversation turn. The author used layered loading (always-on vs. on-demand) plus a dual-model strategy to cut monthly costs from $568 to $120-150, a reduction of roughly 75%. Full breakdown with real numbers inside.
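To make the always-on vs. on-demand idea concrete, here is a minimal sketch of layered loading. It is not the article author's implementation: the `PromptLayer` class, the keyword triggers, the sample layer texts, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptLayer:
    name: str
    text: str
    # Hypothetical trigger keywords; an empty tuple marks an always-on layer.
    triggers: tuple = ()


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


def build_system_prompt(layers, user_message: str) -> str:
    """Join the always-on layers plus any on-demand layer whose trigger
    keyword appears in the user's message."""
    message = user_message.lower()
    selected = [
        layer for layer in layers
        if not layer.triggers or any(t in message for t in layer.triggers)
    ]
    return "\n\n".join(layer.text for layer in selected)


# Illustrative layers only; real content would come from your own prompt files.
LAYERS = [
    PromptLayer("identity", "You are a helpful assistant for the team."),
    PromptLayer("deploy", "Deployment runbook: tag, build, verify, roll out.",
                triggers=("deploy", "release")),
    PromptLayer("billing", "Billing rules: invoices go out on the first.",
                triggers=("invoice", "billing")),
]

if __name__ == "__main__":
    for msg in ("What's for lunch?", "Please deploy the new release"):
        prompt = build_system_prompt(LAYERS, msg)
        print(f"{msg!r}: ~{estimate_tokens(prompt)} estimated tokens")
```

Only the always-on layer is sent on an unrelated turn, so the per-turn system prompt shrinks whenever the on-demand layers are not needed. Keyword matching is the simplest possible router; a real setup might route on intent or tool use instead.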