cost-optimization
5 articles
Anthropic Prompt Caching Deep Dive — Automatic Caching, 1-Hour TTL, and the Gotchas They Don't Tell You
Anthropic's prompt caching got major updates in 2026: Automatic Caching removes manual breakpoint headaches, 1-hour TTL keeps caches alive longer, and the invalidation hierarchy decides what blows up when you change things. Plus our real-world $13.86 billing disaster story.
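To make the caching mechanics concrete, here is a minimal sketch of a Messages API request body that opts a long system prompt into the extended 1-hour cache. The `cache_control` field shape follows Anthropic's documented format; the model name and system prompt text are placeholders for this sketch, not from the article.

```python
# Sketch of an Anthropic Messages API request body using prompt caching
# with the extended 1-hour TTL. The long system prompt is a stand-in;
# the model name is illustrative.

LONG_SYSTEM_PROMPT = "You are a support agent. Follow these policies. " * 200

def build_cached_request(user_message: str) -> dict:
    """Build a request payload whose system prompt is marked cacheable."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # "ephemeral" cache with a 1-hour TTL instead of the default
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_cached_request("Where is my order?")
```

Only the large, stable system block carries `cache_control`; the per-turn user message stays outside the cached prefix.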
Cloudflare Launches Markdown for Agents — 80% Token Savings, Stock Surges 13%, the 'Agentic Internet' Is Here
Cloudflare's "Markdown for Agents" lets AI agents request Markdown instead of HTML, cutting token usage by 80%. CEO Matthew Prince declares the 'Agentic Internet' has arrived: AI traffic has doubled, and the internet's language is shifting from HTML to Markdown.
Cut Token Costs by 75%: A Practical Guide to System Prompt Layering
An AI agent was burning 34,500 tokens of system prompt on every single conversation turn. The author used layered loading (always-on vs. on-demand) plus a dual-model strategy to cut monthly costs from $568 down to $120-150, a 75% reduction. Full breakdown with real numbers inside.
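The always-on vs. on-demand split can be sketched as follows. This is our illustration of the layered-loading idea, not the author's actual code; the layer names and contents are invented.

```python
# Sketch of layered loading: keep a small always-on core prompt and pull
# in heavier instruction blocks only when a turn actually needs them.
# Layer names and contents are illustrative.

CORE = "You are the support agent. Be concise."  # always sent
LAYERS = {
    "billing":  "Billing policy: refunds within 30 days ...",
    "shipping": "Shipping policy: carriers, SLAs, tracking ...",
    "legal":    "Legal escalation procedure ...",
}

def build_system_prompt(needed: set[str]) -> str:
    """Assemble the system prompt from the core plus only the needed layers."""
    parts = [CORE] + [LAYERS[name] for name in sorted(needed) if name in LAYERS]
    return "\n\n".join(parts)

# A billing question loads one extra layer instead of all three.
prompt = build_system_prompt({"billing"})
```

Most turns then pay only for the core plus one or two layers instead of the full 34,500-token prompt.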
The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens
The 'Context Tax' in AI imposes a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A real-money guide to optimizing AI context, covering KV caching, append-only context, and 200K-token pricing.
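One of the techniques mentioned, append-only context, hinges on a simple invariant: earlier turns are never rewritten or reordered, so the token prefix stays byte-identical across requests and the provider's KV/prompt cache keeps hitting. A minimal sketch of that invariant, with helper names of our own:

```python
# Sketch of append-only context: only ever append turns, never mutate
# earlier ones, so each new request shares an exact prefix with the last.
# Helper names are illustrative.

def append_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Append a turn without mutating or reordering what came before."""
    return history + [{"role": role, "content": content}]

def shares_prefix(old: list[dict], new: list[dict]) -> bool:
    """Cache-friendliness check: the old context must be an exact prefix."""
    return new[: len(old)] == old

h1 = append_turn([], "user", "Summarize Q3 revenue.")
h2 = append_turn(h1, "assistant", "Q3 revenue was ...")
h3 = append_turn(h2, "user", "And Q4?")
```

Editing or summarizing an earlier turn in place would break `shares_prefix` and force the provider to re-ingest the whole context at full price.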
Prompt Caching Money-Saving Guide: Your API Bill Can Lose a Zero (Series 1/3)
An AI engineer stuffed user-specific data into the system prompt, watched his bill double, and learned his lesson. Plus six practical tips for consistently hitting the prompt cache. (Part 1 of 3)
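The fix for the mistake described above can be sketched in a few lines: keep the system prompt static so it forms a cacheable prefix, and pass the user-specific data in the user turn instead. Structure and names here are our illustration, not the article's code.

```python
# Sketch: a static system prompt caches; user-specific data moves into
# the (uncached) user message. Names are illustrative.

STATIC_SYSTEM = "You are a personal finance assistant. Follow the policies below ..."

def build_messages(user_profile: dict, question: str) -> dict:
    """User data goes in the user message, not the system prompt."""
    return {
        "system": STATIC_SYSTEM,  # identical on every call -> cacheable prefix
        "messages": [
            {
                "role": "user",
                "content": f"Profile: {user_profile}\n\nQuestion: {question}",
            }
        ],
    }

a = build_messages({"tier": "pro"}, "What changed?")
b = build_messages({"tier": "free"}, "What changed?")
```

Because the system block is byte-identical across users, every request after the first reads it from cache instead of paying full input price.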