kv-cache
2 articles
The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens
The "Context Tax" in AI imposes a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A real-money guide to optimizing AI context, covering KV cache, append-only context, and 200K-token pricing.
Inside LLM Inference: KV Cache & the Memory Nightmare (Series 2/3)
Part 1 taught you how to save money; Part 2 explains why those tricks work, from the two stages of LLM inference (prefill and decode) to KV cache fundamentals to the GPU memory crisis that makes naive caching fall apart at scale. (Part 2 of 3)