kv-cache
2 articles
The LLM Context Tax: 13 Ways to Stop Burning Money on Wasted Tokens
The "Context Tax" in AI imposes a triple penalty: cost, latency, and reduced intelligence. Nicolas Bustamante's 13 Fintool techniques cut agent token bills by up to 90%. A real-money guide to optimizing AI context, covering KV cache, append-only context, and 200K-token pricing.
Inside LLM Inference: KV Cache & the Memory Nightmare (Series 2/3)
Part 1 taught you how to save money; Part 2 explains why those tricks work, from the two stages of LLM inference (prefill and decode) to KV cache fundamentals to the GPU memory crisis that makes naive caching fall apart at scale. (Part 2 of 3)