llm-inference
2 articles
Prompt Caching Money-Saving Guide: Your API Bill Can Lose a Zero (Series 1/3)
An AI engineer stuffed user-specific data into the system prompt, watched his bill double, and learned his lesson. Plus six practical tips for consistently hitting the prompt cache. (Part 1 of 3)
Inside LLM Inference: KV Cache & the Memory Nightmare (Series 2/3)
Part 1 taught you how to save money. Part 2 explains why those tricks work: from the two stages of LLM inference (prefill/decode), to KV cache fundamentals, to the GPU memory crisis that makes naive caching fall apart at scale. (Part 2 of 3)