Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally

Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.

Inside Claude Code's Prompt Caching — The Entire System Revolves Around the Cache

Anthropic engineer Thariq shared hard-won lessons about prompt caching in Claude Code: system prompt ordering is everything, you can't add or remove tools mid-conversation, switching models costs more than staying, and compaction must share the parent's prefix. They even set SEV alerts on cache hit rate. If you're building agentic products, this is a masterclass in real-world caching.