optimization - Tags

Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally

MP-205 2026-03-24 · @danveloper on X

Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.

Inside Claude Code's Prompt Caching — The Entire System Revolves Around the Cache

GP-73 2026-02-19 · @trq212 on X

Anthropic engineer Thariq shares Claude Code prompt-caching lessons: the order of content in the system prompt matters, tools cannot change mid-conversation, switching models mid-session can cost more than staying on the current one, and compaction must share the parent's prefix. Real SEV alerts included.

prompt-caching claude-code cost ai-agents
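
The four lessons in the summary above all follow from one property: a prompt cache is reused only for the leading part of a request that matches an earlier one exactly. Below is a toy sketch of that property, not Anthropic's implementation. The block labels, the `shared_prefix_len` helper, and the choice to model the model name as the first block are all illustrative assumptions.

```python
def shared_prefix_len(cached, request):
    """Count the leading blocks that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


# Hypothetical request layout: model, tools, system prompt, then messages.
base = ["model:A", "tools:v1", "system:rules", "user:hi", "assistant:hello"]

# Appending a turn keeps the whole earlier request as a reusable prefix.
next_turn = base + ["user:next question"]

# Changing the tool set breaks the match right after the model block.
tools_changed = ["model:A", "tools:v2"] + base[2:]

# Switching models changes the first block, so nothing is reused.
model_switched = ["model:B"] + base[1:]

# A compaction request that keeps the parent's prefix and appends its
# instruction at the end can still reuse all of the parent's blocks.
compaction = base + ["user:summarize the conversation so far"]

for name, req in [
    ("next_turn", next_turn),
    ("tools_changed", tools_changed),
    ("model_switched", model_switched),
    ("compaction", compaction),
]:
    print(name, shared_prefix_len(base, req), "of", len(base), "blocks reused")
```

In this toy model, only the requests that extend the earlier one without touching its front (`next_turn`, `compaction`) reuse all five cached blocks.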