Squeezing Every Drop of Performance: Ditching Python for Metal Shaders to Run Large Models Locally

Developer @danveloper shares their experience running Qwen3.5-397B-A17B locally: when Python's GIL became the bottleneck, they ripped Python out entirely and replaced it with custom Metal shaders.