Paged Attention + Prefix Caching: The Ultimate GPU Memory Hack (Series 3/3 Finale)
Operating systems solved memory fragmentation with paging decades ago. vLLM brought the same trick to GPUs, added block hashing and prefix caching, and made prompt caching practical. In this series finale, every puzzle piece clicks into place.