
P-01Python · PyTorch
3-Bit KV Cache Quantization for LLMs (TurboQuant-Inspired)
A from-scratch PyTorch benchmark of TurboQuant-inspired 3-bit KV cache quantization, compressing the cache ~4.9× while holding key reconstruction above 0.999 cosine similarity. Measured across the Qwen2.5 family on consumer hardware.
