~/harsh

Projects

Everything Harsh has designed, built and shipped — click any card for the full story.

P-01 · Python · PyTorch

3-Bit KV Cache Quantization for LLMs (TurboQuant-Inspired)

A from-scratch PyTorch benchmark of TurboQuant-inspired 3-bit KV cache quantization that compresses the cache about 4.9× while keeping key reconstruction above 0.999 cosine similarity. Results are measured across the Qwen2.5 family on consumer hardware.

  • PyTorch
  • Python
  • LLM
P-02 · Python

Loops vs. Prompts: When Iterating an LLM Is Worth the Cost

A controlled experiment on when it pays to run a language model in a loop instead of asking once. Four strategies, three tasks, matched compute, and proper statistics, all reproducible from the code and data.

  • LLMs
  • Ollama
  • Qwen2.5