
Projects

Everything Harsh has designed, built and shipped — click any card for the full story.

P-01 · Python · PyTorch

3-Bit KV Cache Quantization for LLMs (TurboQuant-Inspired)

A from-scratch PyTorch benchmark of TurboQuant-inspired 3-bit KV cache quantization that compresses the cache about 4.9× while holding key reconstruction above 0.999 cosine similarity, measured across the Qwen2.5 family on consumer hardware.
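As a rough illustration of how a 3-bit cache can land near 4.9× rather than the ideal 16/3 ≈ 5.33×, the sketch below accounts for per-group metadata. It assumes an fp16 baseline and one 16-bit scale stored per group of 64 values. Neither the group size nor the metadata layout is stated in the project summary, and `effective_compression` is a hypothetical helper, so treat this as one plausible accounting and not the project's actual scheme.

```python
def effective_compression(bits_per_value, group_size, scale_bits=16, baseline_bits=16):
    """Ratio of baseline cache size to quantized cache size.

    Assumption (not from the project): each group of `group_size` values
    stores one extra scale of `scale_bits` bits alongside its quantized codes.
    """
    overhead_per_value = scale_bits / group_size
    return baseline_bits / (bits_per_value + overhead_per_value)


ideal = 16 / 3                                 # no metadata at all
with_scales = effective_compression(3, 64)     # 3.25 effective bits per value

print(f"ideal: {ideal:.2f}x, with per-group scales: {with_scales:.2f}x")
# prints "ideal: 5.33x, with per-group scales: 4.92x"
```

Smaller groups track the data more closely but spend more bits on scales, so the achievable ratio depends on that trade-off.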

PyTorch
Python
LLM


P-02 · Python

Loops vs. Prompts: When Iterating an LLM Is Worth the Cost

A controlled experiment on when it pays to run a language model in a loop instead of asking once. Four strategies, three tasks, matched compute, and proper statistics, all reproducible from the code and data.

LLMs
Ollama
Qwen2.5
