Poor Paul's Benchmark
Find the absolute limit of your local AI hardware
without the enterprise budget.
~/poor-pauls-benchmark
$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
$ cd poor-pauls-benchmark && uv sync
$ python ppb.py all suites/my_gpu.toml
Phase 1 — VRAM Cliff
iter 1: n_ctx= 66,560 ✓ pass → [66,561 .. 131,071] (2.3s)
iter 2: n_ctx= 98,815 ✗ OOM → [66,561 .. 98,814] (0.8s)
iter 3: n_ctx= 82,687 ✓ pass → [82,688 .. 98,814] (3.1s)
✓ Max safe context: 90,111 tokens
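The Phase 1 output above is a textbook bisection over context size: probe a midpoint, shrink the range toward the largest `n_ctx` that still loads. A minimal sketch of that search, where `fits()` is a hypothetical stand-in for actually loading the model at a given context size:

```python
def max_safe_context(low, high, fits):
    """Bisect for the largest n_ctx for which fits(n_ctx) is True.

    Assumes fits() is monotone: once a context size OOMs,
    every larger one does too.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid
            low = mid + 1   # passed: try larger contexts
        else:
            high = mid - 1  # OOM: try smaller contexts
    return best

# Toy stand-in: pretend anything past 90,111 tokens would OOM.
print(max_safe_context(4_096, 131_072, lambda n: n <= 90_111))
```

Each probe halves the search range, so even a 4K-to-128K window resolves in a handful of model loads rather than a linear crawl.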
Phase 2 — Sweep
✓ [1/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=8192 139.2 tok/s (14.1s)
✓ [2/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=16384 95.7 tok/s (18.3s)
✓ [3/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=32768 62.4 tok/s (24.6s)
...
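The sweep is driven by the suite file passed on the command line (`suites/my_gpu.toml` in the quick-start). Its actual schema isn't shown here, so the keys below are purely illustrative, a guess at the shape such a file might take:

```toml
# Hypothetical suite file -- key names are illustrative,
# not the tool's documented schema.
[model]
name = "Qwen3-30B-A3B-UD-Q4_K_M"

[sweep]
contexts = [8192, 16384, 32768]  # n_ctx values to benchmark, as in the output above
```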
Phase 3 — Publish
✓ Results uploaded to Hugging Face
Leaderboard
Ranked model × GPU combos by throughput, latency, and efficiency.
Insights
Interactive charts — throughput scaling, latency tails, GPU comparisons, and more.
Articles
Benchmark writeups, hardware deep dives, and quantization analysis.
Benchmark your own hardware
$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
Data last updated: April 16, 2026 at 12:34 AM