Poor Paul's Benchmark
Find the absolute limit of your local AI hardware
without the enterprise budget.
~/poor-pauls-benchmark
$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
$ cd poor-pauls-benchmark && uv sync
$ python ppb.py all suites/my_gpu.toml
Phase 1 — VRAM Cliff
iter 1: n_ctx= 66,560 ✓ pass → [66,561 .. 131,071] (2.3s)
iter 2: n_ctx= 98,815 ✗ OOM → [66,561 .. 98,814] (0.8s)
iter 3: n_ctx= 82,687 ✓ pass → [82,688 .. 98,814] (3.1s)
✓ Max safe context: 90,111 tokens
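The Phase 1 output above is a textbook bisection over context size: probe a midpoint, shrink the range toward the largest `n_ctx` that still loads. A minimal sketch of that search, where `fits()` is a hypothetical stand-in for actually loading the model at a given context size:

```python
def max_safe_context(low, high, fits):
    """Bisect for the largest n_ctx for which fits(n_ctx) is True.

    Assumes fits() is monotone: once a context size OOMs,
    every larger one does too.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid
            low = mid + 1   # passed: try larger contexts
        else:
            high = mid - 1  # OOM: try smaller contexts
    return best

# Toy stand-in: pretend anything past 90,111 tokens would OOM.
print(max_safe_context(4_096, 131_072, lambda n: n <= 90_111))
```

Each probe halves the search range, so even a 4K-to-128K window resolves in a handful of model loads rather than a linear crawl.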
Phase 2 — Sweep
✓ [1/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=8192 139.2 tok/s (14.1s)
✓ [2/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=16384 95.7 tok/s (18.3s)
✓ [3/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=32768 62.4 tok/s (24.6s)
...
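The sweep is driven by the suite file passed on the command line (`suites/my_gpu.toml` in the quick-start). Its actual schema isn't shown here, so the keys below are purely illustrative, a guess at the shape such a file might take:

```toml
# Hypothetical suite file -- key names are illustrative,
# not the tool's documented schema.
[model]
name = "Qwen3-30B-A3B-UD-Q4_K_M"

[sweep]
contexts = [8192, 16384, 32768]  # n_ctx values to benchmark, as in the output above
```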
Phase 3 — Publish
✓ Results uploaded to Hugging Face
Leaderboard
Ranked model × GPU combos by throughput, latency, and efficiency.
Insights
Interactive charts — throughput scaling, latency tails, GPU comparisons, and more.
Articles
Benchmark writeups, hardware deep dives, and quantization analysis.
Benchmark your own hardware
$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
Data last updated: April 16, 2026 at 12:34 AM