Poor Paul's Benchmark

Find the absolute limit of your local AI hardware without an enterprise budget.

~/poor-pauls-benchmark
$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
$ cd poor-pauls-benchmark && uv sync
$ python ppb.py all suites/my_gpu.toml
Phase 1 — VRAM Cliff
iter 1: n_ctx= 66,560 ✓ pass → [66,561 .. 131,071] (2.3s)
iter 2: n_ctx= 98,815 ✗ OOM → [66,561 .. 98,814] (0.8s)
iter 3: n_ctx= 82,687 ✓ pass → [82,688 .. 98,814] (3.1s)
✓ Max safe context: 90,111 tokens
Phase 2 — Sweep
✓ [1/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=8192 139.2 tok/s (14.1s)
✓ [2/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=16384 95.7 tok/s (18.3s)
✓ [3/6] Qwen3-30B-A3B-UD-Q4_K_M ctx=32768 62.4 tok/s (24.6s)
...
Phase 3 — Publish
✓ Results uploaded to Hugging Face
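The Phase 1 "VRAM Cliff" search in the transcript above is a bisection over context size: each probe either loads and runs (pass) or OOMs, so the search halves the candidate range every iteration. A minimal sketch, assuming a hypothetical `fits(n)` probe that loads the model at context size `n` and reports pass/OOM (not part of the actual `ppb.py` API):

```python
def find_max_context(lo, hi, fits):
    """Binary-search the largest n_ctx in [lo, hi] for which fits(n) is True.

    Assumes fits is monotone: once a context size OOMs, every larger one
    does too. Returns 0 if even `lo` fails.
    """
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):         # model loaded and generated at this n_ctx
            best = mid
            lo = mid + 1      # passed: try a larger context
        else:                 # OOM: shrink the upper bound
            hi = mid - 1
    return best

# Example with a simulated VRAM limit at 90,111 tokens:
print(find_max_context(1, 131_072, lambda n: n <= 90_111))  # → 90111
```

A real probe would launch the model (e.g. a llama.cpp load at the given context size) in a subprocess and treat a crash or allocation failure as `False`, which is why each iteration in the transcript takes a few seconds.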

Leaderboard

Ranked model × GPU combos by throughput, latency, and efficiency.

View

Insights

Interactive charts — throughput scaling, latency tails, GPU comparisons, and more.

View

Articles

Benchmark writeups, hardware deep dives, and quantization analysis.

View
27,530 results · 208 models · 3 GPUs · 1 contributor · data on Hugging Face

Benchmark your own hardware

$ git clone https://github.com/paulplee/poor-pauls-benchmark.git
View on GitHub

Data last updated: April 16, 2026 at 12:34 AM