Qwen 3 32B vs Llama 3.3 70B? Phi-4 vs Mistral Small 24B? Stop scrolling Reddit. Pick two — get a 10-row diff with per-row winners and a use-case-weighted overall verdict.
Every row is sourced from the model catalog. Predicted tok/s comes from VRAM bandwidth × vendor efficiency ÷ Q4_K_M size (the same formula the quant advisor uses). When measured benchmarks exist for your exact pair we surface them; when they don't, the row gets a confidence chip.
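For concreteness, here's a minimal TypeScript sketch of that prediction, not the site's actual code; the `Gpu` shape, the 0.6 efficiency figure, and the ~4.85 bits/weight average for Q4_K_M are all assumptions.

```ts
interface Gpu {
  vramBandwidthGBs: number; // peak VRAM bandwidth, e.g. 936 for an RTX 3090
  vendorEfficiency: number; // empirical fraction of peak actually achieved, e.g. 0.6
}

// Q4_K_M weights average roughly 4.85 bits per parameter (assumed figure).
const Q4_K_M_BITS_PER_WEIGHT = 4.85;

function predictedTokS(gpu: Gpu, paramsBillion: number): number {
  const modelSizeGB = (paramsBillion * Q4_K_M_BITS_PER_WEIGHT) / 8;
  // Decode is memory-bound: each token streams the full weights once,
  // so tok/s ≈ effective bandwidth / model size.
  return (gpu.vramBandwidthGBs * gpu.vendorEfficiency) / modelSizeGB;
}

// Example: a 32B model at Q4_K_M on a 936 GB/s card → ~29 tok/s.
console.log(predictedTokS({ vramBandwidthGBs: 936, vendorEfficiency: 0.6 }, 32));
```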
The URL updates as you change fields, so you can share a result just by copying it.
Pick two different models to start the battle.
We have 185 models in the catalog.
Picked the winner? Drill into Q4 vs Q5 vs Q8 on your specific hardware × context combo.
See what running the winner locally vs on Claude / GPT-5 / Together would cost at your usage volume.
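As a back-of-envelope sketch of that comparison (all prices here are hypothetical; check current API rate cards and your own electricity tariff):

```ts
function apiCostPerMonth(tokensPerMonth: number, dollarsPerMTok: number): number {
  return (tokensPerMonth / 1e6) * dollarsPerMTok;
}

function localCostPerMonth(
  tokensPerMonth: number,
  tokS: number,        // predicted tok/s, e.g. from the sketch above
  rigWatts: number,    // draw under load, e.g. 350 W
  dollarsPerKWh: number,
): number {
  const hoursRunning = tokensPerMonth / tokS / 3600;
  return hoursRunning * (rigWatts / 1000) * dollarsPerKWh;
}

// Example: 10M output tokens/month at a hypothetical $10/Mtok API rate
// vs. the ~29 tok/s rig above at 350 W and $0.15/kWh.
console.log(apiCostPerMonth(10e6, 10));              // $100.00
console.log(localCostPerMonth(10e6, 29, 350, 0.15)); // ≈ $5.03 in electricity
```

Note this electricity-only figure ignores hardware amortization, which dominates if you're buying the GPU for this.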
Watch both models stream side-by-side at their estimated tok/s on your hardware.
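The pacing idea behind that demo is simple enough to sketch; this toy loop (token list and all) is an assumption, not the demo's code:

```ts
// Emit pre-tokenized text at a fixed rate to simulate decode speed.
async function streamAt(tokens: string[], tokS: number): Promise<void> {
  const delayMs = 1000 / tokS;
  for (const tok of tokens) {
    process.stdout.write(tok);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Example: replay a response at the ~29 tok/s predicted above.
streamAt("The quick brown fox jumps over the lazy dog. ".split(/(?<= )/), 29);
```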
Now that you have a model, get the full rig recipe (GPU + runtime + install script) built around it.