Phi-4 Reasoning 14B
Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.
The reasoning specialist of the Phi-4 line — same architecture, tuned with chain-of-thought training. Built for math, code planning, and multi-step problem decomposition in a 14B body that fits in 12 GB VRAM.
Strengths
- Reasoning-class quality at 14B — competitive with QwQ 32B on math while using half the VRAM.
- Best 14B reasoner at its release.
- MIT license — the same license clarity as base Phi-4.
- Visible chain-of-thought is well-formatted and useful for verification (see the splitting sketch after the weaknesses list).
Weaknesses
- Always reasons — no toggle. Every prompt eats 2–3× the tokens.
- Verbose intermediate output dominates throughput on simple questions.
- Same narrow knowledge breadth as Phi-4 base.
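Since the trace is always present, downstream tooling usually needs to split it from the final answer. A minimal sketch, assuming the model wraps its reasoning in `<think>…</think>` tags (a common convention for reasoning-tuned models served via Ollama; verify against your build's raw output before relying on it):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the chain-of-thought from the final answer.

    Assumes the reasoning is wrapped in <think>...</think> tags, the
    usual convention for reasoning-tuned models; check your build's
    actual output format first.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        # No trace found: treat the whole response as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2: just add.</think>The answer is 4.")
print(answer)  # "The answer is 4."
```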
Performance
- Q4_K_M (8.4 GB): 70–85 tok/s decode — but 2–3× tokens per answer
- Q5_K_M (9.9 GB): 60–75 tok/s
- Q8_0 (14.7 GB): 42–52 tok/s
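To see what the always-on trace costs in wall-clock terms, a back-of-envelope sketch: the decode speed is the Q4_K_M figure above, while the 300-token answer length and the 3× multiplier are illustrative assumptions, not measurements.

```python
# Back-of-envelope: wall-clock decode time when every reply carries a
# hidden reasoning trace. Only the tok/s figure comes from the list
# above; the answer length and multiplier are illustrative assumptions.

def answer_seconds(visible_tokens: int, reasoning_multiplier: float,
                   tok_per_s: float) -> float:
    """Decode time when the model emits reasoning_multiplier times as
    many tokens as the visible answer alone would need."""
    total_tokens = visible_tokens * reasoning_multiplier
    return total_tokens / tok_per_s

# A 300-token answer at 75 tok/s:
print(answer_seconds(300, 1.0, 75))  # ~4 s if it answered directly
print(answer_seconds(300, 3.0, 75))  # ~12 s with a 3x reasoning trace
```

Even at the top of the Q4_K_M range, a question base Phi-4 would finish in about 4 seconds takes around 12, which is the throughput complaint above in concrete terms.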
Should you run it?
Yes for dedicated math and code-planning workflows where reasoning quality matters and you want to fit in 12 GB of VRAM. No for general chat, where base Phi-4 14B is the simpler choice. For maximum reasoning quality, jump to QwQ 32B if your VRAM allows.
How it compares
- vs Phi-4 14B (base) → Reasoning variant wins on hard problems; base wins on throughput for simple ones. Pick by workload.
- vs QwQ 32B → QwQ has stronger absolute reasoning; Phi-4 Reasoning fits in much less VRAM.
- vs Qwen 3 14B with thinking mode → Qwen 3 has the toggle flexibility; Phi-4 Reasoning has slightly cleaner reasoning traces.
- vs DeepSeek R1 Distill Qwen 14B → R1 Distill pushes reasoning depth more aggressively; Phi-4 Reasoning is more consistent from run to run.
```
ollama pull phi4-reasoning:14b-q4_K_M
ollama run phi4-reasoning:14b-q4_K_M
```
Settings: Q4_K_M GGUF, 16384 ctx, full GPU offload on RTX 4060 Ti 16 GB / RTX 4090
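If you drive the model through Ollama's HTTP API rather than the CLI, pass the context length explicitly, since Ollama's default num_ctx is smaller than the 16,384 used here. A minimal sketch against the standard /api/chat endpoint; the prompt is a placeholder:

```python
import requests

# Query the local Ollama server (default port 11434) with the same
# 16,384-token context used in the settings above. The model tag is
# the one pulled earlier; num_ctx is Ollama's context-length option.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi4-reasoning:14b-q4_K_M",
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "options": {"num_ctx": 16384},
        "stream": False,
    },
    timeout=600,
)
print(resp.json()["message"]["content"])
```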
Why this rating
8.5/10 — Phi-4 with explicit reasoning training. Closes the gap with QwQ 32B at half the VRAM, but always-on chain-of-thought eats throughput on simple prompts. Loses points to base Phi-4 14B for general use.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 8.4 GB | 11 GB |
| Q5_K_M | 9.9 GB | ≈13 GB |
| Q8_0 | 14.7 GB | ≈17 GB |

File sizes match the throughput list above. The Q5_K_M and Q8_0 VRAM figures are estimates (file size plus roughly the Q4_K_M context overhead), not measured values.
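For other context lengths, total VRAM is roughly the weight file plus the KV cache. Below is a rough estimator; the architecture constants (40 layers, 10 KV heads, head dim 128, fp16 cache) are assumptions taken from the published Phi-4 config, so treat the result as a ballpark, not a measurement:

```python
# Rough VRAM estimate: quantized weight file + fp16 KV cache.
# Architecture numbers are assumptions from the public Phi-4 config
# (40 layers, 10 KV heads, head dim 128); verify before relying on them.

def kv_cache_gib(ctx_tokens: int, n_layers: int = 40, n_kv_heads: int = 10,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """fp16 K and V caches across all layers at the given context length."""
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens
    return kv_bytes / 1024**3

def total_vram_gib(file_gib: float, ctx_tokens: int) -> float:
    return file_gib + kv_cache_gib(ctx_tokens)

print(f"{total_vram_gib(8.4, 16384):.1f} GiB")  # Q4_K_M at 16K ctx -> ~11.5 GiB
```

At 16K context this lands near the table's Q4_K_M figure; runtime overhead adds a few hundred megabytes on top.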
Get the model
Ollama
One-line install:

```
ollama run phi4-reasoning:14b
```

Read our Ollama review →
HuggingFace
Original weights
Source repository with the original weights; you'll need to quantize them yourself (e.g., with llama.cpp's convert_hf_to_gguf.py and llama-quantize).
Hardware that runs this
Cards with enough VRAM for at least one quantization of Phi-4 Reasoning 14B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Phi-4 Reasoning 14B?
About 11 GB for the Q4_K_M quant (8.4 GB file), which is why it fits comfortably on 12 GB cards.
Can I use Phi-4 Reasoning 14B commercially?
Yes. It ships under the MIT license, the same as base Phi-4.
What's the context length of Phi-4 Reasoning 14B?
The configuration tested in this review runs a 16,384-token context. Remember that the always-on reasoning trace consumes 2–3× the tokens of a direct answer, so budget context accordingly.
How do I install Phi-4 Reasoning 14B with Ollama?
Run `ollama pull phi4-reasoning:14b-q4_K_M`, then `ollama run phi4-reasoning:14b-q4_K_M` (see the commands and settings above).
Source: huggingface.co/microsoft/phi-4-reasoning
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.