Explicit step-by-step reasoning with visible intermediate steps. Useful for transparency and debuggability in agentic workflows.
ollama pull deepseek-r1:14b (~9 GB — distilled reasoning model with explicit CoT).

ollama run deepseek-r1:14b — the model outputs a <think> block with its reasoning trace, then the final answer.

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs DeepSeek R1 Distill Llama 8B at 50-80 tok/s or Qwen 7B at 40-60 tok/s. These 7-8B reasoning models handle the "bat and ball" class of trick problems and multi-step arithmetic reliably. For high-school math (GSM8K): 85-90% accuracy. Pair with Ryzen 5 5600 + 32 GB DDR4 + 512 GB NVMe. Total: ~$400-480. At $400, you get reliable chain-of-thought reasoning for everyday problems. For AIME-level competition math, 32B+ is needed.
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs DeepSeek R1 Distill Qwen 32B at 15-25 tok/s — AIME 50-70% accuracy with visible reasoning traces. For research-grade CoT: Qwen 3 235B MoE on dual RTX 3090s (48 GB total, ~$1,600) at 5-10 tok/s — near-frontier reasoning with full transparency. Total: ~$1,800-2,500. Chain-of-thought at the 32B level is transformative — the model catches its own mistakes, backtracks, and explores alternatives in the thinking trace. The 7B→32B jump is the largest qualitative improvement in reasoning.
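Whichever tier you run, the output format is the same: a <think> block followed by the final answer. For programmatic use, here is a minimal sketch against Ollama's local HTTP API; it assumes the default endpoint on localhost:11434 and the deepseek-r1:14b tag pulled above, and the trace-splitting logic is illustrative rather than a library call.

```python
# Minimal sketch: query a local DeepSeek R1 model through Ollama's HTTP API
# and separate the <think> reasoning trace from the final answer.
# Assumes Ollama is running on its default port with deepseek-r1:14b pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "A bat and ball cost $1.10 total. The bat costs $1.00 more "
                  "than the ball. How much does the ball cost?",
        "stream": False,
    },
    timeout=600,
)
text = resp.json()["response"]

# R1-style models emit <think>...</think> before the answer text.
trace, _, answer = text.partition("</think>")
print("REASONING TRACE:\n", trace.replace("<think>", "").strip())
print("\nFINAL ANSWER:\n", answer.strip())
```

Keeping the trace and the answer as separate fields is what makes the audit pattern below practical.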
The mistake: Hiding the thinking trace from users (or not reading it yourself) because "the answer is what matters."

Why it fails: The thinking trace IS the value. A correct answer with garbage reasoning is a hallucination that happened to be right. On the next problem, the same model gives a wrong answer — you have no way to know why. The CoT trace shows you whether the model (a) correctly identified the problem type, (b) applied the right formula, (c) made arithmetic errors, (d) caught and fixed its own mistakes.

The fix: Always read CoT traces for important problems. Build applications that display the thinking trace alongside the answer. For automated workflows: log the trace for audit. If the model says the ball costs $0.05 with correct algebra → trust. If it says $0.05 because "I recall this is a trick question" → don't trust (it pattern-matched from training, didn't reason). CoT enables trust calibration — you can assess when to trust the model by reading its reasoning. Without CoT, every answer is a coin flip between "reasoned correctly" and "lucky pattern match."
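A minimal sketch of the "log the trace for audit" pattern described above; the file name, helper names, and JSONL layout are illustrative choices, not a fixed format.

```python
# Sketch: store every answer together with its reasoning trace so that wrong
# answers can be diagnosed later and trust can be calibrated per problem type.
import json
import time

def split_trace(text: str) -> tuple[str, str]:
    """Separate an R1-style <think> block from the final answer."""
    trace, _, answer = text.partition("</think>")
    return trace.replace("<think>", "").strip(), answer.strip()

def answer_with_audit(raw_output: str, problem: str,
                      log_path: str = "cot_audit.jsonl") -> str:
    """Return the answer, appending problem + trace + answer to an audit log."""
    trace, answer = split_trace(raw_output)
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "problem": problem,
            "trace": trace,    # read this when an answer looks suspicious
            "answer": answer,
        }) + "\n")
    return answer
```

In an interactive app, the same split can drive a collapsible "reasoning" panel rendered next to the answer.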
Browse all tools for runtimes that fit this workload.
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
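A rough way to apply the bandwidth rule before buying: decode is memory-bound, so an upper bound on tokens per second is memory bandwidth divided by the size of the weight file. The numbers below (an RTX 3090 at ~936 GB/s, a 32B Q4 file at roughly 20 GB) are assumptions for illustration, and real throughput lands well under the ceiling.

```python
# Back-of-envelope decode ceiling: each generated token streams roughly the
# whole weight file through the GPU, so bandwidth / model size bounds tok/s.

def decode_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Theoretical upper bound on decode tokens/sec from bandwidth alone."""
    return bandwidth_gb_s / model_size_gb

# Assumed figures: RTX 3090 ~936 GB/s, 32B reasoning model at Q4 ~20 GB.
print(f"{decode_ceiling(936, 20):.0f} tok/s ceiling")  # ~47 tok/s
# KV-cache reads, kernel overhead, and sampling typically cut this by half or
# more, consistent with the 15-25 tok/s range quoted for the 32B tier above.
```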
The errors most operators hit when running chain-of-thought reasoning locally. Each links to a diagnose+fix walkthrough.
Verify your specific hardware can handle chain-of-thought reasoning before committing money.