Explicit step-by-step reasoning with visible intermediate steps. Useful for transparency and debuggability in agentic workflows.
ollama pull deepseek-r1:14b (~9 GB — distilled reasoning model with explicit CoT).

ollama run deepseek-r1:14b — the model outputs a <think> block with its reasoning trace, then the final answer.

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs DeepSeek R1 Distill Llama 8B at 50-80 tok/s or Qwen 7B at 40-60 tok/s. These 7-8B reasoning models handle the "bat and ball" class of trick problems and multi-step arithmetic reliably. For high-school math (GSM8K): 85-90% accuracy. Pair with Ryzen 5 5600 + 32 GB DDR4 + 512 GB NVMe. Total: ~$400-480. At $400, you get reliable chain-of-thought reasoning for everyday problems. For AIME-level competition math, 32B+ is needed.
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs DeepSeek R1 Distill Qwen 32B at 15-25 tok/s — AIME 50-70% accuracy with visible reasoning traces. For research-grade CoT: Qwen 3 235B MoE on dual RTX 3090s (48 GB total, ~$1,600) at 5-10 tok/s — near-frontier reasoning with full transparency. Total: ~$1,800-2,500. Chain-of-thought at the 32B level is transformative — the model catches its own mistakes, backtracks, and explores alternatives in the thinking trace. The 7B→32B jump is the largest qualitative improvement in reasoning.
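Whichever tier you run, the output format is the same: a <think> block followed by the final answer. For programmatic use, here is a minimal sketch against Ollama's local HTTP API; it assumes the default endpoint on localhost:11434 and the deepseek-r1:14b tag pulled above, and the trace-splitting logic is illustrative rather than a library call.

```python
# Minimal sketch: query a local DeepSeek R1 model through Ollama's HTTP API
# and separate the <think> reasoning trace from the final answer.
# Assumes Ollama is running on its default port with deepseek-r1:14b pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "A bat and ball cost $1.10 total. The bat costs $1.00 more "
                  "than the ball. How much does the ball cost?",
        "stream": False,
    },
    timeout=600,
)
text = resp.json()["response"]

# R1-style models emit <think>...</think> before the answer text.
trace, _, answer = text.partition("</think>")
print("REASONING TRACE:\n", trace.replace("<think>", "").strip())
print("\nFINAL ANSWER:\n", answer.strip())
```

Keeping the trace and the answer as separate fields is what makes the audit pattern below practical.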
The mistake: Hiding the thinking trace from users (or not reading it yourself) because "the answer is what matters."

Why it fails: The thinking trace IS the value. A correct answer with garbage reasoning is a hallucination that happened to be right. On the next problem, the same model gives a wrong answer — you have no way to know why. The CoT trace shows you whether the model (a) correctly identified the problem type, (b) applied the right formula, (c) made arithmetic errors, (d) caught and fixed its own mistakes.

The fix: Always read CoT traces for important problems. Build applications that display the thinking trace alongside the answer. For automated workflows: log the trace for audit. If the model says the ball costs $0.05 with correct algebra → trust. If it says $0.05 because "I recall this is a trick question" → don't trust (it pattern-matched from training, didn't reason). CoT enables trust calibration — you can assess when to trust the model by reading its reasoning. Without CoT, every answer is a coin flip between "reasoned correctly" and "lucky pattern match."
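A minimal sketch of the "log the trace for audit" pattern described above; the file name, helper names, and JSONL layout are illustrative choices, not a fixed format.

```python
# Sketch: store every answer together with its reasoning trace so that wrong
# answers can be diagnosed later and trust can be calibrated per problem type.
import json
import time

def split_trace(text: str) -> tuple[str, str]:
    """Separate an R1-style <think> block from the final answer."""
    trace, _, answer = text.partition("</think>")
    return trace.replace("<think>", "").strip(), answer.strip()

def answer_with_audit(raw_output: str, problem: str,
                      log_path: str = "cot_audit.jsonl") -> str:
    """Return the answer, appending problem + trace + answer to an audit log."""
    trace, answer = split_trace(raw_output)
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "problem": problem,
            "trace": trace,    # read this when an answer looks suspicious
            "answer": answer,
        }) + "\n")
    return answer
```

In an interactive app, the same split can drive a collapsible "reasoning" panel rendered next to the answer.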
Browse all tools for runtimes that fit this workload.
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
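A rough way to apply the bandwidth rule before buying: decode is memory-bound, so an upper bound on tokens per second is memory bandwidth divided by the size of the weight file. The numbers below (an RTX 3090 at ~936 GB/s, a 32B Q4 file at roughly 20 GB) are assumptions for illustration, and real throughput lands well under the ceiling.

```python
# Back-of-envelope decode ceiling: each generated token streams roughly the
# whole weight file through the GPU, so bandwidth / model size bounds tok/s.

def decode_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Theoretical upper bound on decode tokens/sec from bandwidth alone."""
    return bandwidth_gb_s / model_size_gb

# Assumed figures: RTX 3090 ~936 GB/s, 32B reasoning model at Q4 ~20 GB.
print(f"{decode_ceiling(936, 20):.0f} tok/s ceiling")  # ~47 tok/s
# KV-cache reads, kernel overhead, and sampling typically cut this by half or
# more, consistent with the 15-25 tok/s range quoted for the 32B tier above.
```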
The errors most operators hit when running chain-of-thought reasoning locally. Each links to a diagnose+fix walkthrough.
Verify your specific hardware can handle chain-of-thought reasoning before committing money.