Model crashes mid-inference — debug the actual cause
Mid-inference crashes (segfault, illegal memory access, kernel panic) usually mean VRAM ECC errors, thermal throttling, PSU instability, or a corrupted model file. Here's the diagnostic order.
Diagnostic order — most likely first
Model file is corrupted (incomplete download, bad mirror)
Crash is reproducible at the same token / step. Sometimes the runtime reports a hash mismatch.
Re-download the model from the source (HuggingFace direct, official Ollama registry). Verify the SHA256 if the source publishes one. Don't use sketchy mirrors.
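A minimal sketch of the checksum step. The filename and checksum below are placeholders, not real values — substitute the file you downloaded and the SHA256 the source publishes:

```shell
# Compare a downloaded file's SHA256 against the published value.
# Both arguments are placeholders -- supply your own.
verify_model() {
  local file="$1" expected="$2"
  local actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: checksum matches"
  else
    echo "MISMATCH: got $actual, expected $expected" >&2
    return 1
  fi
}
```

Usage: `verify_model my-model.gguf <sha256-from-the-model-card>`. A mismatch means re-download; don't try to "repair" a partial file.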
GPU thermal throttling / unstable overclock
Crash correlates with high load. `nvidia-smi -q -d TEMPERATURE` shows GPU temp > 85°C right before crash. Or you've manually overclocked.
Reset clocks to stock. Improve case airflow. Underclock VRAM by 100-200 MHz if you have an aggressively factory-clocked AIB card; many crashes that look like 'illegal memory access' are actually VRAM stability issues.
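A sketch of the reset-and-monitor step, assuming an NVIDIA card with a recent driver (the `-rgc`/`-rmc` reset flags need one, and root). The 85°C threshold is a rule of thumb for consumer cards, not a spec value:

```shell
# Flag temperatures above ~85C, where consumer GPUs commonly start
# throttling. Threshold is a rule of thumb (assumption), not a spec.
too_hot() { [ "$1" -gt 85 ]; }

if command -v nvidia-smi >/dev/null 2>&1; then
  # Reset manual core/memory clock offsets back to stock.
  sudo nvidia-smi -rgc 2>/dev/null
  sudo nvidia-smi -rmc 2>/dev/null
  # Sample temperature once a second while you reproduce the crash.
  i=0
  while [ "$i" -lt 30 ]; do
    t=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)
    too_hot "$t" && echo "WARNING: ${t}C - thermal suspect"
    sleep 1
    i=$((i + 1))
  done
fi
```

If the warning fires right before each crash, you have a cooling problem, not a software one.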
PSU not stable under load (transient power dips)
Crash happens 5-30 seconds into sustained inference. PC may also reboot under load. PSU is undersized or aging.
Check PSU wattage vs total system load. Single 4090 / 5090 needs 850-1000W minimum. Consider a higher-quality PSU (Seasonic Prime, Corsair RMx, EVGA SuperNova). Old PSUs degrade — 5+ year old units are suspect.
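The budget math behind that recommendation can be sketched with ballpark numbers. The component draws below are illustrative assumptions — substitute your actual parts; the ~2x transient figure reflects the short power excursions modern GPUs are known to produce:

```shell
# Rough PSU budget. All wattages are illustrative assumptions.
GPU_W=450        # e.g. a 4090-class card's rated board power
CPU_W=250        # high-end desktop CPU under load
REST_W=100       # motherboard, RAM, drives, fans
TRANSIENT=2      # GPUs can spike to roughly 2x rated draw briefly

sustained=$((GPU_W + CPU_W + REST_W))
spike=$((GPU_W * TRANSIENT + CPU_W + REST_W))
echo "sustained load: ${sustained}W"
echo "worst-case transient: ${spike}W"
```

With these numbers, sustained load is 800W and a transient can brush 1250W — which is why an 850-1000W unit is the floor for a 4090/5090 build, and why an aging 750W PSU produces exactly the "crashes 5-30 seconds into inference" pattern.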
VRAM ECC error (used cards, mining cards)
Crashes are random, not load-correlated. `nvidia-smi -q -d ECC` shows non-zero double-bit errors. Common on used / ex-mining 3090s.
If under 100 errors and isolated to one VRAM bank, you can sometimes work around it by underclocking VRAM. If consistent or growing, the card is failing — replace it.
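The triage above can be sketched as a small script. The verdict thresholds mirror the guidance in this section (0 = healthy, under 100 = maybe workable, more = replace); the `nvidia-smi` query field name is my best understanding and the whole query is guarded, since consumer cards without ECC report N/A:

```shell
# Classify an aggregate double-bit ECC error count, per the rule of
# thumb above. Thresholds are guidance, not vendor spec.
ecc_verdict() {
  if [ "$1" -eq 0 ]; then echo "healthy"
  elif [ "$1" -lt 100 ]; then echo "suspect - try a VRAM underclock"
  else echo "failing - replace the card"
  fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
  errs=$(nvidia-smi --query-gpu=ecc.errors.uncorrected.aggregate.total \
         --format=csv,noheader,nounits 2>/dev/null)
  # Only classify if the driver returned a number (ECC cards only).
  case "$errs" in
    ''|*[!0-9]*) echo "no ECC counters available" ;;
    *) ecc_verdict "$errs" ;;
  esac
fi
```

Re-run it over a few days: a stable nonzero count is an old scar; a growing one is an active failure.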
Runtime + driver incompatibility
Crash happens immediately on load. Logs show CUDA error 700 (illegal memory access) before a single token is generated.
Update drivers to latest stable. If on the bleeding-edge runtime (vLLM nightly, llama.cpp HEAD), pin to the last release tag. New runtimes occasionally ship CUDA kernels that need newer drivers than you have.
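A sketch of checking your driver against a runtime's minimum, plus the tag-pinning step. The `550` minimum is a placeholder — read your runtime's release notes for the real requirement; `sort -V` is the GNU coreutils natural version sort:

```shell
# True if version $1 >= version $2, using natural version sort.
ver_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

if command -v nvidia-smi >/dev/null 2>&1; then
  drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
  # "550" is a placeholder minimum -- check your runtime's release notes.
  ver_ge "$drv" "550" || echo "driver $drv may be too old for this runtime"
fi

# Pinning llama.cpp to its most recent release tag instead of HEAD:
#   git -C llama.cpp fetch --tags
#   git -C llama.cpp checkout "$(git -C llama.cpp describe --tags --abbrev=0)"
```

If the pinned release works and HEAD doesn't, file it against the runtime rather than chasing phantom hardware faults.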
RAM (system) corruption causing GPU memory transfer failures
Crash is random, sometimes during model load (before any inference). System RAM might be the issue, not VRAM.
Run memtest86 overnight. Bad system RAM corrupts model weights as they transfer to GPU, producing CUDA crashes that look like GPU issues.
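If you can't take the machine down for an overnight memtest86 boot, `memtester` (a real Linux package, installable via your distro) is a quicker in-OS sanity check. The 2 GB headroom figure is an assumption to keep the OS responsive; you can't test RAM the system is actively using:

```shell
# Suggest a memtester size: total RAM minus ~2 GB headroom (assumption).
# Input is MemTotal in kB (as /proc/meminfo reports it), output is MB.
test_size_mb() { echo $(( $1 / 1024 - 2048 )); }

if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  size=$(test_size_mb "$total_kb")
  echo "suggested test size: ${size}M"
  # One full pass (needs root to lock the pages):
  #   sudo memtester "${size}M" 1
fi
```

A clean memtester pass doesn't fully clear system RAM — the bootable test covers regions the OS occupies — but a failed pass is conclusive.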
Frequently asked questions
Is my GPU dying if local AI keeps crashing?
Possibly, but check the cheaper causes first: PSU stability, thermals, model file integrity, drivers. If you've ruled all of those out and `nvidia-smi -q -d ECC` shows growing error counts, the card has a real hardware problem.
Can a used GPU from a mining rig be safe for local AI?
Often yes, with caveats. Mining wears the fans (replaceable) and the thermal pads (replaceable on most cards). It rarely wears the VRAM or the GPU die unless the card was overclocked and run hot for years. Buy from sellers willing to demo the card running stress tests, and check ECC error counts.
What stress test should I run on a GPU before trusting it for AI?
Run a 30-minute llama.cpp inference loop on a model that fully fits VRAM, monitoring `nvidia-smi -l 1`. Watch for thermal throttling (clocks dropping under sustained load), VRAM errors, or driver resets. If it survives 30 minutes at 95%+ utilization without issues, it'll handle inference.
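The burn-in can be sketched as a timed monitoring loop. The inference command is a placeholder (any workload that pins the GPU works), and the "more than 10% below baseline" throttle criterion is my own rule of thumb, not a vendor definition:

```shell
# True if current SM clock ($2) has dropped >10% below baseline ($1).
# The 10% criterion is an assumption, not a vendor spec.
throttled() { [ $((100 * $2 / $1)) -lt 90 ]; }

if command -v nvidia-smi >/dev/null 2>&1; then
  # Start the load in the background (placeholder command):
  #   llama-cli -m model.gguf -p "benchmark" -n 100000 &
  sleep 60   # let clocks settle under load before taking the baseline
  base=$(nvidia-smi --query-gpu=clocks.sm --format=csv,noheader,nounits)
  end=$(( $(date +%s) + 1800 ))   # 30 minutes
  while [ "$(date +%s)" -lt "$end" ]; do
    cur=$(nvidia-smi --query-gpu=clocks.sm --format=csv,noheader,nounits)
    tmp=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)
    throttled "$base" "$cur" && echo "clock drop: ${base} -> ${cur} MHz at ${tmp}C"
    sleep 5
  done
fi
```

Driver resets show up as the loop erroring out mid-run, which is itself a failed test.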
Related troubleshooting
Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).
Ollama silently falls back to CPU when it can't load a model into VRAM. Here's how to confirm the fallback, force GPU usage, and pick a model that actually fits.
ROCm is finicky on consumer AMD GPUs in 2026. Here's the install order, the gfx-version override that fixes 80% of detection failures, and when to give up and use Vulkan.
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to the same root cause: the card doesn't have enough VRAM for the workload. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: