Windows LLM install failed — fix the driver → CUDA → Python chain
Why first-time Windows AI installs fail, how to fix each link in the driver-CUDA-Python chain, and the specific download links that actually work.
Diagnostic order — most likely first
NVIDIA driver missing or outdated
Run `nvidia-smi` in Command Prompt. If it's 'not recognized,' the driver isn't installed. If the reported driver version is below 535, recent CUDA 12.x releases won't work (and CUDA 12.4 needs a newer driver still).
Download the latest Game Ready or Studio Driver from nvidia.com/drivers. Use the clean-install option during setup. Reboot after install.
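If you want to script this check, a small Python sketch can shell out to `nvidia-smi` and parse its banner line. The 535 floor and the banner regex are assumptions based on the current output format:

```python
import re
import subprocess

MIN_DRIVER_MAJOR = 535  # assumed floor for recent CUDA 12.x releases

def driver_major(smi_output: str):
    """Pull the driver's major version out of nvidia-smi's banner line."""
    m = re.search(r"Driver Version:\s*(\d+)", smi_output)
    return int(m.group(1)) if m else None

def check_driver() -> str:
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return "nvidia-smi not found: driver not installed or not on PATH"
    major = driver_major(out)
    if major is None:
        return "could not parse a driver version from nvidia-smi output"
    if major < MIN_DRIVER_MAJOR:
        return f"driver {major} is likely too old for recent CUDA 12.x"
    return f"driver {major} looks OK"
```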
CUDA Toolkit not installed or missing from PATH
Run `nvcc --version` in Command Prompt. If 'not recognized,' the CUDA Toolkit isn't on your PATH. PyTorch ships its own CUDA libs, but compile-time tools (bitsandbytes, flash-attention, llama.cpp build) need the full toolkit.
Install CUDA Toolkit 12.4 from developer.nvidia.com/cuda-downloads. The default (Express) install adds `nvcc` to your PATH. After install, open a new terminal and verify that `nvcc --version` reports 12.4.
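The same idea works for the toolkit: locate `nvcc`, then parse the `release X.Y` line from its version banner. A sketch, assuming the output format of current nvcc builds:

```python
import re
import shutil
import subprocess

def nvcc_version(output: str):
    """Extract the (major, minor) release from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None

def check_toolkit() -> str:
    if shutil.which("nvcc") is None:
        return "nvcc not on PATH: toolkit missing, or this terminal predates the install"
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    ver = nvcc_version(out)
    return f"CUDA Toolkit {ver[0]}.{ver[1]} found" if ver else "unrecognized nvcc output"
```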
Python installed from Microsoft Store (known-broken for AI workloads)
Run `where python`. If the path contains `WindowsApps`, you're using the Store version. This Python can't access GPU libraries correctly and has registry permission issues.
Uninstall the Store version in Settings > Apps. Download Python 3.11 from python.org (NOT 3.12 — PyTorch / CUDA compatibility is stickiest on 3.11). During install, check 'Add Python to PATH' and 'Disable path length limit.'
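A quick programmatic version of the `where python` check: any interpreter living under a `WindowsApps` directory is the Store build. Using `PureWindowsPath` keeps the path parsing correct even when the check runs from WSL or CI:

```python
import sys
from pathlib import PureWindowsPath

def is_store_python(executable: str) -> bool:
    """The Microsoft Store build lives under a WindowsApps directory."""
    return "WindowsApps" in PureWindowsPath(executable).parts

# Check the interpreter currently running this script:
if is_store_python(sys.executable):
    print("Microsoft Store Python detected: reinstall from python.org")
```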
Visual Studio Build Tools missing (compile-time deps fail)
`pip install` fails with `error: Microsoft Visual C++ 14.0 or greater is required` or linker errors during wheel builds. Common when installing bitsandbytes, flash-attention, or llama.cpp Python bindings.
Download Visual Studio Build Tools from visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2022. In the installer, select 'Desktop development with C++' workload. This adds the MSVC compiler, Windows SDK, and CMake.
Antivirus / Windows Defender blocking pip or model downloads
`pip install` hangs or model downloads fail with connection-reset errors. Windows Defender real-time protection sometimes quarantines .dll files in venv/Lib/site-packages.
Add your project folder and Python install directory as exclusions in Windows Defender (Settings > Privacy & Security > Virus & threat protection > Manage settings > Exclusions). Temporarily disable real-time protection during the install, then re-enable it.
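Exclusions can also be added from an elevated PowerShell with the `Add-MpPreference` cmdlet. This sketch only builds the commands for you; the folder paths are placeholders for your actual project and Python install locations:

```python
def defender_exclusion_cmds(paths):
    """Build Add-MpPreference commands; run them in an *elevated* PowerShell."""
    return [f'Add-MpPreference -ExclusionPath "{p}"' for p in paths]

# Placeholder locations: substitute your project folder and Python install dir.
for cmd in defender_exclusion_cmds([r"C:\projects\llm", r"C:\Python311"]):
    print(cmd)
```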
WSL not installed or misconfigured (when WSL-based path is used)
Following a guide that assumes WSL2 (e.g., CUDA-on-WSL path). `wsl --status` shows default version 1 or WSL not present.
Run `wsl --install -d Ubuntu` from an admin PowerShell. Then `wsl --set-default-version 2`. Within Ubuntu, install CUDA-on-WSL (the WSL-specific .deb, not the Windows .exe). This path bypasses most Windows-path headaches.
Frequently asked questions
Why does Python from python.org work but the Microsoft Store version doesn't for AI?
The Store version runs in a sandbox with restricted registry access and a different execution alias system. CUDA libraries and PyTorch's GPU runtime need direct access to the system DLL search path, which the Store sandbox blocks. Always use the official python.org installer.
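Related gotcha: since Python 3.8, extension modules on Windows no longer consult PATH when resolving their dependent DLLs; `os.add_dll_directory` is the supported hook. A sketch, where the toolkit path is an assumption (point it at your actual CUDA `bin` folder):

```python
import os

# Assumed default toolkit location; adjust to your actual install.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin"

# os.add_dll_directory only exists on Windows (Python 3.8+), so guard it.
if hasattr(os, "add_dll_directory") and os.path.isdir(cuda_bin):
    os.add_dll_directory(cuda_bin)
```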
Do I even need the CUDA Toolkit installed if PyTorch ships its own CUDA?
For pure PyTorch inference, no — `pip install torch` includes the CUDA runtime libraries. But the moment you need to compile anything (bitsandbytes, flash-attention, llama.cpp from source, any CUDA kernel extension), you need the full toolkit with `nvcc`. Install it proactively.
What's the single most reliable Python version for Windows AI work right now?
Python 3.11.x (latest patch). PyTorch stable, bitsandbytes, and the majority of the ecosystem target 3.11 first. 3.12 works for pure inference but fails on several compile-time packages. 3.10 is fine but aging out. Pin to 3.11.
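If you manage more than one machine, a tiny guard at the top of your setup scripts catches the wrong interpreter early. The 3.11 pin here just encodes the recommendation above:

```python
import sys

def wheel_friendly(version_info=sys.version_info) -> bool:
    """True when the interpreter matches the recommended 3.11 pin."""
    return tuple(version_info[:2]) == (3, 11)

if not wheel_friendly():
    print("Warning: not on Python 3.11; some compile-time packages may fail")
```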
Can I run local AI on an AMD GPU on Windows?
Yes, but the path is rougher than NVIDIA's. Use LM Studio (which ships a ROCm runtime on Windows) or Ollama with AMD's ROCm driver. Avoid building PyTorch ROCm from source on Windows; the official ROCm wheels target Linux, and the Windows story remains fragile. For most Windows users, NVIDIA is the path of least resistance.
Related troubleshooting
Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).
PyTorch falsely reporting no CUDA is the most common Python ML setup failure. The cause is almost always one of two things: a PyTorch wheel built for the wrong CUDA version, or a CPU-only build installed by accident.
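One quick triage step: the local version tag on `torch.__version__` tells you which build pip actually installed (e.g. `2.4.1+cpu` vs `2.4.1+cu124`). A minimal classifier, assuming the standard wheel tagging scheme:

```python
def torch_build_flavor(version: str) -> str:
    """Classify a torch version string by its local build tag (PEP 440 '+' suffix)."""
    if "+cpu" in version:
        return "cpu-only build: reinstall from the matching CUDA index URL"
    if "+cu" in version:
        return f"CUDA build (cu{version.split('+cu', 1)[1]})"
    return "no local tag: check torch.cuda.is_available() at runtime"
```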
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time.