NVIDIA driver mismatch — pin the right driver + toolkit combo
When PyTorch, vLLM, or any CUDA app fails with 'CUDA driver version is insufficient for CUDA runtime version' or 'no kernel image is available for execution on the device,' the host driver is too old (or, occasionally, too new) for the installed toolkit. Read the maximum supported CUDA version from `nvidia-smi` and match your toolkit and framework builds to it.
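Before working down the list, a quick triage (assuming PyTorch is the framework in question) tells you which side is behind:

```bash
# Driver's maximum supported CUDA version (printed in nvidia-smi's header)
nvidia-smi

# CUDA version the installed PyTorch wheel was built against
python -c "import torch; print(torch.version.cuda)"
```

If the wheel's version exceeds what the driver reports, start with the first diagnosis below.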
Diagnostic order — most likely first
Driver too old for installed CUDA toolkit
The upper-right of `nvidia-smi` shows 'CUDA Version: 12.0', but your PyTorch / vLLM build targets CUDA 12.4. Errors read 'CUDA driver version is insufficient for CUDA runtime version.'
Update the driver. Linux: `sudo apt install nvidia-driver-555` (or distro equivalent). Windows: download from nvidia.com. Reboot. Verify `nvidia-smi` shows CUDA 12.4+. Driver 550+ supports CUDA 12.4.
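A minimal end-to-end sketch for Ubuntu (the `nvidia-driver-555` package is the one named in the fix above; other distros package drivers differently):

```bash
# Install the newer driver branch, then reboot to load it
sudo apt update && sudo apt install nvidia-driver-555
sudo reboot

# After reboot: confirm the driver version and the CUDA ceiling
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | grep "CUDA Version"
```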
Driver too new for production PyTorch wheel (rare)
Driver 580+ with PyTorch 2.4 sometimes breaks on edge cases (specific kernels, FlashAttention versions). Errors mention 'CUDA error: invalid device function' or kernel-image-not-supported.
Either pin PyTorch to a tested version: `pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124`, or roll back the driver to a known-good version (NVIDIA archives old drivers at nvidia.com/Download/Find.aspx).
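A sketch of the pin-and-verify path, using the exact versions from the fix above:

```bash
# Pin the tested cu124 wheels
pip install torch==2.5.1 torchvision==0.20.1 \
    --index-url https://download.pytorch.org/whl/cu124

# Confirm the pairing before running real workloads
python - <<'EOF'
import torch
print("torch:", torch.__version__)            # expect 2.5.1
print("built for CUDA:", torch.version.cuda)  # expect 12.4
print("CUDA available:", torch.cuda.is_available())
EOF
```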
Multiple CUDA toolkits installed (PATH conflict)
`which nvcc` shows e.g. `/usr/local/cuda-11.8/bin/nvcc`, but the pip-installed PyTorch expects 12.4. `ldconfig -p | grep cuda` lists conflicting libcuda paths.
Remove old toolkits: `sudo apt remove cuda-11-8 cuda-toolkit-11-8` (and equivalents). Set `CUDA_HOME=/usr/local/cuda-12.4` in your shell profile. Verify `nvcc --version` shows the intended version.
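A sketch of the profile change, assuming the 12.4 toolkit sits at the default `/usr/local/cuda-12.4` location:

```bash
# Append to ~/.bashrc (or ~/.zshrc): make one toolkit the active one
export CUDA_HOME=/usr/local/cuda-12.4
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# In a fresh shell, verify the intended toolkit won
which nvcc       # expect /usr/local/cuda-12.4/bin/nvcc
nvcc --version   # expect 'release 12.4'
```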
WSL CUDA + Linux native CUDA conflict
You're on WSL2 but installed `nvidia-driver-*` packages inside WSL. The Linux driver fights with the Windows host's passthrough driver, breaking GPU access.
Remove Linux NVIDIA drivers inside WSL: `sudo apt remove --purge nvidia-driver-* nvidia-utils-*`. Keep only `cuda-toolkit-12-x` (the toolkit, not the driver). The Windows host driver provides the GPU; WSL gets only the CUDA libraries.
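The cleanup, sketched for an apt-based WSL2 distro:

```bash
# Inside WSL: purge Linux-side drivers, keep only the CUDA toolkit
sudo apt remove --purge 'nvidia-driver-*' 'nvidia-utils-*'
sudo apt autoremove

# The Windows host driver supplies the GPU -- this should now work
# with zero Linux driver packages installed
nvidia-smi
```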
Container image CUDA newer than host
Docker container starts but fails: 'forward compatibility was attempted on non supported HW' or driver-too-old errors. Container image pinned to CUDA 12.6 but host is on driver supporting CUDA 12.2.
Match container base image to host driver. Use NVIDIA's official images: `nvidia/cuda:12.4.0-runtime-ubuntu22.04` for driver 550+. Or update host driver. Don't run CUDA 12.6 containers on a host that can't support them.
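A quick smoke test, assuming the NVIDIA Container Toolkit is already set up on the host:

```bash
# Base image matched to a driver-550 host (CUDA 12.4)
docker run --rm --gpus all nvidia/cuda:12.4.0-runtime-ubuntu22.04 nvidia-smi
```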
Frequently asked questions
What's the recommended NVIDIA driver + CUDA combo for 2026?
Driver 550+ (Linux) / 555+ (Windows), CUDA 12.4 toolkit, PyTorch 2.5+ with cu124 wheel, Python 3.11. This is the path with broadest ecosystem support — vLLM, Transformers, TensorRT-LLM, FlashAttention all build cleanly against it.
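A minimal sketch of standing that stack up in a fresh environment (assumes `python3.11` is on your PATH):

```bash
# Isolated env on the recommended baseline
python3.11 -m venv .venv && source .venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Sanity check: torch version and the CUDA it was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```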
Can I have multiple CUDA versions installed?
Yes, but messy. Each toolkit installs to `/usr/local/cuda-XX.Y`. Set `CUDA_HOME` and `PATH` to point to the version you want active. PyTorch's wheel ships its own CUDA runtime libraries, so pip-installed PyTorch is mostly insulated from system toolkits — the conflict surfaces when you build from source (FlashAttention, custom kernels).
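A quick way to see both versions side by side:

```bash
# The wheel's bundled runtime vs. the system toolkit -- these can
# legitimately differ unless you compile extensions from source
python -c "import torch; print('wheel built for CUDA', torch.version.cuda)"
nvcc --version | grep release
```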
Do I need to reboot after a driver update?
Linux: yes for kernel module reload (or `sudo modprobe -r nvidia && sudo modprobe nvidia` if you can't reboot). Windows: yes (the installer typically requires it). Skip the reboot and you'll see weird half-loaded driver behavior. After reboot, verify with `nvidia-smi` showing the new version.
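If you genuinely can't reboot, note that the dependent modules have to come out before the core one; a sketch:

```bash
# Stop anything holding the GPU first (display manager, CUDA jobs),
# then unload dependents before the core module
sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia
sudo modprobe nvidia

# Loaded kernel module should now match the installed userspace driver
cat /proc/driver/nvidia/version
nvidia-smi
```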
Related troubleshooting
PyTorch falsely reporting no CUDA is the most common Python ML setup failure. The cause is almost always one of two things: the wrong PyTorch wheel for your CUDA version, or an accidentally installed CPU-only build.
vLLM ships pre-built wheels against specific CUDA versions. When your system CUDA differs, you get cryptic kernel-image errors. Here's the version matrix and the fix order.
WSL2 doesn't pass the GPU through unless the host driver is right and the kernel is current. Here's the install order that actually works in 2026, and how to confirm passthrough is live before you waste an afternoon.
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time.