AMD ROCm: running local AI on Radeon
For: Owners of RX 7900 XTX, 7900 XT, or RX 9070 cards willing to commit to Linux. By the end: A ROCm-backed inference service on Linux running 13-32B-class models, with a pinned driver stack you can re-create.
Be honest with yourself first: ROCm on consumer Radeon is Linux-only in any practical production sense, the official support matrix moves slowly, and a kernel upgrade can ruin your weekend. The reward is a 24GB card (7900 XTX) that actually runs the modern open-weight models for half the price of a 4090. This path walks the eight disciplines that keep that reward from costing you sanity.
Confirm your card is on the supported list
Older cards (RX 6000 series, Vega) work in some configs, but support is research-quality, not production. The current targets are RX 7900 XTX (24GB), RX 7900 XT (20GB), and the new RX 9070-series. If you have a 6800/6900, you can make it work but you're off the paved road; many of the steps below assume RDNA 3 or newer.
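Before installing anything, confirm what the OS actually sees; a quick `lspci` check is enough:

```bash
# Identify the GPU and its PCI vendor:device ID. The 7900 XT/XTX show up as
# RDNA 3 "Navi 31" parts; RX 6000-series cards are Navi 2x.
lspci -nn | grep -Ei 'vga|display'
```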
Install Ubuntu LTS or Debian Stable, nothing else
ROCm is officially supported against specific Ubuntu LTS and RHEL-family kernels. Arch / Fedora / openSUSE work, sometimes, but you become the QA team. If you want a machine that survives a year of updates, Ubuntu LTS is the right answer. This isn't taste — it's how AMD ships.
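For reference, the install flow on a supported Ubuntu LTS is a short sketch like the one below; the installer filename tracks the ROCm release, so grab the current .deb link from AMD's ROCm install guide rather than copying anything here verbatim.

```bash
# Sketch, assuming you've downloaded the current amdgpu-install .deb from
# repo.radeon.com for your Ubuntu release (exact filename omitted on purpose).
sudo apt install ./amdgpu-install_*.deb
sudo amdgpu-install --usecase=rocm   # amdgpu DKMS driver + ROCm userspace
# Reboot before moving on so the new kernel module actually loads.
```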
Pin the kernel and the driver to a known-good combo
The number-one ROCm pain story: an unattended kernel upgrade leaves the amdgpu kernel module unable to rebuild against the new kernel, the card no longer initializes, and you're booting to recovery. Fix: hold the kernel packages, do upgrades on a schedule, and test before committing.
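On Ubuntu the hold is one command; a sketch using the stock metapackage names (HWE installs carry an `-hwe` suffix):

```bash
# Hold the kernel metapackages so unattended-upgrades can't pull a new kernel
# out from under the amdgpu module.
sudo apt-mark hold linux-generic linux-image-generic linux-headers-generic
apt-mark showhold   # confirm what's pinned

# Later, when you deliberately upgrade and have time to test:
# sudo apt-mark unhold linux-generic linux-image-generic linux-headers-generic
```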
Record the working combo (kernel X, ROCm Y, driver Z) in a file. When you upgrade later, you'll need it.
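A few commands capture the whole combo; a sketch, assuming a packaged ROCm install (the `/opt/rocm/.info/version` file ships with it, but the path can vary by install method):

```bash
# Snapshot the known-good stack into a file you keep somewhere safe.
{
  echo "# Known-good ROCm stack, recorded $(date -I)"
  echo "kernel: $(uname -r)"
  echo "rocm:   $(cat /opt/rocm/.info/version 2>/dev/null || echo 'see package list below')"
  echo "amdgpu: $(dkms status amdgpu 2>/dev/null)"
  dpkg -l | grep -E 'rocm|amdgpu' | awk '{print $2, $3}'
} > ~/rocm-stack.txt
```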
Verify ROCm sees the card
Before you try a model, verify the card. `rocminfo` should list your GPU as a compute device. `rocm-smi` should report non-zero VRAM and a reasonable idle temperature. If either of these fails, no inference framework will work; debug here, not in vLLM.
Common gotcha: a fresh ROCm install requires the user to be in the render and video groups. Check `groups` and re-login if needed.
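All of that fits in a handful of commands; a sketch (a 7900 XTX reports its compute name as gfx1100):

```bash
# The GPU should show up with a gfx11xx name (gfx1100 on a 7900 XTX).
rocminfo | grep -E 'Marketing Name|gfx'

# Summary table: VRAM, temperature, clocks. All-zero VRAM or a missing card
# means the driver isn't loaded correctly.
rocm-smi

# Permission gotcha: add yourself to the render and video groups, then log out and back in.
sudo usermod -aG render,video "$USER"
```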
Run llama.cpp with the ROCm backend
llama.cpp is the most reliable runtime on consumer ROCm. Build with the HIP backend enabled (the `GGML_HIP` build option in current checkouts), or fetch the pre-built binaries that include HIP. Then run a 7B Q4 model and check tokens per second: on a 7900 XTX expect 70-100 tok/s or better; noticeably lower means a build flag is wrong or the wrong kernels are being selected.
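A build-and-benchmark sketch; the build option names drift between llama.cpp releases (check the repo's build docs for your checkout), and the model path is a placeholder:

```bash
# Build llama.cpp against ROCm/HIP and benchmark a 7B Q4 model.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIP=ON
cmake --build build --config Release -j"$(nproc)"

# -ngl 99 offloads every layer to the GPU; on a 7900 XTX, generation for a
# 7B Q4_K_M model should land around 70-100 tok/s.
./build/bin/llama-bench -m ~/models/your-7b-q4_k_m.gguf -ngl 99
```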
Try vLLM ROCm — and have a fallback ready
vLLM ROCm exists, works, and lags the CUDA path on features and stability. For batched serving, when it works, it's significantly faster than llama.cpp. For interactive single-stream use, the gap is much smaller. Try it; if it fights you for more than an hour, fall back to llama.cpp without shame.
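If you do try it, AMD's prebuilt container is usually the lowest-friction route; a sketch, where the image tag, model path, and context length are placeholders to swap for your own, and the device passthrough flags are what ROCm containers generally need to see the GPU:

```bash
# Serve a model with vLLM from a ROCm container image (check Docker Hub for
# current rocm/vllm tags). /dev/kfd and /dev/dri must be passed through.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  -v ~/models:/models -p 8000:8000 \
  rocm/vllm:latest \
  vllm serve /models/your-14b-model --max-model-len 8192
```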
Pick a model that actually fits
7900 XTX 24GB comfortably handles a 32B model in Q4 with 8K-16K context. 7900 XT 20GB is comfortable at 14B and tight at 32B. Don't try to load 70B on a single 24GB card — you can with severe quantization, but the quality cost usually isn't worth it.
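The back-of-envelope behind those numbers: Q4-family quants land around 4.5-5 bits per weight, and the KV cache plus runtime overhead come on top of the weights. A sketch, with the bits-per-weight figure as an approximation rather than an exact constant:

```bash
# Rough weight memory: params (billions) x bits-per-weight / 8 = GB of weights.
# Q4_K_M is roughly 4.8 bits/weight effective; KV cache and overhead are extra.
est() { awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f GB weights\n", p * b / 8 }'; }
est 32 4.8   # ~19.2 GB -> fits 24GB with room for modest context
est 14 4.8   # ~8.4 GB  -> comfortable on a 20GB card
est 70 4.8   # ~42 GB   -> no chance on a single 24GB card without far harsher quantization
```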
Document the working stack and the rebuild path
ROCm setups are not stable across updates the way NVIDIA setups can be. Treat your working configuration as something you'll have to recreate, and write it down accordingly. Three months from now, future-you will be grateful when a kernel update breaks something and the recovery is a matter of following your own document.
Next recommended step
Operator-grade Linux paths for ROCm and CUDA, with the maintenance and observability cross-cuts.