Build / compile failures

exllamav2 ImportError: cannot import name 'ExLlamaV2' / undefined symbol

ImportError: cannot import name 'ExLlamaV2' from 'exllamav2'
By Fredoline Eruo · Last verified May 8, 2026

Cause

exllamav2 ships pre-built wheels for specific CUDA + PyTorch + Python combinations. If any of the three does not match your environment, the install falls back to a source compile, which in many environments either fails outright or produces a broken extension that cannot be imported.

Common forms: undefined symbol errors mean the .so was built against a different PyTorch ABI; ImportError means the C extension didn't build at all and the Python package is half-installed.
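The two forms above can be told apart mechanically from the error text. The following is a minimal sketch, not part of exllamav2; the function name and the returned labels are hypothetical and chosen only for illustration:

```python
def classify_import_failure(message: str) -> str:
    """Map an exllamav2 import error message to its likely cause.

    The labels returned here are illustrative, not an official taxonomy.
    """
    text = message.lower()
    # The .so loaded but references symbols from a different PyTorch build.
    if "undefined symbol" in text:
        return "abi-mismatch"
    # The Python package is present but the C extension never built.
    if "cannot import name" in text:
        return "extension-not-built"
    return "unknown"


if __name__ == "__main__":
    print(classify_import_failure(
        "ImportError: cannot import name 'ExLlamaV2' from 'exllamav2'"
    ))
```

An "abi-mismatch" result points toward step 1 or 2 below (reinstall against the PyTorch you actually have); "extension-not-built" usually means the build toolchain was missing at install time.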

Solution

1. Match wheel to environment. The exllamav2 release page lists wheels by CUDA/torch/Python:

# Identify your stack
python -c "import torch; print(torch.version.cuda, torch.__version__)"
python --version

# Pick the matching wheel from https://github.com/turboderp-org/exllamav2/releases
# and install it by URL (the filename below is an example; substitute the one for your stack)
pip install \
  https://github.com/turboderp-org/exllamav2/releases/download/v0.2.4/exllamav2-0.2.4+cu124.torch2.4.0-cp311-cp311-linux_x86_64.whl
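To make the matching less error-prone, the expected wheel filename can be assembled from the values printed by the commands above. This is a rough sketch: the naming pattern is inferred from the example filename in this step and is an assumption, so verify the result against the release page before downloading. The function name is hypothetical.

```python
def expected_wheel_name(pkg_version, cuda_version, torch_version,
                        py=(3, 11), platform="linux_x86_64"):
    """Build the wheel filename to look for on the release page.

    Assumes the pattern seen in the example above:
    exllamav2-<ver>+cu<cuda>.torch<torch>-cp<py>-cp<py>-<platform>.whl
    """
    # torch.version.cuda looks like "12.4"; the wheel tag drops the dot.
    cuda_tag = "cu" + cuda_version.replace(".", "")
    # torch.__version__ may carry a local suffix such as "+cu124"; strip it.
    torch_tag = "torch" + torch_version.split("+")[0]
    py_tag = f"cp{py[0]}{py[1]}"
    return (f"exllamav2-{pkg_version}+{cuda_tag}.{torch_tag}"
            f"-{py_tag}-{py_tag}-{platform}.whl")


if __name__ == "__main__":
    print(expected_wheel_name("0.2.4", "12.4", "2.4.0+cu124", (3, 11)))
```

If no file with the resulting name exists on the release page, there is no pre-built wheel for that combination and building from source is the remaining option.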

2. Or build from source against your exact PyTorch:

pip uninstall exllamav2 -y
pip install exllamav2 --no-binary exllamav2

Requires CUDA toolkit (nvcc) installed and on PATH.
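Before starting a source build, it can save time to confirm that nvcc is actually reachable. A minimal sketch using only the Python standard library; the helper names are hypothetical, and the parsing assumes the version line of `nvcc --version` contains the word "release":

```python
import shutil
import subprocess


def nvcc_path():
    """Return the full path to nvcc if it is on PATH, else None."""
    return shutil.which("nvcc")


def nvcc_version_line():
    """Return the 'release ...' line from `nvcc --version`, or None."""
    path = nvcc_path()
    if path is None:
        return None
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True, check=False)
    for line in result.stdout.splitlines():
        if "release" in line:
            return line.strip()
    return None


if __name__ == "__main__":
    found = nvcc_version_line()
    print(found if found else "nvcc not found on PATH")
```

The toolkit release reported here should be compared with the value of `torch.version.cuda` from step 1 before building.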

3. Use TabbyAPI if you just want an OpenAI-compatible server — it bundles exllamav2 with the right wheels and avoids the manual matching:

git clone https://github.com/theroyallab/tabbyAPI && cd tabbyAPI
./start.sh

4. Confirm the import works after install:

python -c "from exllamav2 import ExLlamaV2; print('ok')"
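The one-liner above only proves that the import succeeds. To also see which version pip actually installed, the package metadata can be read with the standard library. A small sketch; the helper name is hypothetical:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("exllamav2:", installed_version("exllamav2"))
    print("torch:", installed_version("torch"))
```

A version string that carries a local suffix such as `+cu124.torch2.4.0` suggests a pre-built release wheel was installed rather than a source build.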

Related errors

  • llama.cpp build fails: nvcc not found
  • llama.cpp build fails: nvcc not found / CUDA toolkit missing
  • llama.cpp CUDA build: unsupported GNU version! gcc versions later than X are not supported
  • flash-attn install fails on Windows / no precompiled wheel
