Qwen 3 Embedding 8B
8B parameters · Commercial OK · Reviewed May 2026
Qwen 3 family embedding model. Apache 2.0 with strong multilingual coverage.
License: Apache 2.0 · Released Jun 5, 2025 · Context: 32,768 tokens
Overview
Qwen 3 Embedding 8B is an 8B-parameter embedding model from the Qwen 3 family, released under the Apache 2.0 license with strong multilingual coverage.
Strengths
- Apache 2.0
- Multilingual
- Qwen 3 base
Weaknesses
- Larger than BGE-M3; choose between them based on your VRAM budget
Quantization variants
Each quantization trades model quality for a smaller file and lower VRAM use. Q4_K_M is usually the most popular starting point, though only FP16 is listed for this model.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 16.0 GB | 20 GB |
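The figures in the table follow from simple arithmetic: parameters × bytes per weight gives the file size, and VRAM adds headroom for activations and runtime buffers. A minimal sketch; the 1.25× overhead factor is an illustrative assumption, not a published figure:

```python
# Estimate file size and VRAM for a quantized model.
# Assumptions: 1 GB = 1e9 bytes; the 1.25x VRAM overhead factor is
# illustrative, not a vendor-published number.
def estimate_gb(params: float, bits_per_weight: float, overhead: float = 1.25):
    file_gb = params * bits_per_weight / 8 / 1e9  # weights on disk
    vram_gb = file_gb * overhead                  # weights + runtime headroom
    return round(file_gb, 1), round(vram_gb, 1)

# FP16: 8e9 parameters at 16 bits per weight
print(estimate_gb(8e9, 16))  # -> (16.0, 20.0), matching the table row above
```

The same function approximates hypothetical lower-precision variants, e.g. `estimate_gb(8e9, 4.5)` for a Q4_K_M-class quantization.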
Get the model
HuggingFace
Original weights
huggingface.co/Qwen/Qwen3-Embedding-8B
Source repository; you will need to quantize the weights yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Qwen 3 Embedding 8B.
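The hardware check is just a VRAM comparison against the smallest listed quantization. A sketch, assuming a hypothetical card list; the names and VRAM figures below are illustrative examples, not this site's hardware database:

```python
# Filter cards that can hold at least one quantization of the model.
# The card list is a hypothetical example; the 20 GB requirement comes
# from the FP16 row above.
CARDS = {"RTX 4090": 24, "RTX 4080": 16, "RTX 3090": 24}  # name -> VRAM (GB)

def cards_that_fit(min_vram_gb: int) -> list[str]:
    """Return card names with at least min_vram_gb of VRAM, sorted."""
    return sorted(name for name, vram in CARDS.items() if vram >= min_vram_gb)

print(cards_that_fit(20))  # -> ['RTX 3090', 'RTX 4090']
```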
Frequently asked
What's the minimum VRAM to run Qwen 3 Embedding 8B?
20 GB of VRAM is enough to run Qwen 3 Embedding 8B at FP16 (file size 16.0 GB); lower-precision quantizations would need less.
Can I use Qwen 3 Embedding 8B commercially?
Yes: Qwen 3 Embedding 8B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.
What's the context length of Qwen 3 Embedding 8B?
Qwen 3 Embedding 8B supports a context window of 32,768 tokens (32K).
Source: huggingface.co/Qwen/Qwen3-Embedding-8B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Before you buy
Verify that Qwen 3 Embedding 8B runs on your specific hardware before spending money.