Qwen 3 Embedding 8B
8B parameters · Commercial OK · Reviewed May 2026
Qwen 3 family embedding model. Apache 2.0 with strong multilingual coverage.
License: Apache 2.0 · Released Jun 5, 2025 · Context: 32,768 tokens
Overview
Qwen 3 Embedding 8B is an 8B-parameter embedding model from the Qwen 3 family, released under the Apache 2.0 license with strong multilingual coverage.
Strengths
- Apache 2.0
- Multilingual
- Qwen 3 base
Weaknesses
- Larger than BGE-M3; choose between them based on your VRAM budget
Quantization variants
Each quantization trades model quality for a smaller file and lower VRAM use. Q4_K_M is usually the most popular starting point, though only FP16 is listed for this model.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 16.0 GB | 20 GB |
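The figures in the table follow from simple arithmetic: parameters × bytes per weight gives the file size, and VRAM adds headroom for activations and runtime buffers. A minimal sketch; the 1.25× overhead factor is an illustrative assumption, not a published figure:

```python
# Estimate file size and VRAM for a quantized model.
# Assumptions: 1 GB = 1e9 bytes; the 1.25x VRAM overhead factor is
# illustrative, not a vendor-published number.
def estimate_gb(params: float, bits_per_weight: float, overhead: float = 1.25):
    file_gb = params * bits_per_weight / 8 / 1e9  # weights on disk
    vram_gb = file_gb * overhead                  # weights + runtime headroom
    return round(file_gb, 1), round(vram_gb, 1)

# FP16: 8e9 parameters at 16 bits per weight
print(estimate_gb(8e9, 16))  # -> (16.0, 20.0), matching the table row above
```

The same function approximates hypothetical lower-precision variants, e.g. `estimate_gb(8e9, 4.5)` for a Q4_K_M-class quantization.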
Get the model
HuggingFace
Original weights
huggingface.co/Qwen/Qwen3-Embedding-8B
Source repository; you will need to quantize the weights yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Qwen 3 Embedding 8B.
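The hardware check is just a VRAM comparison against the smallest listed quantization. A sketch, assuming a hypothetical card list; the names and VRAM figures below are illustrative examples, not this site's hardware database:

```python
# Filter cards that can hold at least one quantization of the model.
# The card list is a hypothetical example; the 20 GB requirement comes
# from the FP16 row above.
CARDS = {"RTX 4090": 24, "RTX 4080": 16, "RTX 3090": 24}  # name -> VRAM (GB)

def cards_that_fit(min_vram_gb: int) -> list[str]:
    """Return card names with at least min_vram_gb of VRAM, sorted."""
    return sorted(name for name, vram in CARDS.items() if vram >= min_vram_gb)

print(cards_that_fit(20))  # -> ['RTX 3090', 'RTX 4090']
```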
Frequently asked
What's the minimum VRAM to run Qwen 3 Embedding 8B?
20 GB of VRAM is enough to run Qwen 3 Embedding 8B at FP16 (file size 16.0 GB); lower-precision quantizations would need less.
Can I use Qwen 3 Embedding 8B commercially?
Yes: Qwen 3 Embedding 8B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.
What's the context length of Qwen 3 Embedding 8B?
Qwen 3 Embedding 8B supports a context window of 32,768 tokens (32K).
Source: huggingface.co/Qwen/Qwen3-Embedding-8B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Before you buy
Verify that Qwen 3 Embedding 8B runs on your specific hardware before spending money.