Aya Expanse 32B
Cohere's multilingual Aya at 32B. Covers 23 languages and was the strongest open-weight multilingual model of late 2024. The closest Apache-2.0 alternative is Qwen 2.5 32B, but Aya has deeper coverage of long-tail languages.
Positioning
Cohere Aya Expanse 32B is the latest in Cohere For AI's multilingual research lineage: a 32-billion-parameter dense model, instruction-tuned for 23 languages with explicit balance across Arabic, Chinese, Japanese, Korean, Turkish, Russian, Spanish, French, German, and 14 others. Released under CC-BY-NC-4.0 (research/non-commercial). The model builds on Cohere's Command-series base with the Aya multilingual pretraining and instruction-tuning recipe, and remains the canonical open-weight 30B-class multilingual model in 2026.
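For a first hands-on test, the standard transformers chat-template flow works. A minimal sketch, assuming transformers ≥ 4.40 (which ships native Cohere support) and accelerate for device placement; the Turkish prompt is purely illustrative:

```python
# Minimal multilingual chat sketch for the released checkpoint.
# Assumes: transformers >= 4.40 (native Cohere support), accelerate installed,
# and enough GPU memory for the precision you load.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The same chat-template flow works for any of the 23 supported languages.
messages = [{"role": "user", "content": "Türkçeye çevir: 'The library opens at nine.'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```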
Strengths
- Multilingual coverage is genuinely best-in-class for the parameter tier. Balanced quality across all 23 languages is meaningfully better than Llama 3 or Qwen 3 at the same parameter count, both of which lean English-heavy.
- Strong on under-served languages. Arabic, Korean, Hebrew, Turkish, Vietnamese — languages where Llama 3 lags meaningfully.
- The 32B dense model fits on a single 48 GB GPU at 8-bit (RTX 6000 Ada, L40S) or a single 24 GB card at Q4-Q5 (RTX 4090 / RTX 5090); FP16 weights alone are ~64 GB, so full precision needs an 80 GB-class card or two 48 GB GPUs. See the sizing sketch after this list.
- Instruction-tuning is conservative and predictable. It lacks the heavy RLHF "personality" of Llama 3.x but is reliable for production translation and multilingual chat workflows.
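The back-of-envelope math behind the VRAM claims above; the bits-per-weight values are rough format averages (weights only — KV cache and runtime overhead add a few GB on top):

```python
# Rough weight-memory estimates for a ~32B-parameter dense model.
# Bits-per-weight are approximate format averages, not measured file sizes.
PARAMS = 32e9

for label, bits in [("FP16", 16.0), ("Q8/INT8", 8.0), ("Q5", 5.5), ("Q4", 4.5)]:
    gb = PARAMS * bits / 8 / 1e9  # bytes -> decimal GB
    print(f"{label:8s} ~{gb:4.0f} GB of weights")

# FP16  ~64 GB    -> 80 GB-class GPU (or two 48 GB cards)
# Q8    ~32 GB    -> one 48 GB card (RTX 6000 Ada, L40S)
# Q4-Q5 ~18-22 GB -> one 24 GB card (RTX 3090/4090/5090)
```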
Limitations
- License is non-commercial. CC-BY-NC-4.0 — production commercial deployments require Cohere licensing. Single biggest practical limitation.
- Reasoning is not class-leading. DeepSeek V3 and Qwen 3 dramatically beat Aya on math/code/logic.
- English-only quality is below Llama 3.1 70B / Qwen 3 32B. The multilingual-balanced training trades English performance for cross-language consistency.
- Tool-use / function-calling is basic. Post-trained for chat, not optimized for agentic workflows.
- Long context is not a strength. The model card advertises a 128K window (the 8B sibling is capped at 8K), but quality degrades noticeably beyond 16K.
Real-world performance
- vs Llama 3.1 8B / Llama 3.1 70B: Llama wins for English-only work at either end of the size bracket. Aya Expanse 32B wins clearly on Arabic, Korean, Japanese, and Vietnamese.
- vs Qwen 3 32B: Qwen 3 32B is stronger overall, with better Chinese-English balance. Aya Expanse 32B covers more languages but with less depth per language.
- vs Command R+ 104B: Command R+ is the larger Cohere sibling with retrieval-grounding focus. Aya Expanse 32B is the cheaper-to-serve multilingual chat option.
- vs Google Gemma 2 27B: Comparable parameter tier. Gemma stronger on English; Aya stronger on multilingual.
Should you run this locally?
Yes, if you specifically need 30B-class multilingual chat, your target language mix includes underserved languages (Arabic, Korean, Vietnamese, Hebrew, Turkish), and your deployment is research, academic, or otherwise non-commercial.
No if you need permissive commercial licensing (pick Llama 3.1 70B or Qwen 3 32B), reasoning-heavy workloads (pick DeepSeek/Qwen 3), or English-only workflows (Llama / Qwen win).
How it compares
- vs Aya 23 35B: Aya Expanse is the direct successor, with refined instruction-tuning on the same Command-derived lineage.
- vs Aya 23 8B: the 8B line is the smaller sibling for cheaper inference at a lower capability tier.
- vs Command R 35B: Command R is RAG-tuned; Aya is multilingual-tuned. Different specializations.
Run this yourself
- Single 24 GB GPU at Q4-Q5: RTX 4090, RTX 5090, used 3090.
- Single 48 GB workstation GPU at Q8/INT8: RTX 6000 Ada, L40S.
- Apple Silicon at FP16: Mac Studio M3 Ultra / MacBook Pro M4 Max (96+ GB).
- vLLM serving: `vllm serve CohereForAI/aya-expanse-32b --max-model-len 8192` (a conservative context cap to save KV-cache VRAM; raise it if you have headroom). See the client sketch after this list.
- Cloud rental: Runpod / Lambda L40S ~$1.50-2.50/hr.
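Once the `vllm serve` command above is running, any OpenAI-compatible client can talk to it. A minimal sketch using the openai Python package; localhost:8000 is vLLM's default bind, and the prompt is purely illustrative:

```python
# Query a local vLLM instance of Aya Expanse 32B via its OpenAI-compatible API.
# Assumes `vllm serve CohereForAI/aya-expanse-32b` is already listening on :8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

resp = client.chat.completions.create(
    model="CohereForAI/aya-expanse-32b",
    messages=[
        # Cross-language requests are the model's home turf.
        {"role": "user", "content": "Réponds en français : summarize the Aya project in two sentences."},
    ],
    temperature=0.3,
    max_tokens=256,
)
print(resp.choices[0].message.content)
```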
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 19.0 GB | 22 GB |
Get the model
- HuggingFace: CohereForAI/aya-expanse-32b — original weights. The source repository ships full-precision weights only, so quantize them yourself or use a community conversion from the Hub.
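A minimal download sketch with huggingface_hub; the repo sits behind a license click-through, so authenticate first (`huggingface-cli login` or a token), and budget roughly 65 GB of disk for the full-precision weights:

```python
# Pull the original Aya Expanse 32B weights from the Hub.
# Assumes you've accepted the CC-BY-NC terms on the model page and are logged in.
from huggingface_hub import snapshot_download

path = snapshot_download(repo_id="CohereForAI/aya-expanse-32b")
print("weights downloaded to:", path)
```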
Frequently asked
What's the minimum VRAM to run Aya Expanse 32B?
About 22 GB: the AWQ-INT4 build (19.0 GB file) runs on a single 24 GB card such as an RTX 3090, 4090, or 5090.
Can I use Aya Expanse 32B commercially?
Not under the default CC-BY-NC-4.0 license; commercial deployment requires a separate licensing arrangement with Cohere.
What's the context length of Aya Expanse 32B?
The model card lists 128K, but expect quality degradation beyond 16K; serving configs often cap it lower (the vLLM example above uses 8192).
Source: huggingface.co/CohereForAI/aya-expanse-32b