by Cohere For AI
Cohere For AI's multilingual research family. Aya 23 and Aya Expanse cover 23 languages with explicit per-language balancing — the strongest open-weight multilingual chat models for underserved languages (Arabic, Korean, Hebrew, Vietnamese).
Start with Aya Expanse 8B at Q4_K_M via Ollama — it fits on a single RTX 3060 12 GB at ~5 GB VRAM. The family's breadth comes from its data: the earlier Aya 101 model covers 101 languages, and the Expanse generation was trained on the Aya Collection (513M instruction examples across 114 languages) plus the smaller human-annotated Aya Dataset, then focused on 23 languages for quality. If your use case is non-English text generation, Aya Expanse 8B outperforms Llama 3.1 8B and Qwen 3 8B on low-resource languages by 30-50% on native-speaker evaluations. For higher quality, Aya Expanse 32B at Q4 (~20 GB) fits on an RTX 4090 24 GB. Skip Aya 23 35B — the Aya Expanse generation outperforms it in every language category. Licensing is the one nuance: the Aya datasets and the older Aya 101 model are Apache 2.0 with the full data available for reproduction, but the Aya Expanse and Aya 23 weights ship under CC-BY-NC 4.0 with Cohere's acceptable-use policy — check this before commercial deployment.
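A minimal sketch of that single-GPU starting point, using the ollama Python client — this assumes Ollama is installed and `ollama pull aya-expanse:8b` has already been run; the prompt text is only illustrative:

```python
# Minimal local chat with Aya Expanse 8B through Ollama.
# Assumes the Ollama server is running and the aya-expanse:8b tag
# (Q4_K_M by default) has been pulled.
import ollama

response = ollama.chat(
    model="aya-expanse:8b",
    messages=[
        {
            "role": "user",
            "content": "Translate into Vietnamese: The meeting starts at 3 pm today.",
        },
    ],
)

print(response["message"]["content"])
```

The same call works unchanged against the 32B tag if you have the VRAM for the larger quant.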
For single-user local: Ollama with aya-expanse:8b at Q4_K_M on an RTX 3060 12 GB, or MLX-LM on an Apple M3. Aya Expanse and Aya 23 are dense transformers built on Cohere's Command R architecture — llama.cpp/Ollama, vLLM, and Hugging Face Transformers all support it, but confirm architecture support rather than assuming any Llama-capable engine will load it. For multi-user serving: vLLM 0.6.0+ with AWQ 4-bit on 2× L4 24 GB — deploy separate instances per language group if traffic patterns vary by region. For translation pipelines: pair Aya Expanse with faster-whisper for multilingual speech-to-text (sketched below) — Aya's tokenizer handles multilingual input natively. That tokenizer is a 256K vocabulary derived from Cohere's Command R, making it more token-efficient for non-English text than Llama's English-optimized vocab. For regulated multilingual environments (government, healthcare, legal), Aya's full dataset transparency is a strong compliance point — just confirm the CC-BY-NC weight license fits your use before deploying commercially.
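A sketch of the speech-to-text → translation pipeline described above, pairing faster-whisper with the same local Ollama instance for the Aya step. The audio path and target language are placeholders, and routing the Aya call through a vLLM endpoint instead of Ollama is purely a deployment choice:

```python
# Speech-to-text -> Aya Expanse translation pipeline (local sketch).
# Assumes faster-whisper and ollama are installed, a CUDA GPU is available,
# and "meeting.wav" is a placeholder path to your own audio file.
import ollama
from faster_whisper import WhisperModel

# 1. Transcribe the source audio; Whisper auto-detects the spoken language.
stt = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = stt.transcribe("meeting.wav")
transcript = " ".join(segment.text.strip() for segment in segments)
print(f"Detected source language: {info.language}")

# 2. Hand the transcript to Aya Expanse for translation.
target_language = "Vietnamese"  # placeholder; any of the 23 supported languages
reply = ollama.chat(
    model="aya-expanse:8b",
    messages=[
        {
            "role": "user",
            "content": (
                f"Translate the following text into {target_language}, "
                f"preserving tone and named entities:\n\n{transcript}"
            ),
        },
    ],
)
print(reply["message"]["content"])
```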
Models in this family with our verdicts
Verify Aya runs on your specific hardware before committing money.