Text
machine translation
language translation
multilingual

Translation

Text translation between languages. Multilingual instruction-tuned models handle this competently; specialized translation models exist for very-low-resource languages.

Capability notes

Machine translation quality is evaluated through **automated metrics** (BLEU, COMET, chrF) scored against reference translations, and **human evaluation** measuring adequacy and fluency. LLM-based translation using instruction-tuned models has closed the gap with specialized neural machine translation (NMT) on high-resource pairs (English↔French/German/Spanish/Chinese) and often exceeds NMT on low-resource pairs. **BLEU scores** (0-100): [Llama 3.3 70B](/models/llama-3-3-70b) achieves 35-40 English→German, [Aya Expanse 32B](/models/aya-expanse-32b) 33-38, Google Translate 38-43, DeepL 40-45. The 5-point gap on European pairs is noticeable — human evaluators prefer the commercial output ~60-70% of the time in A/B tests. For low-resource pairs (English→Swahili, English→Bengali), Aya Expanse often outperforms Google Translate because the Aya training corpus spans over 100 languages, including pairs commercial APIs neglect. **COMET scores** (0-1, better correlation with human judgment): Aya Expanse 32B scores 0.82-0.88 on WMT test sets for high-resource pairs vs DeepL's 0.85-0.92. The gap narrows to 0.01-0.03 for mid-resource pairs (English↔Czech, Turkish). **Model selection**: [Aya Expanse 32B](/models/aya-expanse-32b) is the best general-purpose open-weight multilingual translator — 23 officially supported languages with instruction-following, including formality control. [Qwen 3 235B-A22B](/models/qwen-3-235b-a22b) handles complex translation + localization (idioms, cultural references, marketing copy). [Llama 3.3 70B](/models/llama-3-3-70b) is strong on Western European pairs, weaker on Asian/African languages. [DeepSeek V4](/models/deepseek-v4) with reasoning mode for ambiguous source text (legal, philosophical). **Specialized NMT vs LLM**: NMT models (Argos Translate, OPUS-MT, Meta's NLLB-200 at 3.3B params) are 10-100× smaller than LLMs and run on CPU at 100-1,000× higher throughput. For constrained-domain translation (technical manuals, medical reports), specialized NMT matches or exceeds LLM quality at a fraction of the compute. NLLB-200 is the reference NMT baseline for 200 languages.
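
If you want to reproduce these metrics on your own test set, BLEU and chrF can be computed locally with the sacrebleu package (COMET needs the separate unbabel-comet package and a downloaded checkpoint). A minimal sketch — the hypothesis and reference sentences are placeholders, not from any benchmark:

```python
# Minimal sketch: scoring candidate translations against references locally.
import sacrebleu

hypotheses = ["Die Kirschblüten blühen Anfang April."]        # system output, one string per segment
references = [["Die Kirschblüten blühen Anfang April auf."]]  # one list of segments per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)   # 0-100, higher is better
chrf = sacrebleu.corpus_chrf(hypotheses, references)   # character n-gram F-score, 0-100

print(f"BLEU {bleu.score:.1f}  chrF {chrf.score:.1f}")
```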

If you just want to try this

Lowest-friction path to a working setup.

Pull [Aya Expanse 32B](/models/aya-expanse-32b) on [Ollama](/tools/ollama) (`ollama pull aya-expanse:32b`). Best open-weight multilingual instruction model for straightforward translation — covers 23 languages, understands translation-specific instructions ("translate to informal Japanese"), and fits consumer hardware. At Q4, ~20 GB VRAM — an [RTX 3090](/hardware/rtx-3090) or [RTX 4090](/hardware/rtx-4090) handles it. At Q2-Q3, it fits 16 GB cards like the [RTX 5070 Ti](/hardware/rtx-5070-ti). Prompt format: "Translate the following [source language] text to [target language]. Preserve formatting, proper nouns, and technical terms. If a term has no direct translation, keep the original and add [explanation in brackets]." (The sketch below wraps this prompt in a call to Ollama's local API.) For European pairs where quality matters most, [Llama 3.3 70B](/models/llama-3-3-70b) at Q4 on an [RTX 4090](/hardware/rtx-4090) produces noticeably better translations — a BLEU gap of 2-5 points is perceptible to native speakers. It requires ~40 GB — fits an RTX 4090 with partial offload or a [MacBook Pro 16 M4 Max 64GB](/hardware/macbook-pro-16-m4-max). If you don't have >16 GB VRAM, use [llama.cpp](/tools/llama-cpp) with CPU inference. Aya Expanse 32B Q4 on a 16-core CPU with 32 GB RAM translates at 5-10 tok/s — a 5,000-word document in roughly 10-25 minutes. For the simplest possible setup: Argos Translate (`pip install argostranslate`) runs on CPU at 500-1,000 words/second for supported pairs. Quality is lower than an LLM but roughly 100× faster on any laptop.
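
A minimal sketch of that prompt wrapped in a call to Ollama's local HTTP API. It assumes Ollama is running on its default port (11434) and the model has been pulled; swap in `aya-expanse:8b` on smaller GPUs. The `translate` helper name is just for illustration.

```python
# Minimal sketch: the translation prompt above, sent to a local Ollama server.
import requests

def translate(text: str, source: str, target: str, model: str = "aya-expanse:32b") -> str:
    prompt = (
        f"Translate the following {source} text to {target}. "
        "Preserve formatting, proper nouns, and technical terms. "
        "If a term has no direct translation, keep the original and add "
        "[explanation in brackets].\n\n" + text
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(translate("The cherry blossoms bloom in early April.", "English", "Japanese"))
```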

For production deployment

Operator-grade recommendation.

Production translation pipelines choose between LLM-based (higher quality, higher cost) and specialized NMT (lower quality, lower cost) by language pair, domain, and accuracy requirements. **LLM translation pipeline**: Deploy [Aya Expanse 32B](/models/aya-expanse-32b) or [Llama 3.3 70B](/models/llama-3-3-70b) behind [vLLM](/tools/vllm) as a translation API. Continuous batching handles 10-50 concurrent requests on a single [RTX 4090](/hardware/rtx-4090). Chunk by paragraph (not sentence — paragraph-level context improves pronoun resolution and discourse coherence 30-50%). **Specialized NMT pipeline**: Deploy NLLB-200 (3.3B params) via [CTranslate2](https://github.com/OpenNMT/CTranslate2). On an [RTX 3060 12GB](/hardware/rtx-3060-12gb), it translates 500-2,000 words/second — 50-200× faster than an LLM. For constrained domains, NMT quality is within 1-3 BLEU of the LLM. **Hybrid pipeline (pragmatic)**: Route by language pair and domain. High-resource European pairs → Argos Translate/OPUS-MT (fast, cheap). Low-resource or domain-complex → Aya Expanse 32B (slower, costlier). Quality-gate: run COMET quality estimation on NMT output; if the score is <0.75, re-route to the LLM (a sketch of this gate follows below). This catches 80-90% of NMT failures. **Cost economics**: Aya Expanse 32B on an RTX 4090 (450W, $0.10-0.15/kWh) translates ~5,000-15,000 words/kWh → roughly $0.007-0.03 per 1,000 words of electricity. Google Translate API: $20/million characters, which works out to roughly $120 per million words at ~6 characters/word. At 10M words/month, self-hosted is $70-300 vs ~$1,200 API — enough to amortize the GPU within months. At 1B words/month, self-hosted is $7,000-30,000 vs ~$120,000 API — decisive. **Quality assurance**: Human evaluation of 1-5% of translations (stratified by language pair and COMET score). Track BLEU/COMET over time per language pair. Maintain a terminology database for consistent translation of domain-specific terms.
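
A hedged sketch of the COMET quality gate in the hybrid pipeline. `translate_nmt` and `translate_llm` are placeholders for your CTranslate2 (NLLB-200) and vLLM calls, passed in as functions; the CometKiwi checkpoint used here is gated on Hugging Face, so `download_model` only works after accepting its license and logging in.

```python
# Hedged sketch of a COMET-gated hybrid router: NMT first, LLM escalation on low QE scores.
from comet import download_model, load_from_checkpoint

qe_model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
QE_THRESHOLD = 0.75  # below this, escalate the segment to the LLM

def translate_with_gate(paragraphs, src_lang, tgt_lang, translate_nmt, translate_llm):
    # 1. Cheap first pass through the NMT engine.
    drafts = [translate_nmt(p, src_lang, tgt_lang) for p in paragraphs]

    # 2. Reference-free quality estimation: source + machine translation only.
    qe_input = [{"src": s, "mt": m} for s, m in zip(paragraphs, drafts)]
    scores = qe_model.predict(qe_input, batch_size=8, gpus=1).scores

    # 3. Re-translate only the low-scoring segments with the LLM.
    return [
        translate_llm(p, src_lang, tgt_lang) if score < QE_THRESHOLD else draft
        for p, draft, score in zip(paragraphs, drafts, scores)
    ]
```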

What breaks

Failure modes operators see in the wild.

- **Code-switching corruption.** Source text containing small amounts of a second language causes the model to switch output language mid-sentence, producing trilingual gibberish. Common in software docs. Mitigation: detect language segments in the source and translate each independently with explicit language tags.
- **Formality level mismatch.** Wrong register — informal "tu" for formal "vous," casual "du" for "Sie." The translation is word-correct but socially wrong. Mitigation: explicit formality instruction in the prompt or a post-translation formality classifier.
- **Cultural context loss.** Idioms and metaphors translated literally produce nonsense. "It's raining cats and dogs" → a literal translation is meaningless in the target language. Mitigation: explicit localization instruction ("replace idioms with target-language equivalents"). Maintain a glossary of idiom pairs per language pair.
- **Proper noun mistranslation.** Names and brands get translated when they shouldn't or stay untranslated when they should. Mitigation: NER pre-pass to identify named entities → preserve via placeholder → reinsert after translation (see the sketch after this list).
- **Hallucinated additions in empty segments.** Incomplete sentences or placeholder text ("TODO") cause LLMs to "helpfully" generate plausible content. Dangerous in technical docs. Mitigation: mark empty segments with explicit tags; post-process to verify no content was added where the source was empty.
- **RTL language rendering issues.** Arabic/Hebrew/Persian text mixed with LTR text (numbers, URLs) produces incorrect visual ordering. Mitigation: wrap mixed-direction segments in Unicode bidirectional control characters. Test on actual RTL-configured systems.
- **Sentence segmentation errors.** Wrong boundary detection (abbreviations, decimal numbers) produces disconnected translations. Chinese/Japanese without explicit boundaries are particularly vulnerable. Mitigation: use language-specific segmentation (spaCy, Stanza, ICU BreakIterator), not regex.
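
One way to implement the NER placeholder pass from the proper-noun bullet, sketched with spaCy. It assumes `en_core_web_sm` is installed; the `__ENTn__` placeholder format is only an example (pick tokens your model reliably passes through unchanged), and the translation call itself is left to whatever you already use.

```python
# Sketch: mask named entities before translation, restore them afterwards.
import spacy

nlp = spacy.load("en_core_web_sm")
KEEP = {"PERSON", "ORG", "PRODUCT", "GPE"}  # entity types to protect

def mask_entities(text):
    doc, mapping, masked, offset = nlp(text), {}, text, 0
    for i, ent in enumerate(doc.ents):
        if ent.label_ in KEEP:
            placeholder = f"__ENT{i}__"
            mapping[placeholder] = ent.text
            masked = masked[:ent.start_char + offset] + placeholder + masked[ent.end_char + offset:]
            offset += len(placeholder) - len(ent.text)
    return masked, mapping

def unmask(text, mapping):
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked, mapping = mask_entities("Acme Corp ships WidgetPro to Berlin in May.")
print(masked)  # entities the NER model caught are now placeholders
# ...run `masked` through your existing translation call, then:
print(unmask(masked, mapping))  # entities restored verbatim
```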

Hardware guidance

**Hobbyist ($500-$1,500)**: [RTX 3060 12GB](/hardware/rtx-3060-12gb) or [RTX 4060 Ti 16GB](/hardware/rtx-4060-ti-16gb). Runs Aya Expanse 32B Q2-Q3 (~10-14 GB) or 7-8B LLMs at Q8 for translation. Quality modest at Q2 — BLEU drops 2-4 points from Q4 baseline. CPU+NMT (NLLB-200, Argos) for high-throughput adequate-quality translation. [Apple M4 Pro](/hardware/apple-m4-pro) 24GB runs Aya 32B Q3 — a solid $1,400 translation workstation with unified memory advantage. **SMB ($2,000-$4,000)**: [RTX 4090 24GB](/hardware/rtx-4090) or [RTX 5090 32GB](/hardware/rtx-5090). Aya 32B Q4-Q5 with 16K context — quality sweet spot. 5090 32 GB runs Llama 3.3 70B Q4 entirely in VRAM. Throughput: 50-200 paragraphs/min for 32B, 20-80 for 70B. **Enterprise ($8,000-$25,000)**: [RTX A6000](/hardware/rtx-a6000) 48 GB or [NVIDIA L40S](/hardware/nvidia-l40s) 48 GB for sustained 24/7 serving. 2× [RTX 5090](/hardware/rtx-5090) (64 GB) for tensor-parallel 70B Q8. More VRAM → larger models at higher quantization → directly improves BLEU/COMET. **Frontier ($50,000+)**: [NVIDIA H100 PCIe](/hardware/nvidia-h100-pcie) or [H200](/hardware/nvidia-h200) for [Qwen 3 235B-A22B](/models/qwen-3-235b-a22b) at FP8 — best-in-class multilingual and rare-language performance. Worth it when translation quality directly impacts revenue. **CPU-only viable**: 16-32 core CPUs (Ryzen 9950X, i9-14900K) with 64+ GB RAM run 32B Q4 at 8-15 tok/s via [llama.cpp](/tools/llama-cpp). For overnight batch of 50,000+ pages, CPU is more cost-effective than GPU — trading time for hardware cost.

Runtime guidance

**If you need one-off translations on your machine** → [Ollama](/tools/ollama) with [Aya Expanse 32B](/models/aya-expanse-32b). Zero setup, interactive with format preservation. European pairs: [Llama 3.3 70B](/models/llama-3-3-70b) on Ollama or [LM Studio](/tools/lm-studio). Apple Silicon: [MLX LM](/tools/mlx-lm). **If building a production translation API** → [vLLM](/tools/vllm) behind FastAPI. Continuous batching handles 10-50 concurrent requests on a single [RTX 4090](/hardware/rtx-4090) with <5s latency per paragraph. Request queuing with priority (interactive before batch). **If speed/cost dominate over quality (constrained domain)** → Specialized NMT via [CTranslate2](https://github.com/OpenNMT/CTranslate2) with OPUS-MT/NLLB-200. 2-4× speedup over raw Transformers. Deploy as the primary engine with an LLM as quality-escalation fallback (COMET <0.75 triggers LLM re-translation). **If batch document translation (website localization)** → Argos Translate for high-resource pairs at 500-1,000 words/second on CPU. For quality-critical work: paragraph-chunked LLM translation via vLLM batch. Pipeline: ingestion → language detection → paragraph segmentation → LLM translation → terminology validation → output assembly. **If real-time chat translation** → Streaming via WebSocket. Aya Expanse 32B on RTX 4090: 200-500ms TTFT for short sentences. For sub-200ms: use NLLB-200 distilled to 600M params. Trade quality for speed. **If glossary-enforced translation (domain terminology)** → Maintain a terminology database (JSON/SQL) of source→target mappings per language pair and prepend the relevant entries to the prompt at the application layer before the request reaches vLLM (a sketch follows below). For NMT: constrained decoding to force specific term translations.
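
A sketch of that application-layer glossary injection against a vLLM OpenAI-compatible endpoint (`vllm serve` exposes `/v1` on port 8000 by default). The glossary entries and model name are illustrative; in practice you would load them from the terminology database.

```python
# Sketch: prepend glossary terms to the prompt before it reaches the vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
GLOSSARY = {"tensor core": "Tensor-Kern", "inference engine": "Inferenz-Engine"}  # illustrative

def translate_with_glossary(text, source="English", target="German",
                            model="CohereForAI/aya-expanse-32b"):
    glossary_lines = "\n".join(f"- {src} -> {tgt}" for src, tgt in GLOSSARY.items())
    prompt = (
        f"Translate the following {source} text to {target}.\n"
        f"Always use these exact term translations:\n{glossary_lines}\n\n{text}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content
```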

Setup walkthrough

  1. Install [Ollama](/tools/ollama), then run `ollama pull aya-expanse:8b` (~5 GB — Cohere's multilingual model, 23 languages).
  2. `ollama run aya-expanse:8b` → prompt: "Translate the following English to Japanese: 'The cherry blossoms bloom in early April.'"
  3. First translation arrives in 2-5 seconds. Quality is competitive with commercial services for most supported pairs, though DeepL and Google Translate still lead on high-resource European pairs (see Capability notes).
  4. For European languages: `ollama pull llama3.2:3b` (~2 GB, lighter) — handles EN↔DE/FR/ES/IT competently.
  5. For low-resource languages (Swahili, Urdu, Bengali): `ollama pull aya-expanse:32b` (~20 GB) — the 32B variant has dramatically better low-resource coverage.
  6. Batch: `cat phrases.txt | while read line; do ollama run aya-expanse:8b "Translate to French: $line"; done` — or use the Python sketch below for error handling.
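
A Python equivalent of the shell loop in step 6, useful when you want error handling and to skip blank lines so the model can't invent content for empty segments. File names are placeholders; it assumes Ollama is running on its default port.

```python
# Sketch: batch-translate a one-phrase-per-line file through the local Ollama server.
import requests

def ollama_translate(text, target="French", model="aya-expanse:8b"):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": f"Translate to {target}: {text}", "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

with open("phrases.txt", encoding="utf-8") as src, \
     open("phrases.fr.txt", "w", encoding="utf-8") as out:
    for line in src:
        line = line.strip()
        if line:  # skip empty segments so nothing is hallucinated for them
            out.write(ollama_translate(line) + "\n")
```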

The cheap setup

Aya Expanse 8B runs at 40-60 tok/s on a used GTX 1060 6 GB ($60) — translates a paragraph in 2-5 seconds. Llama 3.2 3B runs on any $300 laptop CPU at 20-40 tok/s for major European languages. Translation is VRAM-light for the 3B-8B range. Build: used Dell Optiplex ($150) + GTX 1060 6 GB ($60) + 16 GB RAM ($30). Total: ~$240. For 50+ language pairs with low-resource coverage, the 32B Aya Expanse needs 24 GB — out of $300 range.

The serious setup

Used [RTX 3090](/hardware/rtx-3090) 24 GB (~$700-900). Runs Aya Expanse 32B at 25-40 tok/s — near-Google-Translate quality across 23 languages including low-resource pairs. Qwen 2.5 32B (multilingual instruction-tuned) runs at similar speeds and is stronger on Asian language pairs. Pair with a Ryzen 7 7700X + 32 GB DDR5 + 1TB NVMe. Total: ~$1,500-1,800. For enterprise translation (100K+ words/day), batch with vLLM — a 3-5× throughput improvement over sequential requests (sketch below). Translation is not VRAM-intensive below 32B.
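
For that vLLM batch path, a sketch using vLLM's offline Python API. The model name, context length, and prompts are illustrative — pick a quantized build that actually fits a 24 GB card.

```python
# Sketch: offline batch translation with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="CohereForAI/aya-expanse-32b", max_model_len=8192)  # illustrative; use a quantized build for 24 GB
params = SamplingParams(temperature=0.2, max_tokens=1024)

paragraphs = ["First paragraph to translate...", "Second paragraph..."]
prompts = [
    f"Translate the following English text to German. Preserve formatting.\n\n{p}"
    for p in paragraphs
]

# vLLM schedules and batches these internally; output order matches input order.
outputs = llm.generate(prompts, params)
translations = [out.outputs[0].text for out in outputs]
```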

Common beginner mistake

**The mistake:** Using an English-centric chat model (Llama 3.1 8B, Mistral 7B) for translation and getting awkward, literal output. **Why it fails:** English-centric models are trained predominantly on English data — they understand other languages as "vocabulary learned from English explanations" rather than with native fluency. Translations come out grammatically correct but stylistically unnatural — like a textbook, not a native speaker. **The fix:** Use a true multilingual model: Aya Expanse (23 languages, trained on multilingual corpora), Qwen 2.5 (strong Asian language support), or Command R+ (enterprise-grade multilingual). These models produce natural-sounding translations because they learned the languages natively, not as a translation task.

Reality check

Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.

Common mistakes

  • Buying for spec-sheet VRAM without modeling KV cache + activation overhead
  • Underestimating quantization quality loss below Q4
  • Skipping flash-attention support (real perf gap on long context)
  • Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)

