
Dubbing & Translation

Translating audio across languages while preserving speaker voice. Combines STT → translation → cloned-voice TTS.

Setup walkthrough

  1. Install the pipeline components:

    • STT: pip install faster-whisper (transcribe source audio)
    • Translation: ollama pull aya-expanse:8b (translate transcript)
    • TTS with cloning: pip install f5-tts (speak translation in original voice)
  2. Pipeline script:

# Step 1: Transcribe source audio with faster-whisper
from faster_whisper import WhisperModel
stt = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, _ = stt.transcribe("source.mp3")
transcript = " ".join([s.text for s in segments])

# Step 2: Translate the transcript with Aya Expanse via Ollama
import ollama
resp = ollama.chat(model="aya-expanse:8b", messages=[{
    "role": "user",
    "content": f"Translate to Spanish: {transcript}"
}])
translated = resp["message"]["content"]

# Step 3: Synthesize the translation in the original speaker's voice
# (F5-TTS Python API; argument names may differ slightly between releases)
from f5_tts.api import F5TTS
tts = F5TTS(device="cuda")
tts.infer(
    ref_file="speaker_reference.wav",  # 5-10 s clean sample of the original speaker
    ref_text="",                       # empty -> F5-TTS transcribes the reference itself
    gen_text=translated,
    file_wave="dubbed.wav",
)
  3. A 1-minute clip takes 3-8 minutes end-to-end on a 12 GB GPU. Quality depends on each stage — weak STT → bad transcript → bad translation → even good TTS can't fix it.
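One practical wrinkle in step 2: a long transcript pushed through a single prompt can exceed the translation model's context window, and Ollama will truncate the prompt when that happens. A minimal workaround sketch that chunks the transcript on sentence boundaries before translating; the 2,000-character chunk size and the translate_long() helper are illustrative, not from any library:

import ollama

def translate_long(transcript, target_lang="Spanish", max_chars=2000):
    # Split on sentence boundaries and group into chunks the model can hold.
    sentences = transcript.split(". ")
    chunks, current = [], ""
    for sentence in sentences:
        piece = sentence if sentence.endswith(".") else sentence + ". "
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)

    # Translate each chunk separately and stitch the results back together.
    parts = []
    for chunk in chunks:
        resp = ollama.chat(model="aya-expanse:8b", messages=[{
            "role": "user",
            "content": f"Translate to {target_lang}. Return only the translation:\n{chunk}",
        }])
        parts.append(resp["message"]["content"])
    return " ".join(parts)

translated = translate_long(transcript)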

The cheap setup

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs the full dubbing pipeline: Whisper large-v3 STT at 15-20× real-time, Aya Expanse 8B translation at 40-60 tok/s, F5-TTS cloning at near-real-time. A 5-minute video dubs in ~20-30 minutes. Pair with Ryzen 5 5600 + 32 GB DDR4 + 1TB NVMe. Total: ~$390-440. For CPU-only: Whisper medium + Kokoro TTS (preset voices only) + NLLB-200-distilled-600M (lighter translator) — dubs a 5-minute video in ~1-2 hours. Functional but slow.
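For that CPU-only path, the translation stage swaps Ollama for the NLLB-200-distilled-600M checkpoint. A minimal sketch using the Hugging Face transformers translation pipeline; the eng_Latn → spa_Latn pair is an assumed example (NLLB uses FLORES-200 language codes), and long transcripts are better fed through it sentence by sentence than in one call:

# CPU-only translation step with NLLB-200-distilled-600M (sketch).
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",   # FLORES-200 code for the source language
    tgt_lang="spa_Latn",   # FLORES-200 code for the target language
    device=-1,             # -1 = run on CPU
)
translated = translator(transcript, max_length=512)[0]["translation_text"]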

The serious setup

Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs the full pipeline: Whisper large-v3 STT at 20-30× real-time, Aya Expanse 32B translation at 25-40 tok/s (dramatically better translation quality than 8B), F5-TTS cloning. A 30-minute video dubs in ~1-2 hours. For production dubbing (entire TV series, 100+ hours), batch pipeline overnight. Total: ~$1,800-2,200. Dubbing quality is a chain: STT accuracy × translation quality × voice cloning fidelity. Invest in the weakest link — often translation for complex content.
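The overnight batch case is just the per-file pipeline in a loop with resume and error handling. A rough sketch; dub_file() here is a hypothetical wrapper around the three steps from the walkthrough above, not a real package function:

from pathlib import Path

def batch_dub(source_dir, output_dir, ref_audio="speaker_reference.wav"):
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(source_dir).glob("*.mp3")):
        target = out / (src.stem + "_dubbed.wav")
        if target.exists():             # resume where the last run stopped
            continue
        try:
            # dub_file() = hypothetical wrapper around steps 1-3 above
            dub_file(str(src), str(target), ref_audio=ref_audio)
            print(f"done: {src.name}")
        except Exception as exc:        # one bad file shouldn't kill the batch
            print(f"failed: {src.name}: {exc}")

batch_dub("episodes/", "dubbed/")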

Common beginner mistake

The mistake: Running Whisper tiny.en on accented English speech, getting 60% accuracy, translating the garbled transcript, then blaming the TTS for "weird robotic dubbing."

Why it fails: The pipeline is only as strong as its weakest link. Whisper tiny.en has a ~70% word error rate on accented or noisy speech. The translation model gets garbage input → produces approximate output. The TTS faithfully reads the wrong words. You blame the TTS, but STT is the culprit.

The fix: Always use Whisper large-v3 for dubbing source transcription. The 3 GB model is worth it — 95%+ accuracy on clean speech, 85-90% on accented/noisy speech. Check the transcript BEFORE translating. If the transcript has errors, fix them manually or re-transcribe. A 5-minute manual transcript review saves 2 hours of re-dubbing downstream garbage.
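A cheap way to do that pre-translation check is to dump the transcript with timestamps and flag the segments the model itself was unsure about. A minimal sketch using faster-whisper's per-segment avg_logprob; the -1.0 threshold is an arbitrary starting point, not a calibrated value:

from faster_whisper import WhisperModel

stt = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, _ = stt.transcribe("source.mp3")

with open("transcript_review.txt", "w", encoding="utf-8") as f:
    for seg in segments:
        # Segments the model itself was unsure about get flagged for review.
        flag = "  <-- CHECK" if seg.avg_logprob < -1.0 else ""
        f.write(f"[{seg.start:7.1f}s - {seg.end:7.1f}s] {seg.text.strip()}{flag}\n")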

Recommended setup for dubbing & translation

Recommended hardware
Best GPU for local AI →
Audio models are compute-light; most 8-16 GB cards work.
Recommended runtimes

Browse all tools for runtimes that fit this workload.

Budget build
AI PC under $1,000 →
Best GPU for this task
Best GPU for local AI →

Reality check

Audio models are surprisingly forgiving on hardware. Whisper, Coqui TTS, and whisper.cpp all run well on 8-12 GB GPUs. The bottleneck is rarely the GPU; it's audio preprocessing and disk I/O for batch transcription.
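One way to take the preprocessing hit once, instead of inside every transcription run, is to pre-convert the sources to 16 kHz mono WAV, which is what Whisper resamples to internally anyway. A rough sketch shelling out to ffmpeg (assumes ffmpeg is installed and on PATH):

import subprocess
from pathlib import Path

def preprocess(source_dir, work_dir):
    out = Path(work_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(source_dir).glob("*.mp3")):
        dst = out / (src.stem + ".wav")
        # 16 kHz mono WAV, ready for batch transcription.
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)],
            check=True,
        )

preprocess("raw_audio/", "prepped/")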

Common mistakes

  • Overspending on GPU for audio-only workflows (8-12 GB is enough for Whisper)
  • Running audio + LLM concurrently without budgeting VRAM
  • Using fp32 weights when fp16 / int8 give a 2-3x speedup with negligible quality loss (see the sketch after this list)
  • Forgetting audio preprocessing eats CPU cycles — a fast SSD helps more than expected
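On the fp16/int8 point, faster-whisper exposes precision directly through its compute_type argument. A minimal sketch of the common options; exact speed and VRAM numbers depend on the card:

from faster_whisper import WhisperModel

# fp16 on GPU: the usual choice for a 12-24 GB card.
stt_fp16 = WhisperModel("large-v3", device="cuda", compute_type="float16")

# int8 weights with fp16 activations: less VRAM, usually faster,
# minimal accuracy change on clean speech.
stt_int8 = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

# CPU-only fallback for the budget path.
stt_cpu = WhisperModel("medium", device="cpu", compute_type="int8")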

What breaks first

The errors most operators hit when running dubbing & translation locally. Each links to a diagnose+fix walkthrough.

  • CUDA out of memory →
  • Model keeps crashing →
  • HuggingFace download failed →

Before you buy

Verify your specific hardware can handle dubbing & translation before committing money.

  • Will it run on my hardware? →
  • Custom compatibility check →
  • GPU recommender (4 questions) →

Related tasks

  • Speech-to-Text (STT)
  • Voice Cloning