Avatar Generation

Talking-head avatar video generation from audio + a reference image. SadTalker and Hallo are open-weight options; EMO (no publicly released weights) and AnimateDiff (general motion, not audio-driven lip sync) are often mentioned alongside them but don't fill the same role.

Setup walkthrough

  1. Install ComfyUI via Stability Matrix.
  2. ComfyUI Manager → Install Models → "sadtalker" (2 GB — face animation from audio) or "hallo" (5 GB — newer, better quality).
  3. For SadTalker: the ComfyUI workflow takes (a) a portrait photo and (b) an audio file (WAV, 3-10 seconds of speech). A scripted equivalent of this step is sketched after the list.
    • The model generates a talking-head video: face moves with the audio, lips sync, natural head motion.
    • Resolution: typically 256×256 (face crop). Upscale to 512×512 with GFPGAN/CodeFormer (install via ComfyUI Manager).
  4. First talking-head video in 30-90 seconds for a 5-second clip on an 8+ GB GPU.
  5. For Hallo (better quality, more natural motion):
    • git clone https://github.com/fudan-generative-vision/hallo → follow setup instructions
    • Produces 512×512 talking heads with natural eye blinking, head movement, and expression variation
    • 1-3 minutes per 5-second clip on 12+ GB GPU
  6. Use cases: AI presenters, virtual assistants, character dialogue, video messages without filming.
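
If you'd rather script SadTalker than drive it through a ComfyUI graph, the repo ships an inference.py entry point. A minimal wrapper, assuming a local SadTalker checkout; the flag names (--source_image, --driven_audio, --result_dir, --enhancer) follow the SadTalker README, so verify them against your version:

    import subprocess
    from pathlib import Path

    SADTALKER_DIR = Path("~/SadTalker").expanduser()  # assumed checkout location

    def render_clip(portrait: Path, audio: Path, out_dir: Path) -> None:
        """Render one talking-head clip from a portrait photo and a WAV file."""
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            [
                "python", "inference.py",
                "--source_image", str(portrait),
                "--driven_audio", str(audio),
                "--result_dir", str(out_dir),
                "--enhancer", "gfpgan",  # GFPGAN face upscaling, as in step 3
            ],
            cwd=SADTALKER_DIR,
            check=True,
        )

    if __name__ == "__main__":
        render_clip(Path("portrait.png"), Path("line1.wav"), Path("renders"))

The same wrapper is reused for the batch pattern sketched in the serious-setup section below.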

The cheap setup

Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs SadTalker at 30-60 seconds per 5-second clip and Hallo at 1-3 minutes per clip. For a 1-minute avatar video, expect ~10-30 minutes of generation. Pair with a Ryzen 5 5600 + 32 GB DDR4 + 1TB NVMe. Total: ~$390-440. Avatar generation is a moderate compute load — 10-50× faster than text-to-video. For simple talking heads (SadTalker), 8 GB cards handle it comfortably; Hallo benefits from 12+ GB for higher-resolution output.
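
The 1-minute figure above is just the per-clip timing multiplied out. A quick estimator, using the ranges quoted in this section rather than measured numbers:

    def render_minutes(video_seconds: float, clip_seconds: tuple[float, float] = (30, 60)) -> tuple[float, float]:
        """Estimate total GPU minutes for a talking-head video from per-5-second-clip timings."""
        clips = video_seconds / 5
        low, high = clip_seconds
        return clips * low / 60, clips * high / 60

    print(render_minutes(60))             # SadTalker on the 3060: ~(6.0, 12.0) minutes
    print(render_minutes(60, (60, 180)))  # Hallo on the 3060: ~(12.0, 36.0) minutes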

The serious setup

Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs Hallo at 30-60 seconds per 5-second clip — near-real-time avatar generation. Can produce 10 minutes of avatar video in ~1-2 hours. For production avatar pipelines (AI news anchors, virtual customer service agents), batch generation overnight handles daily content needs. Total: ~$1,800-2,200. Avatar generation is not the bottleneck — audio recording and script writing take more time than GPU rendering. A single RTX 3060 handles most production avatar workloads.
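
A sketch of the overnight-batch pattern: walk a folder of portrait/audio pairs and render them one at a time with the render_clip wrapper from the setup walkthrough. The jobs/<name>.png + jobs/<name>.wav layout and the module name are assumptions for this example, not conventions of any tool:

    from pathlib import Path

    from render_sadtalker import render_clip  # hypothetical module holding the earlier wrapper

    def run_batch(job_dir: Path, out_root: Path) -> None:
        """Render every portrait/audio pair in job_dir, skipping clips already rendered."""
        for portrait in sorted(job_dir.glob("*.png")):
            audio = portrait.with_suffix(".wav")
            if not audio.exists():
                print(f"skipping {portrait.name}: no matching WAV")
                continue
            out_dir = out_root / portrait.stem
            if out_dir.exists():  # crude resume support for overnight runs
                continue
            render_clip(portrait, audio, out_dir)

    if __name__ == "__main__":
        run_batch(Path("jobs"), Path("renders"))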

Common beginner mistake

The mistake: Using a low-resolution webcam selfie as the reference photo, then wondering why the avatar looks pixelated and unnatural.

Why it fails: SadTalker and Hallo work at fixed resolutions (256×256 or 512×512). A compressed, noisy 480p webcam image has already lost the facial detail. When the model animates it, every artifact animates too — JPEG compression blocks dance across the face.

The fix: Use a high-quality portrait photo: well-lit (soft diffuse light, no harsh shadows), neutral expression, looking at the camera, 1024×1024+ resolution, sharp focus on the eyes. The reference photo quality IS the avatar quality. For professional avatars, take the photo with a decent camera (phone in portrait mode, or a mirrorless with an 85mm lens) in controlled lighting. Garbage portrait → garbage avatar. Good portrait → convincing avatar.
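
A preflight check in that spirit: reject portraits that are too small or too soft before spending GPU time on them. The thresholds are illustrative, and the sharpness test (variance of the Laplacian via OpenCV) is a generic heuristic, not something SadTalker or Hallo require:

    import cv2  # pip install opencv-python

    MIN_SIDE = 1024        # this guide's suggestion: 1024x1024+ source portrait
    MIN_SHARPNESS = 100.0  # illustrative Laplacian-variance cutoff; tune on your own photos

    def check_portrait(path: str) -> list[str]:
        """Return a list of problems with a reference portrait (empty list = looks usable)."""
        img = cv2.imread(path)
        if img is None:
            return [f"could not read {path}"]
        problems = []
        h, w = img.shape[:2]
        if min(h, w) < MIN_SIDE:
            problems.append(f"resolution {w}x{h} is below {MIN_SIDE}px on the short side")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        if sharpness < MIN_SHARPNESS:
            problems.append(f"image looks soft (Laplacian variance {sharpness:.0f})")
        return problems

    print(check_portrait("portrait.png"))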

Reality check

Local text-to-video (LTX-Video, Mochi) is genuinely possible in 2026 but VRAM-hungry: 24 GB is the working minimum and 32 GB the comfort zone for long-form workflows. Below 24 GB, general video gen isn't realistic with current models. Audio-driven avatar models are the exception, which is why SadTalker and Hallo, running in 8-12 GB, are the practical entry point.

Common mistakes

  • Trying video gen on 16 GB cards (the model weights plus activations don't fit)
  • Underestimating runtime VRAM (peak usage can reach ~1.5× model size on long sequences; see the headroom sketch after this list)
  • Mixing video gen with concurrent LLM serving on the same GPU
  • Using Apple Silicon for video gen — it works, but is typically 30-50% slower than comparable CUDA hardware
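
To sanity-check the 1.5× rule against your card before queuing a long job, a minimal headroom estimate (the multiplier is the rule of thumb from the list above, not a measured constant):

    import torch

    def vram_headroom_gb(model_size_gb: float, peak_multiplier: float = 1.5) -> float:
        """Free VRAM left after the estimated peak working set of a video model."""
        if not torch.cuda.is_available():
            raise RuntimeError("no CUDA device visible")
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        return total_gb - model_size_gb * peak_multiplier

    # Example: the ~5 GB "hallo" download from the setup steps
    print(f"headroom: {vram_headroom_gb(5.0):.1f} GB")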

What breaks first

The errors most operators hit when running avatar generation locally. Each links to a diagnose+fix walkthrough.

Before you buy

Verify your specific hardware can handle avatar generation before committing money.
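
If you already have a card and want to know where it lands, a minimal check against the VRAM tiers used on this page (the cutoffs are this guide's rules of thumb, not hard limits):

    import torch

    TIERS = [
        (24, "full text-to-video (LTX-Video, Mochi)"),
        (12, "Hallo at 512x512"),
        (8,  "SadTalker talking heads"),
    ]

    def report() -> None:
        if not torch.cuda.is_available():
            print("no CUDA GPU detected; avatar generation would be CPU-only and very slow")
            return
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / 1024**3
        print(f"{props.name}: {vram_gb:.0f} GB VRAM")
        for min_gb, workload in TIERS:
            status = "ok" if vram_gb >= min_gb else "not enough VRAM"
            print(f"  {workload} (needs ~{min_gb} GB): {status}")

    report()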

Specialized buyer guides