Motion Transfer
Applying motion from a source video to a target subject — pose-driven dance generation, lip-sync, gesture transfer.
Setup walkthrough
- Install ComfyUI via Stability Matrix.
- ComfyUI Manager → Install Models → search "animate-diff" or "mimicmotion" for pose-driven animation.
- For pose-driven motion transfer (dance, gestures):
Approach 1 — AnimateDiff + ControlNet:
- Load a reference image (the character you want to animate)
- Load a driving video (the motion source — could be a dance video)
- Extract the pose from the driving video with DWPose (ComfyUI node); a standalone sketch of this step follows the list
- Feed pose sequence to AnimateDiff + ControlNet → character moves with the driving pose
- 16 frames (~1 second) renders in 30-60 seconds on a 12 GB GPU
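If you want to see what the DWPose step produces outside of ComfyUI, here is a minimal Python sketch. It assumes the controlnet_aux package exposes a DWposeDetector class (constructor arguments and weight handling vary by version, so treat that part as an assumption); inside ComfyUI the DWPose node does all of this for you.

```python
# Sketch: extract a per-frame pose sequence from a driving video.
# Assumption: your controlnet_aux version ships DWposeDetector and downloads
# default weights on first use. In ComfyUI the DWPose node performs this step;
# this is only to show the data flow (video frames in, skeleton images out).
import cv2
from PIL import Image
from controlnet_aux import DWposeDetector  # assumption: available in your install

detector = DWposeDetector()  # assumption: default constructor works in your version

def pose_frames(video_path: str, max_frames: int = 16):
    """Yield DWPose skeleton images for the first `max_frames` frames."""
    cap = cv2.VideoCapture(video_path)
    grabbed = 0
    while grabbed < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV returns BGR arrays; the detector expects an RGB PIL image.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        yield detector(Image.fromarray(rgb))
        grabbed += 1
    cap.release()

# Each pose image conditions one AnimateDiff frame via the openpose ControlNet.
if __name__ == "__main__":
    for i, pose in enumerate(pose_frames("dance_source.mp4")):
        pose.save(f"pose_{i:03d}.png")
```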
Approach 2 — MimicMotion (more recent, better quality):
- Install: pip install mimicmotion
- Feed a reference image + driving video → outputs the animated character
- Better temporal consistency than AnimateDiff
- Expect your first motion-transferred clip within minutes: budget roughly 1-5 minutes of render time per second of output.
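MimicMotion's inference entry point differs between the pip package, the GitHub repo, and ComfyUI wrapper nodes, so the sketch below only illustrates the input/output contract described above (one reference image plus one driving video per output clip). run_mimicmotion is a hypothetical placeholder for whichever entry point your install actually provides.

```python
# Sketch of the MimicMotion input/output contract: reference image + driving
# video in, animated clip out. `run_mimicmotion` is a hypothetical stand-in;
# substitute your install's inference script or node call.
from pathlib import Path

def run_mimicmotion(reference_image: Path, driving_video: Path, output_path: Path) -> None:
    raise NotImplementedError("call your MimicMotion inference entry point here")

reference = Path("character.png")                       # the subject to animate
driving_clips = sorted(Path("driving").glob("*.mp4"))   # motion sources

for clip in driving_clips:
    out = Path("outputs") / f"{clip.stem}_animated.mp4"
    out.parent.mkdir(parents=True, exist_ok=True)
    run_mimicmotion(reference, clip, out)
    print(f"wrote {out}")  # plan for roughly 1-5 minutes per second of output
```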
The cheap setup
Used RTX 3060 12 GB ($200-250, see /hardware/rtx-3060-12gb). Runs AnimateDiff + ControlNet (pose) at 30-60 seconds per 16-frame clip (1 second of animation at 16 fps). For a 3-second dance clip: ~2-3 minutes. Pair with a Ryzen 5 5600 + 32 GB DDR4 + 1 TB NVMe. Total: ~$390-440. Motion transfer is lighter than full video generation: AnimateDiff is an SD 1.5-based method and only needs 4-6 GB of VRAM. At ~$400 you can transfer motion to characters reliably, with a short clip rendering in a couple of minutes.
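Those timings are just arithmetic, so they can be sanity-checked in a few lines. The snippet below uses only the figures already quoted for the cheap setup (16-frame clips, 16 fps output, 30-60 seconds per clip); it is a planning estimate, not a benchmark.

```python
# Back-of-envelope render time for AnimateDiff motion transfer on the cheap
# setup: 16-frame clips, 16 fps output, 30-60 s per clip on an RTX 3060 12 GB.
FRAMES_PER_CLIP = 16
OUTPUT_FPS = 16
SECONDS_PER_CLIP_LOW, SECONDS_PER_CLIP_HIGH = 30, 60

def estimated_render_time(animation_seconds: float) -> tuple[float, float]:
    """Return (low, high) wall-clock seconds to render `animation_seconds` of output."""
    total_frames = animation_seconds * OUTPUT_FPS
    clips = total_frames / FRAMES_PER_CLIP          # number of 16-frame batches
    return clips * SECONDS_PER_CLIP_LOW, clips * SECONDS_PER_CLIP_HIGH

low, high = estimated_render_time(3.0)              # the 3-second dance clip example
# Prints ~1.5-3.0 minutes; model loading and video decode overhead push the
# practical figure toward the ~2-3 minutes quoted above.
print(f"3 s of animation: {low / 60:.1f}-{high / 60:.1f} minutes of rendering")
```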
The serious setup
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs AnimateDiff + multiple ControlNets (pose + depth) at 20-40 seconds per 16-frame clip. For longer sequences (5-10 seconds), the extra VRAM prevents the temporal artifacts that occur when AnimateDiff runs out of context memory. For SDXL-based animation (AnimateDiff-XL): 24 GB handles 32-frame sequences smoothly. Total: ~$1,800-2,200. Motion transfer is significantly lighter than text-to-video generation — even 8 GB cards handle basic workflows.
Common beginner mistake
The mistake: Using a driving video of a professional dancer doing complex full-body spins and jumps and expecting a static portrait photo to animate with the same motion.
Why it fails: The motion model sees a source video with extreme pose changes (arms behind the back, crouching, jumping) and a reference image in a neutral standing pose. The model can't map extreme poses onto an image where the corresponding body parts aren't visible, so when the legs disappear behind the body, it hallucinates limbs.
The fix: Match the driving video to the reference image. If your reference is a standing portrait, use a driving video of someone nodding, talking, or making small gestures. If you need complex dance motion, the reference image should show the full body in a neutral dance pose. The model maps pose to pose: if the driving pose puts limbs in positions not visible in the reference, you get artifacts. Garbage in, garbage out applies doubly to motion transfer.
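One way to catch the mismatch before rendering is to compare which joints are visible in the reference image versus the driving frames. The sketch below is hypothetical: detect_keypoints stands in for whatever pose estimator you use (for example, DWPose output parsed into named joints with confidences), and the 0.3 visibility threshold is an arbitrary assumption.

```python
# Hypothetical sanity check for the "match the driving video to the reference"
# rule: flag driving frames whose poses need body parts the reference never shows.
# `detect_keypoints` is a stand-in, assumed to return {joint_name: confidence}.

def detect_keypoints(image_path: str) -> dict[str, float]:
    raise NotImplementedError("plug in DWPose / OpenPose keypoint extraction here")

CONFIDENCE_THRESHOLD = 0.3  # assumption: below this, treat the joint as not visible

def missing_joints(reference_image: str, driving_frame: str) -> set[str]:
    """Joints the driving pose relies on but the reference image never shows."""
    ref = detect_keypoints(reference_image)
    drv = detect_keypoints(driving_frame)
    ref_visible = {j for j, c in ref.items() if c >= CONFIDENCE_THRESHOLD}
    drv_visible = {j for j, c in drv.items() if c >= CONFIDENCE_THRESHOLD}
    return drv_visible - ref_visible

# Any non-empty result (e.g. {"left_knee", "right_ankle"} for a standing portrait
# driven by a dance video) predicts hallucinated limbs in the output.
```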
Recommended setup for motion transfer
Browse all tools for runtimes that fit this workload.
Reality check
Full local video generation (LTX-Video, Mochi) is genuinely possible in 2026 but VRAM-hungry: 24 GB is the working minimum, 32 GB is the comfort zone for long-form workflows, and below 24 GB it isn't realistic with current models. Motion transfer is the exception: because the AnimateDiff pipeline builds on SD 1.5, the 12 GB and even 8 GB setups described above are enough.
Common mistakes
- Trying full text-to-video on 16 GB cards (model weights plus activations don't fit)
- Underestimating runtime VRAM (peak draw is ~1.5x model size on long sequences; a quick planning check follows this list)
- Mixing video generation with concurrent LLM serving on the same GPU
- Using Apple Silicon for video generation (viable, but 30-50% slower than CUDA)
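The 1.5x rule of thumb in the second bullet turns into a quick planning check. The model weight footprints below are rough assumptions, not measured values, and actual usage still depends on resolution, frame count, and attention implementation.

```python
# Rule-of-thumb VRAM planner: budget ~1.5x the model's weight footprint for
# peak draw on long sequences. Footprints are assumptions for illustration.
MODEL_SIZES_GB = {
    "animatediff-sd15": 4.0,    # assumption: SD 1.5 UNet + motion module, fp16
    "animatediff-sdxl": 10.0,   # assumption: SDXL-based animation stack, fp16
}

PEAK_FACTOR = 1.5               # peak draw ~1.5x model size on long sequences

def fits(model: str, vram_gb: float, headroom_gb: float = 1.0) -> bool:
    """True if the model's estimated peak draw fits in `vram_gb` with headroom."""
    peak = MODEL_SIZES_GB[model] * PEAK_FACTOR
    return peak + headroom_gb <= vram_gb

for card, vram in [("RTX 3060", 12), ("RTX 3090", 24)]:
    for model in MODEL_SIZES_GB:
        print(f"{card:<9} {model:<18} fits: {fits(model, vram)}")
```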
What breaks first
The errors most operators hit when running motion transfer locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle motion transfer before committing money.