Sequential image generation for film storyboards and comic panels. Consistency across frames is the hard problem.
Simple: Use a fixed seed + fixed prompt structure. Prompt: "[SHOT 1] Wide shot of detective in office, raining outside, moody lighting" → generate. Change prompt to "[SHOT 2] Close-up of detective's hand on evidence" → same seed.
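A minimal sketch of the fixed-seed approach, written with Hugging Face diffusers rather than ComfyUI for brevity (the loop structure and output filenames are assumptions; the shot prompts are from the example above). The same principle applies in ComfyUI by fixing the KSampler seed:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

SEED = 12345  # one seed for the whole storyboard
shots = [
    "[SHOT 1] Wide shot of detective in office, raining outside, moody lighting",
    "[SHOT 2] Close-up of detective's hand on evidence",
]

for i, prompt in enumerate(shots, start=1):
    # Re-seeding per frame gives every panel the same initial noise,
    # so style and lighting stay coherent while the prompt varies the content.
    generator = torch.Generator("cuda").manual_seed(SEED)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"shot_{i:02d}.png")
```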
Advanced: Use IP-Adapter (install via ComfyUI Manager) with a reference image of your character. Each frame: same reference image through IP-Adapter → consistent face/clothing.
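A rough diffusers stand-in for the ComfyUI IP-Adapter node, assuming the public h94/IP-Adapter SDXL weights; the reference filename is a placeholder for your own character sheet:

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin",
)
pipe.set_ip_adapter_scale(0.7)  # how strongly the reference steers each frame

character_ref = load_image("detective_reference.png")  # hypothetical reference image

# Same reference image on every frame -> consistent face and clothing.
image = pipe(
    "[SHOT 2] Close-up of detective's hand on evidence",
    ip_adapter_image=character_ref,
    generator=torch.Generator("cuda").manual_seed(12345),
).images[0]
image.save("shot_02.png")
```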
Used RTX 3060 12 GB ($200-250, see /hardware/rtx-3060-12gb). Runs SDXL for storyboard frames at 8-15 seconds each, so a 12-panel storyboard takes ~2-3 minutes. IP-Adapter for character consistency adds minimal overhead (<1 GB VRAM). Pair with a Ryzen 5 5600 + 32 GB DDR4 + 1 TB NVMe. Total: ~$390-440. For simple storyboards (stick-figure level, layout exploration), even a GTX 1060 6 GB ($60) with SD 1.5 generates frames in 3-6 seconds, which is useful for blocking and composition planning.
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Flux Dev for high-quality storyboard frames at 10-20 seconds each. Can load multiple LoRAs (character + environment + style) simultaneously for frame-to-frame consistency. For film-production storyboarding (50-100 frames/day), the workflow is viable. Total: ~$1,800-2,200. For the absolute fastest iteration: RTX 4090 ($2,000) at 5-8 seconds per Flux frame. Storyboard generation is a throughput problem: more frames means more ideas explored.
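A sketch of how stacking three LoRAs might look with diffusers' Flux pipeline (requires the peft integration); the adapter names, .safetensors paths, and blend weights are hypothetical, and CPU offload is assumed to keep a 24 GB card inside its VRAM budget:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # Flux Dev in bf16 overflows 24 GB without this

# Hypothetical LoRA files for the three consistency axes.
pipe.load_lora_weights("loras/detective_character.safetensors", adapter_name="character")
pipe.load_lora_weights("loras/noir_office.safetensors", adapter_name="environment")
pipe.load_lora_weights("loras/storyboard_style.safetensors", adapter_name="style")

# All three adapters active at once; weights set their relative pull.
pipe.set_adapters(["character", "environment", "style"],
                  adapter_weights=[1.0, 0.8, 0.6])

image = pipe(
    "[SHOT 7] Detective stands at the rain-streaked window, back to camera",
    generator=torch.Generator("cpu").manual_seed(12345),
).images[0]
image.save("shot_07.png")
```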
The mistake: Generating each storyboard frame with a new random seed, getting a different "actor" and lighting style in every panel.
Why it fails: Random seeds are random: each frame is a completely independent generation with different facial features, clothing, and lighting. The storyboard looks like scenes from six different movies glued together.
The fix: Lock the seed across frames. In ComfyUI, set the KSampler seed to a fixed value (e.g., 12345) for all panels and set its seed control to "fixed" so it doesn't re-randomize between generations. The same seed + same model produces the same initial noise, which keeps composition and lighting coherent across similar prompts. Then vary the prompt for content while the seed maintains visual coherence. For character consistency: use IP-Adapter with a reference image. For environment consistency: use ControlNet depth/canny with a 3D blockout render as input for each frame.
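For the environment-consistency step, a diffusers sketch of feeding a depth render of a 3D blockout through ControlNet; the blockout filename is a placeholder, and the depth checkpoint named is one public SDXL option, not the only one:

```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Depth map rendered from the 3D blockout of the set (hypothetical file).
depth_map = load_image("blockout_depth_shot01.png")

image = pipe(
    "[SHOT 1] Wide shot of detective in office, raining outside, moody lighting",
    image=depth_map,
    controlnet_conditioning_scale=0.8,  # how tightly frames follow the blockout
    generator=torch.Generator("cuda").manual_seed(12345),
).images[0]
image.save("shot_01.png")
```

Because every frame is conditioned on geometry from the same 3D scene, the room layout stays fixed even as camera angle and prompt change.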
Browse all tools for runtimes that fit this workload.
Image gen is compute-bound, not bandwidth-bound. VRAM sets the ceiling on resolution and the LoRA training stack, but FP16 TFLOPS is what decides Flux throughput. The 5080's compute advantage over the 5070 Ti shows up here in ways it doesn't on LLM inference.
The errors most operators hit when running storyboard generation locally. Each links to a diagnose+fix walkthrough.
Verify your specific hardware can handle storyboard generation before committing money.