Filling in masked regions of an image based on context + optional prompts. Essential for object removal, background replacement, content-aware fills.
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs SD 2.0 inpainting at 8-15 seconds per 1024×1024 edit. Flux Fill at FP8 (GGUF quant, ~12 GB) at 20-35 seconds per edit. For simple object removal from photos, SD inpainting on 12 GB is perfectly adequate. Pair with Ryzen 5 5600 + 32 GB DDR4 + 1 TB NVMe. Total: ~$390-440. For light inpainting (blemish removal, small object deletion), even 8 GB cards work with SD 1.5 inpainting at 5-10 seconds.
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Flux Fill Dev FP16 at 8-15 seconds per 1024×1024 edit — the highest-quality local inpainting available. Handles complex fills (replacing entire backgrounds, removing large objects with detailed replacement). For production photo editing workflows (50-100 images/day): the speed is acceptable for interactive use. Total: ~$1,800-2,200. RTX 4090 24 GB ($1,600, see /hardware/rtx-4090) drops Flux Fill to 3-6 seconds — fast enough for real-time preview.
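A quick sanity check on the "50-100 images/day" production claim: even budgeting several redraws per final image at the 3090's worst-case edit time, the daily GPU load stays well inside an interactive workday. The retry count here is an assumption for illustration, not a measured figure.

```python
# Back-of-envelope daily GPU time for the production tier above.
secs_per_edit = 15   # worst case for Flux Fill Dev FP16 on a 3090
retries = 4          # assumed redraws per final image (illustrative)
images = 100         # top of the stated daily range

hours = images * retries * secs_per_edit / 3600
print(f"~{hours:.1f} GPU-hours/day")  # well under a workday
```

On a 4090 at 3-6 seconds per edit, the same workload drops to well under an hour of accumulated GPU time, which is why that card reads as "real-time preview" territory.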
The mistake: Masking the object with a sharp-edged rectangle and wondering why the filled region has visible seams. Why it fails: Inpainting models blend at mask edges — a sharp rectangular mask creates a visible boundary where the new content abruptly meets the old. The model fills within the mask, but the transition is jarring. The fix: Use soft-edged masks. In ComfyUI's MaskEditor, use a soft brush with 20-40% hardness. Feather the mask edges by 10-30 pixels. This gives the model a transition zone to blend the new content with the original. Also: include some surrounding context in the mask (extend 20-50 pixels beyond the object) so the model has reference pixels to match lighting and texture. Soft masks + context overlap = seamless inpainting.
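The dilate-then-feather step above can be sketched in pure numpy. This is an illustrative stand-in for ComfyUI's MaskEditor soft brush, not its actual implementation: dilation grows the mask to pull in surrounding context pixels, and repeated small blurs approximate the 10-30 px Gaussian feather. The `expand` and `feather` defaults mirror the pixel ranges suggested in the text.

```python
import numpy as np

def soften_mask(mask, expand=30, feather=20):
    """Grow a binary inpainting mask by `expand` px, then feather its edges.
    mask: 2D float array, 1.0 = region to fill, 0.0 = keep original."""
    m = mask.astype(np.float32)
    # Dilation (4-neighborhood, 1 px per pass): include surrounding context
    # so the model has reference pixels to match lighting and texture.
    for _ in range(expand):
        p = np.pad(m, 1)  # zero-pad so the mask can't wrap at borders
        m = np.maximum.reduce([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                               p[1:-1, :-2], p[1:-1, 2:]])
    # Feathering: repeated 5-point blurs approximate a Gaussian falloff,
    # replacing the hard 0/1 edge with a soft transition zone.
    for _ in range(feather):
        p = np.pad(m, 1, mode="edge")
        m = (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1] +
             p[1:-1, :-2] + p[1:-1, 2:]) / 5.0
    return np.clip(m, 0.0, 1.0)

# Hard 40x40 rectangular mask in a 256x256 image -- the failure case.
hard = np.zeros((256, 256), dtype=np.float32)
hard[100:140, 100:140] = 1.0
soft = soften_mask(hard)  # dilated ~30 px, edges ramp smoothly 1.0 -> 0.0
```

Feed `soft` to the sampler instead of `hard`: the interior stays fully masked, while the ramp at the boundary is what lets the model blend new content into the original.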
Browse all tools for runtimes that fit this workload.
Image gen is compute-bound, not bandwidth-bound. VRAM sets the ceiling on resolution and on whether a LoRA training stack fits at all, but FP16 TFLOPS is what decides Flux throughput. The 5080's compute advantage over the 5070 Ti shows up here in ways it doesn't on LLM inference.
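The VRAM ceilings quoted in the tiers above follow from simple weight arithmetic. Flux.1 dev is a ~12B-parameter model, so bytes-per-weight is what the quant changes; this sketch ignores the text encoders, VAE, and activations, which add overhead on top and are why partial offloading is common even on 24 GB cards.

```python
# Back-of-envelope weight footprint for a ~12B-parameter Flux model.
params = 12e9

fp16_gb = params * 2 / 1e9  # 2 bytes/weight -> ~24 GB, needs a 24 GB card
fp8_gb = params * 1 / 1e9   # 1 byte/weight  -> ~12 GB, fits an RTX 3060 12 GB

print(f"FP16 weights: ~{fp16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.0f} GB")
```

This is also why the quantized GGUF build in the budget tier lands at "~12 GB": the quant halves the weight footprint, trading a small quality loss for fitting the whole model on-card.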
The errors most operators hit when running inpainting locally. Each links to a diagnose+fix walkthrough.
Verify your specific hardware can handle inpainting before committing money.