Photorealistic Image Generation
Photorealistic portraits, landscapes, and product shots. Flux Dev/Schnell, Stable Diffusion 3.5 Large, and Playground v3 are the open-weight leaders.
Setup walkthrough
- Install ComfyUI via Stability Matrix.
- ComfyUI Manager → Install Models → search "flux1-dev" → download (~23 GB).
- Load the default Flux Dev workflow. Resolution: 1024×1024 (or 1024×1536 for portraits).
- Settings: steps=20, guidance=3.5, sampler=Euler.
- Prompt: "Professional product photography of a ceramic coffee mug on a wooden table, morning sunlight through window, shallow depth of field, 85mm lens, photorealistic, 8K detail."
- First image in 10-15 seconds on RTX 3090 24 GB, 5-8 seconds on RTX 4090 24 GB.
- For faster iteration: use Flux Schnell (steps=4, 2-5 seconds on 24 GB) for quick previews, then switch to Flux Dev for final render.
- For SDXL-quality photorealism (lighter): use RealVisXL V5.0 or Juggernaut XL (~7 GB each, 8-15 seconds on 12 GB).
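The walkthrough's settings (steps=20, guidance=3.5, 1024×1024) can also be expressed outside ComfyUI. A minimal sketch using the Hugging Face diffusers `FluxPipeline` API, assuming diffusers with Flux support, a ~24 GB GPU, and the ~23 GB model download; the `generate` helper is illustrative, not part of any official workflow:

```python
# The walkthrough's ComfyUI settings, mirrored as diffusers keyword
# arguments. Flux's default scheduler is flow-matching Euler, matching
# sampler=Euler above.
FLUX_DEV_SETTINGS = {
    "height": 1024,               # 1024x1536 for portraits
    "width": 1024,
    "num_inference_steps": 20,    # steps=20
    "guidance_scale": 3.5,        # guidance=3.5
}

def generate(prompt: str):
    # Imports live inside the function so the settings dict above can be
    # inspected without torch/diffusers installed. Running this for real
    # downloads ~23 GB and needs a CUDA GPU.
    import torch
    from diffusers import FluxPipeline
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    return pipe(prompt, **FLUX_DEV_SETTINGS).images[0]
```

For quick previews, swap the model id for FLUX.1-schnell and drop `num_inference_steps` to 4, as in the iteration tip above.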
The cheap setup
Used RTX 3060 12 GB ($200-250, see /hardware/rtx-3060-12gb). Runs SDXL + RealVisXL at 1024×1024 in 8-15 seconds — very good photorealism for product shots, portraits, landscapes. Flux Schnell quantized to FP8 or GGUF (fits in 12 GB) takes 15-25 seconds — better text rendering and composition. Flux Dev requires an FP8 quant to fit in 12 GB — doable but slow (25-40 seconds). Pair with Ryzen 5 5600 + 32 GB DDR4 + 1TB NVMe. Total: ~$390-440. For photorealistic portraits, SDXL + RealVisXL on 12 GB gets you 85% of Flux quality at 2× speed.
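The quantization sizes above follow from simple arithmetic: weight footprint is parameter count times bits per weight. A sketch assuming Flux's ~12B-parameter transformer; this counts weights only, not the T5 text encoder, VAE, or activations, which is why real headroom is tighter than these numbers suggest:

```python
def model_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-only footprint in GB: params * bits / 8.

    Ignores text encoders, VAE, and activation memory, so treat the
    result as a floor, not a budget.
    """
    return params_billion * bits_per_weight / 8

# Flux's ~12B transformer at common precisions:
#   FP16  -> 24 GB  (too big even for a 24 GB card once encoders load)
#   FP8   -> 12 GB  (the quant that squeezes onto a 12 GB RTX 3060)
#   4-bit ->  6 GB  (GGUF territory; visible quality loss on images)
```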
The serious setup
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Flux Dev at 10-15 seconds per 1024×1024 — gold standard for local photorealism. Flux Schnell at 2-4 seconds for rapid iteration. Can also run Stable Diffusion 3.5 Large (14 GB) for alternate photorealism style. Pair with Ryzen 7 7700X + 64 GB DDR5 + 2TB NVMe. Total: $1,800-2,200. RTX 4090 24 GB ($1,600, see /hardware/rtx-4090) drops Flux Dev to 5-8 seconds and Flux Schnell to 1-2 seconds — the current single-GPU photorealism king.
Common beginner mistake
The mistake: Prompting a photorealistic model with vague, artistic descriptions ("beautiful forest, magical atmosphere, dreamy lighting") and expecting a photograph. Why it fails: The model interprets "beautiful, magical, dreamy" as artistic/illustration cues. It produces an illustrated forest with photo textures — the uncanny valley of "looks almost real but not quite." The fix: Use photography-specific language: specify camera ("shot on Sony A7III"), lens ("85mm f/1.4"), lighting ("golden hour, natural window light"), technical terms ("shallow depth of field, bokeh, 1/500 shutter speed"), and post-processing references ("edited in Lightroom"). Photography prompts produce photographs. Art prompts produce art. The model is a camera — speak its language.
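The fix above can be mechanized: always assemble prompts from concrete photography cues rather than mood words. A small sketch; the function name and default cue values are illustrative choices, not a standard API:

```python
def photo_prompt(subject: str,
                 camera: str = "Sony A7III",
                 lens: str = "85mm f/1.4",
                 lighting: str = "golden hour, natural window light",
                 extras: tuple = ("shallow depth of field", "photorealistic")) -> str:
    """Build a photography-language prompt: subject first, then camera,
    lens, lighting, and technical terms, comma-separated as models expect."""
    return ", ".join([subject, f"shot on {camera}", f"{lens} lens",
                      lighting, *extras])
```

For example, `photo_prompt("misty pine forest at dawn")` yields a forest prompt the model reads as a photograph brief rather than an illustration brief.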
Recommended setup for photorealistic image generation
Browse all tools for runtimes that fit this workload.
Reality check
Image gen is compute-bound, not bandwidth-bound. VRAM matters for the resolution + LoRA training stack, but FP16 TFLOPS is what decides Flux throughput. The 5080's compute advantage over 5070 Ti shows here in ways it doesn't on LLM inference.
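A back-of-envelope model of why this holds: generation time is roughly total denoising work divided by effective throughput, so it scales linearly with step count and inversely with FP16 TFLOPS. All figures in the sketch are hypothetical placeholders, not measured numbers:

```python
def est_gen_seconds(steps: int, tflop_per_step: float,
                    gpu_fp16_tflops: float, utilization: float = 0.4) -> float:
    """time ~ work / (peak throughput * utilization).

    In this linear model, Schnell (4 steps) lands at 5x the speed of
    Dev (20 steps) on the same GPU, and a GPU with 2x the FP16 TFLOPS
    halves the time -- VRAM never enters the equation.
    """
    return steps * tflop_per_step / (gpu_fp16_tflops * utilization)
```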
Common mistakes
- Buying for VRAM ceiling without checking compute (Flux Dev at FP16 doesn't fit in 16 GB anyway)
- Skipping LoRA training requirements (24 GB minimum, 32 GB comfortable for Flux)
- Underestimating ComfyUI's multi-model VRAM appetite vs A1111's single-pipeline
- Using Q4 quantized image models — quality drop is more visible than on LLMs
What breaks first
The errors most operators hit when running photorealistic image generation locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle photorealistic image generation before committing money.