BLK · QUICKSTART

Copy. Paste. Run.

Five Docker Compose bundles for the five things most people actually want to do with local AI. Real commands, real prerequisites, no hand-waving. Each links to its matching /stacks page when you want the why behind the what.

Tested 2026-05-13 on Ollama 0.5.x · Docker 24.0 · NVIDIA driver 550+. Not sure if your hardware fits? Run /will-it-run first.
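
Want a hard number before you pick a bundle? On a machine with the NVIDIA driver installed, nvidia-smi reports total VRAM directly (read-only check; it changes nothing):

# Prints the GPU name and total VRAM in MiB
nvidia-smi --query-gpu=name,memory.total --format=csv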

BUNDLE 01

Chat — Ollama + Open WebUI

First-install stack. Local chat UI in your browser, model weights cached on disk, zero data leaves your machine.

VRAM minimum
8GB (7B Q4) · 16GB recommended (14B Q4)
Prerequisites
  • Docker 24+ with Compose v2
  • NVIDIA Container Toolkit (Linux) or Docker Desktop with GPU (sanity check below)
  • 20GB free disk for model weights
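Not sure the toolkit is actually wired into Docker? A one-liner sanity check (assumes the stock ubuntu image; behaves the same under Docker Desktop with GPU support enabled):

# Should print the same table nvidia-smi prints on the host
docker run --rm --gpus all ubuntu nvidia-smi
# A "could not select device driver" error usually means the
# NVIDIA Container Toolkit isn't registered with Docker yet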
Compose / setup
# docker-compose.yml — chat bundle
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped

volumes:
  ollama:
  openwebui:
Run it
  1. docker compose up -d
  2. docker exec ollama ollama pull llama3.2:3b
  3. open http://localhost:3000 (smoke test below if no models appear)
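If the UI comes up but the model list is empty, check Ollama directly from the host; both calls below hit Ollama's HTTP API on the published port:

# List the models Ollama has pulled (should include llama3.2:3b)
curl http://localhost:11434/api/tags

# One-shot generation test with no UI in the loop
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Say hello in five words.",
  "stream": false
}'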
BUNDLE 02

RAG on your documents — AnythingLLM

Drop a folder of PDFs / Markdown, ask questions, get answers grounded in your files. No cloud, no upload, no leakage.

VRAM minimum
12GB (embedding + 7B chat) · 16GB recommended
Prerequisites
  • Docker 24+ with Compose v2
  • Append this services block to your chat bundle's docker-compose.yml (same file — anythingllm needs the same compose network to reach 'ollama' by name)
  • 40GB free disk (chunks + embeddings can balloon)
Compose / setup
# Append to the chat bundle's docker-compose.yml so 'anythingllm'
# shares the chat bundle's default network and can reach 'ollama'.
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    depends_on:
      - ollama
    environment:
      - STORAGE_DIR=/app/server/storage
      - LLM_PROVIDER=ollama
      - OLLAMA_BASE_PATH=http://ollama:11434
      - OLLAMA_MODEL_PREF=llama3.2:3b
      - EMBEDDING_ENGINE=ollama
      - EMBEDDING_BASE_PATH=http://ollama:11434
      - EMBEDDING_MODEL_PREF=nomic-embed-text
      - VECTOR_DB=lancedb
    volumes:
      - anythingllm-storage:/app/server/storage
      - ./documents:/documents:ro
    ports:
      - "3001:3001"
    restart: unless-stopped

volumes:
  anythingllm-storage:
Run it
  1. docker exec ollama ollama pull nomic-embed-text (quick check below)
  2. mkdir -p ./documents && cp ~/your-pdfs/*.pdf ./documents/
  3. docker compose up -d anythingllm
  4. open http://localhost:3001 and point a workspace at /documents
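Before indexing a large folder, confirm the embedding model actually answers. A minimal call against Ollama's embeddings endpoint returns a JSON body with an "embedding" array once nomic-embed-text is ready:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "hello world"
}'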
BUNDLE 03

Coding agent — Aider + Ollama

Terminal-driven coding loop. Aider reads your repo, plans diffs, and applies them via git. Pairs well with deepseek-coder or qwen2.5-coder.

VRAM minimum
16GB (14B coder Q4) · 24GB recommended (32B coder)
Prerequisites
  • Python 3.10+ on host (Aider runs natively, not in Docker)
  • Ollama already running (reuse the chat bundle)
  • A git repo to point Aider at
Compose / setup
# Aider runs on the host, not in a container — but Ollama does.
# Reuse the chat bundle's ollama service, then install Aider:

python3 -m pip install aider-chat

# Pull a coder model that fits your VRAM:
docker exec ollama ollama pull deepseek-coder-v2:16b   # 16GB rig
# OR
docker exec ollama ollama pull qwen2.5-coder:32b       # 24GB rig

# Tell Aider where Ollama lives (required):
export OLLAMA_API_BASE=http://127.0.0.1:11434

# Launch Aider against your repo:
cd ~/your-project
aider --model ollama_chat/deepseek-coder-v2:16b \
      --no-auto-commit \
      --map-tokens 1024
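
# Optional sanity check before the first Aider session (swap in whichever
# tag you pulled): runs one prompt through the model and exits. The first
# call also loads the weights into VRAM, so it can take a minute.
docker exec ollama ollama run deepseek-coder-v2:16b "Write a one-line bash hello world."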
Run it
  1. Open a feature branch first — Aider rewrites files in place
  2. Start small: 'fix the bug in src/parser.ts line 42'
  3. Use /diff in Aider to review before each apply
BUNDLE 04

Vision — Open WebUI + LLaVA

Same Open WebUI you used for chat, now with image input. Drag a PNG into the chat, ask 'what is this'. Works for screenshots, diagrams, photos.

VRAM minimum
12GB (LLaVA-7B) · 16GB recommended (LLaVA-13B)
Prerequisites
  • Chat bundle already running (Ollama + Open WebUI)
  • 8GB additional disk for the vision model weights
Compose / setup
# Vision uses the chat bundle's docker-compose.yml unchanged.
# Just pull a vision-capable model into Ollama:

docker exec ollama ollama pull llava:7b              # 12GB VRAM
# OR
docker exec ollama ollama pull llava:13b             # 16GB VRAM
# OR (newer, generally better quality)
docker exec ollama ollama pull llama3.2-vision:11b   # 16GB VRAM

# In Open WebUI: select the model from the top-left dropdown,
# then drag any image into the chat input.
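
# Optional: exercise the vision model outside the UI. Assumes llava:7b is
# pulled and a test image exists at ./test.png; /api/generate accepts
# base64-encoded images for multimodal models.
# (base64 -w0 emits a single unwrapped line on GNU coreutils; adjust on macOS.)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava:7b\",
  \"prompt\": \"Describe this image in one sentence.\",
  \"images\": [\"$(base64 -w0 test.png)\"],
  \"stream\": false
}"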
Run it
  1. Test with a known image first (a screenshot of this page works)
  2. Quality varies sharply by model — try all three and pick the one that works for your use case
  3. OCR tasks: try llama3.2-vision over the older llava
BUNDLE 05

Agentic AI — Hermes + AutoGen / CrewAI

Multi-agent workflows running entirely on your hardware. Hermes 3 8B handles tool-use reliably; pair with AutoGen or CrewAI for the orchestration layer — or for a personal-AI agent with a WhatsApp / Telegram / Slack UI, try OpenClaw (the 2026 local-first breakout). Every option backs onto Ollama via its OpenAI-compatible API.

VRAM minimum
12GB (Hermes 3 8B Q4) · 48GB+ recommended (Hermes 4 70B for harder workflows)
Prerequisites
  • Chat bundle already running (Ollama + Open WebUI from bundle 01)
  • Python 3.10+ on host (the orchestration framework runs natively)
  • Hermes 3 / 4 pulled into Ollama: docker exec ollama ollama pull hermes3:8b
Compose / setup
# Hermes is the tool-use sweet spot for local agents.
docker exec ollama ollama pull hermes3:8b           # 12GB rig
# OR
docker exec ollama ollama pull hermes3:70b          # 48GB+ rig

# Pick ONE orchestration framework:

# Option A — CrewAI (role-based crews, simpler mental model)
python3 -m pip install crewai crewai-tools
export OPENAI_API_BASE=http://127.0.0.1:11434/v1
export OPENAI_API_KEY=ollama
# Then in your Python:
#   from crewai import Agent, Task, Crew
#   from crewai.llm import LLM
#   llm = LLM(model="ollama/hermes3:8b", base_url="http://127.0.0.1:11434")

# Option B — AutoGen (free-form multi-agent conversation)
python3 -m pip install pyautogen
# Then in your Python:
#   from autogen import AssistantAgent
#   config = {"config_list": [{"model": "hermes3:8b",
#     "base_url": "http://127.0.0.1:11434/v1", "api_key": "ollama"}]}

# Option C — LangGraph (deterministic graph flows + checkpointing)
python3 -m pip install langgraph langchain-ollama
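
# Whichever framework you pick, it talks to the same endpoint. A quick curl
# confirms Ollama's OpenAI-compatible API is answering before you wire up
# agents (assumes hermes3:8b is already pulled):
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes3:8b",
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}]
  }'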
Run it
  1. Start with a 2-agent crew before scaling — debug cost compounds fast
  2. Hermes 3 8B handles tool-use; if the agent loops or hallucinates tools, try Hermes 4 70B
  3. Set max_turns / max_round limits — uncapped multi-agent loops are the #1 token sink
  4. Sandbox any code-execution tools (Docker, restricted Python) before letting an agent run them
  5. For a personal-AI agent with messaging-platform UX (not a framework), see /tools/openclaw — install via openclaw.ai then point it at the Ollama endpoint