Copy. Paste. Run.
Five bundles built around Docker Compose for the five things most people actually want to do with local AI. Real commands, real prerequisites, no hand-waving. Each links to its matching /stacks page when you want the why behind the what.
Tested 2026-05-13 on Ollama 0.5.x · Docker 24.0 · NVIDIA driver 550+. Not sure if your hardware fits? Run /will-it-run first.
Chat — Ollama + Open WebUI
First-install stack. Local chat UI in your browser, model weights cached on disk, zero data leaves your machine.
- Docker 24+ with Compose v2
- NVIDIA Container Toolkit (Linux) or Docker Desktop with GPU
- 20GB free disk for model weights
# docker-compose.yml (chat bundle)
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data
    ports:
      - "3000:8080"
    restart: unless-stopped

volumes:
  ollama:
  openwebui:
01. docker compose up -d
02. docker exec ollama ollama pull llama3.2:3b
03. open http://localhost:3000
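Before opening the UI, you can confirm Ollama is answering by hitting its HTTP API directly. A minimal sketch, assuming Python 3 and the requests package on the host (the prompt text is just an example):

import requests

# List the models Ollama has pulled so far (empty until step 02 finishes)
tags = requests.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags.get("models", [])])

# One-shot generation; stream=False returns a single JSON body instead of chunks
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2:3b", "prompt": "Say hello in five words.", "stream": False},
)
print(resp.json()["response"])

If the first call returns an empty list, the pull in step 02 hasn't finished yet.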
RAG on your documents — AnythingLLM
Drop a folder of PDFs / Markdown, ask questions, get answers grounded in your files. No cloud, no upload, no leakage.
- Docker 24+ with Compose v2
- Merge this service into your chat bundle's docker-compose.yml under the existing services: key (same file; anythingllm needs the chat bundle's compose network to reach 'ollama' by name)
- 40GB free disk (chunks + embeddings can balloon)
# Merge into the chat bundle's docker-compose.yml: add 'anythingllm'
# under the existing services: key (and the volume under volumes:) so it
# shares the chat bundle's default network and can reach 'ollama' by name.
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    depends_on:
      - ollama
    environment:
      - STORAGE_DIR=/app/server/storage
      - LLM_PROVIDER=ollama
      - OLLAMA_BASE_PATH=http://ollama:11434
      - OLLAMA_MODEL_PREF=llama3.2:3b
      - EMBEDDING_ENGINE=ollama
      - EMBEDDING_BASE_PATH=http://ollama:11434
      - EMBEDDING_MODEL_PREF=nomic-embed-text
      - VECTOR_DB=lancedb
    volumes:
      - anythingllm-storage:/app/server/storage
      - ./documents:/documents:ro
    ports:
      - "3001:3001"
    restart: unless-stopped

volumes:
  anythingllm-storage:
01. docker exec ollama ollama pull nomic-embed-text
02. mkdir -p ./documents && cp ~/your-pdfs/*.pdf ./documents/
03. docker compose up -d anythingllm
04. open http://localhost:3001 and point a workspace at /documents
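Before pointing a workspace at your files, confirm the embedding model answers; if embedding fails, ingestion will too. A minimal sketch against Ollama's embeddings endpoint, assuming Python 3 with requests (the test sentence is arbitrary):

import requests

# nomic-embed-text is the model EMBEDDING_MODEL_PREF points at above
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "a test sentence to embed"},
)
print(len(resp.json()["embedding"]))  # nomic-embed-text produces 768-dim vectors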
Coding agent — Aider + Ollama
Terminal-driven coding loop. Aider reads your repo, plans diffs, and applies them via git. Pairs well with deepseek-coder or qwen2.5-coder.
- Python 3.10+ on host (Aider runs natively, not in Docker)
- Ollama already running (reuse the chat bundle)
- A git repo to point Aider at
# Aider runs on the host, not in a container — but Ollama does.
# Reuse the chat bundle's ollama service, then install Aider:
python3 -m pip install aider-chat
# Pull a coder model that fits your VRAM:
docker exec ollama ollama pull deepseek-coder-v2:16b # 16GB rig
# OR
docker exec ollama ollama pull qwen2.5-coder:32b # 24GB rig
# Tell Aider where Ollama lives (required):
export OLLAMA_API_BASE=http://127.0.0.1:11434
# Launch Aider against your repo:
cd ~/your-project
aider --model ollama_chat/deepseek-coder-v2:16b \
  --no-auto-commit \
  --map-tokens 1024
01. Open a feature branch first; aider rewrites files in place
02. Start small: 'fix the bug in src/parser.ts line 42'
03. Use /diff in aider to review before each apply
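If Aider can't reach the model, isolate whether the problem is Aider or the endpoint by calling Ollama's OpenAI-compatible API directly. A minimal sketch, assuming the openai Python package (pip install openai); swap the model name for whichever coder model you pulled:

from openai import OpenAI

# Ollama serves an OpenAI-compatible API under /v1; the key is ignored but must be set
client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="ollama")
reply = client.chat.completions.create(
    model="deepseek-coder-v2:16b",
    messages=[{"role": "user", "content": "Write a one-line Python hello world."}],
)
print(reply.choices[0].message.content)

If this answers but Aider doesn't, recheck the OLLAMA_API_BASE export and the ollama_chat/ model prefix.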
Vision — Open WebUI + LLaVA
Same Open WebUI you used for chat, now with image input. Drag a PNG into the chat, ask 'what is this'. Works for screenshots, diagrams, photos.
- Chat bundle already running (Ollama + Open WebUI)
- 8GB additional disk for the vision model weights
# Vision uses the chat bundle's docker-compose.yml unchanged.
# Just pull a vision-capable model into Ollama:
docker exec ollama ollama pull llava:7b             # 12GB VRAM
# OR
docker exec ollama ollama pull llava:13b            # 16GB VRAM
# OR (newer, generally better quality)
docker exec ollama ollama pull llama3.2-vision:11b  # 16GB VRAM

# In Open WebUI: select the model from the top-left dropdown,
# then drag any image into the chat input.
01. Test with a known image first (a screenshot of this page works)
02. Quality varies sharply by model; try three and pick the one that works for your use case
03. For OCR tasks, try llama3.2-vision over the older llava
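The same vision models are scriptable outside the UI: Ollama's /api/generate accepts base64-encoded images for vision-capable models. A minimal sketch, assuming Python 3 with requests (screenshot.png is a placeholder path):

import base64
import requests

# Encode any local PNG/JPEG; Ollama expects base64 strings in the images list
with open("screenshot.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llava:7b", "prompt": "What is in this image?",
          "images": [img], "stream": False},
)
print(resp.json()["response"])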
Agentic AI — Hermes + AutoGen / CrewAI
Multi-agent workflows running entirely on your hardware. Hermes 3 8B handles tool-use reliably; pair it with AutoGen or CrewAI for the orchestration layer. For a personal-AI agent with a WhatsApp / Telegram / Slack UI, try OpenClaw (the 2026 local-first breakout) instead. Every option backs onto Ollama via its OpenAI-compatible API.
- Chat bundle already running (Ollama + Open WebUI from bundle 01)
- Python 3.10+ on host (the orchestration framework runs natively)
- Hermes 3 / 4 pulled into Ollama: docker exec ollama ollama pull hermes3:8b
# Hermes is the tool-use sweet spot for local agents.
docker exec ollama ollama pull hermes3:8b # 12GB rig
# OR
docker exec ollama ollama pull hermes3:70b # 48GB+ rig
# Pick ONE orchestration framework:
# Option A — CrewAI (role-based crews, simpler mental model)
python3 -m pip install crewai crewai-tools
export OPENAI_API_BASE=http://127.0.0.1:11434/v1
export OPENAI_API_KEY=ollama
# Then in your Python:
# from crewai import Agent, Task, Crew
# from crewai.llm import LLM
# llm = LLM(model="ollama/hermes3:8b", base_url="http://127.0.0.1:11434")
# Option B — AutoGen (free-form multi-agent conversation)
python3 -m pip install pyautogen
# Then in your Python:
# from autogen import AssistantAgent
# config = {"config_list": [{"model": "hermes3:8b",
# "base_url": "http://127.0.0.1:11434/v1", "api_key": "ollama"}]}
# Option C — LangGraph (deterministic graph flows + checkpointing)
python3 -m pip install langgraph langchain-ollama
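To make Option A concrete, here's a minimal single-agent crew assembled from the commented snippet above; the role, goal, and task text are placeholders, not a prescribed setup:

from crewai import Agent, Task, Crew
from crewai.llm import LLM

# Route CrewAI through the local Ollama endpoint (matches the exports above)
llm = LLM(model="ollama/hermes3:8b", base_url="http://127.0.0.1:11434")

analyst = Agent(
    role="Analyst",  # placeholder role for illustration
    goal="Summarize a topic in three bullet points",
    backstory="A concise technical analyst.",
    llm=llm,
)
task = Task(
    description="Summarize the trade-offs of running LLMs locally.",
    expected_output="Exactly three bullet points.",
    agent=analyst,
)
crew = Crew(agents=[analyst], tasks=[task])
print(crew.kickoff())

Swap in a second Agent and Task once this runs clean; tip 01 below applies.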
01. Start with a 2-agent crew before scaling; debug cost compounds fast
02. Hermes 3 8B handles tool-use; if the agent loops or hallucinates tools, try Hermes 4 70B
03. Set max_turns / max_round limits; uncapped multi-agent loops are the #1 token sink
04. Sandbox any code-execution tools (Docker, restricted Python) before letting an agent run them
05. For a personal-AI agent with messaging-platform UX (not a framework), see /tools/openclaw; install via openclaw.ai then point it at the Ollama endpoint