Workflows

Full local-AI systems, not parts. Each workflow is the operational truth of how a complete deployment runs end-to-end — services, models, runtimes, vector DB, observability, security, upgrade path, what breaks first.

/stacks answers “what should I assemble?” /workflows answers “what does it look like to operate this end-to-end?”

Local coding-agent system

Homelab

End-to-end local autonomous coding agent. vLLM serving Qwen 2.5 Coder 32B, OpenHands as the agent controller, Open WebUI for chat, Qdrant + nomic-embed for code RAG, bge-reranker for retrieval, Redis for the agent queue…

Difficulty: Week · Services: 11 · Concurrency: 1-2 concurrent users (one huma…
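The Redis piece in this workflow is just a work queue between the controller and its workers. A minimal in-memory sketch of that contract, with a deque standing in for a Redis list (class and task names are illustrative, not from the workflow):

```python
from collections import deque

class AgentQueue:
    """In-memory stand-in for the Redis agent queue: the controller
    (OpenHands in this workflow) pushes tasks, workers pop them FIFO.
    In production this is a Redis list with blocking pops so idle
    workers wait instead of polling."""

    def __init__(self):
        self._q = deque()

    def push(self, task):
        """Controller side: enqueue a task description."""
        self._q.append(task)

    def pop(self):
        """Worker side: dequeue the oldest task, or None if empty."""
        return self._q.popleft() if self._q else None
```

The FIFO ordering matters: the controller can assume earlier plan steps are picked up before later ones without any extra coordination.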

Offline RAG pipeline

Edge / air-gapped

Air-gapped retrieval-augmented chat. Document ingestion via unstructured.io, nomic-embed embeddings into Qdrant, bge-reranker reranking, Qwen 2.5 14B-Instruct generation, Open WebUI as the chat surface, observability + nigh…

Difficulty: Week · Services: 10 · Concurrency: 5-15 concurrent users (single…
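The retrieval core of this pipeline is a two-stage rank: cheap vector similarity over the whole corpus first, then a stronger scorer over the short list. A toy sketch of that shape with stub embeddings and a stub reranker (the real workflow uses nomic-embed vectors in Qdrant and a bge-reranker cross-encoder; everything below is illustrative):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_then_rerank(query_vec, corpus, rerank_fn, top_k=3, final_k=1):
    """First-stage vector retrieval, then rerank the candidates.

    corpus: list of (doc_id, embedding) pairs.
    rerank_fn: doc_id -> relevance score (stand-in for a cross-encoder).
    """
    # Stage 1: cheap similarity search keeps top_k candidates.
    candidates = sorted(
        corpus, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:top_k]
    # Stage 2: expensive reranker orders only the short list.
    reranked = sorted(candidates, key=lambda d: rerank_fn(d[0]), reverse=True)
    return [doc_id for doc_id, _ in reranked[:final_k]]
```

The design point the sketch makes: the reranker only ever sees `top_k` documents, so its cost stays flat as the corpus grows.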

Local voice assistant pipeline

Voice

Real-time speech-to-LLM-to-speech. faster-whisper for transcription, Llama 3.1 8B / Qwen 2.5 7B for the brain, Piper TTS for synthesis. Runs on a single 4090 or even an RTX 3060 with the smaller model. Adds VAD, wake-wor…

Difficulty: Week · Services: 6 · Concurrency: 1-3 concurrent rooms (one wake…
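Before faster-whisper ever sees audio, a VAD decides when speech starts. A toy energy-gate version of that decision, to show the shape of the logic (real deployments use a trained VAD such as Silero; the threshold and frame counts here are made up):

```python
from math import sqrt

def frame_rms(samples):
    """Root-mean-square energy of one audio frame (floats in [-1, 1])."""
    return sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech(frames, threshold=0.05, min_frames=2):
    """Toy energy-gate VAD: True once `min_frames` consecutive frames
    exceed the RMS threshold; a single loud frame (door slam, click)
    does not trigger."""
    run = 0
    for frame in frames:
        run = run + 1 if frame_rms(frame) > threshold else 0
        if run >= min_frames:
            return True
    return False
```

The consecutive-frame requirement is the same debouncing idea a real VAD applies, just with a learned model instead of an energy threshold.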

Private ChatGPT replacement

Homelab

The full ChatGPT-style experience without OpenAI. Open WebUI as the chat surface, Ollama serving Llama 3.1 / Qwen 2.5, optional persistent memory, optional code-interpreter sandbox, optional document chat. Sized for solo…

Difficulty: Weekend · Services: 7 · Concurrency: 1-5 concurrent users

Homelab AI API gateway

Homelab

Self-hosted OpenAI-compatible API for everything you build. vLLM + LiteLLM + Caddy + per-app API keys + Prometheus metrics. The drop-in replacement for the OpenAI Python client across your homelab projects.

Difficulty: Weekend · Services: 7 · Concurrency: 5-20 concurrent personal proje…
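From an app's point of view, the whole gateway is one OpenAI-compatible endpoint plus a per-app key. A stdlib-only sketch of the request an app would send (the gateway address, model name, and key are hypothetical; with the official `openai` client you would instead pass `base_url` and your key to the constructor and call `chat.completions.create`):

```python
import json
from urllib.request import Request

GATEWAY_URL = "http://gateway.lan:4000/v1"  # hypothetical LiteLLM address

def chat_request(model, messages, api_key):
    """Build an OpenAI-compatible /chat/completions request aimed at
    the local gateway. The per-app key goes in the standard
    Authorization: Bearer header, which is how LiteLLM attributes
    usage per key."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return Request(
        f"{GATEWAY_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the wire format matches OpenAI's, swapping a project from the cloud API to the gateway is a one-line base-URL change.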

Multi-user local AI server

Production

Production-tier self-hosted AI for 20-100 users. SGLang or vLLM with replicas, LiteLLM gateway, Postgres-backed Open WebUI, SSO, observability, audit logging, backup. The internal-tools-team setup.

Difficulty: Month · Services: 7 · Concurrency: 20-100 concurrent users
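At 20-100 concurrent users the binding constraint is usually KV-cache memory, not model weights. A back-of-envelope sizing helper (the dimensions in the usage note are illustrative, roughly a Llama-class 8B with grouped-query attention; check your model's actual config before trusting the numbers):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, n_seqs, dtype_bytes=2):
    """KV-cache footprint: two tensors (K and V) per layer, each
    n_kv_heads * head_dim values per token, at dtype_bytes each
    (fp16 = 2). Scales linearly with context length and with the
    number of concurrent sequences."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
    return per_token * seq_len * n_seqs
```

With those example dimensions (32 layers, 8 KV heads, 128-dim heads, fp16), one 4096-token conversation holds 0.5 GiB of cache, so ten concurrent long conversations cost 5 GiB before a single token of weights. Paged-attention servers like vLLM and SGLang exist precisely to pack and reclaim this memory.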

Local evaluation lab

Research

Run reproducible benchmarks on local models. lm-evaluation-harness + bigcode-evaluation-harness + custom task runners + Postgres results store + Grafana for tracking. The setup that turns “this model feels smarter” into “this …

Difficulty: Week · Services: 6 · Concurrency: 1 active eval run; multiple qu…
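The jump from “feels smarter” to a number usually lands on a metric like pass@k for code tasks. This is the standard unbiased estimator from the code-eval literature; the harnesses compute it for you, but it is worth knowing what the number means:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, passes the tests.

        pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k incorrect samples exist: a draw of k must
        # include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, 2 correct out of 4 generations gives pass@1 = 0.5, which matches the intuition that a single random draw succeeds half the time.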

Local fine-tuning workstation

Research

QLoRA / LoRA fine-tuning on a single workstation. Axolotl / Unsloth + bitsandbytes + DeepSpeed (optional) + dataset prep + WandB (or self-hosted MLflow). Targets 7B-13B fine-tunes on 24 GB VRAM; pushes 32B with multi-GPU…

Difficulty: Week · Services: 6 · Concurrency: 1 active training run; queued…
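The reason 7B-13B fine-tunes fit in 24 GB is that LoRA trains tiny low-rank adapters instead of the base weights. The parameter arithmetic is simple enough to sketch (the dimensions in the usage note are illustrative, Llama-7B-ish; real Axolotl/Unsloth configs also target MLP matrices with different shapes):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for one LoRA adapter pair: A is
    (rank, d_in) and B is (d_out, rank), so rank * (d_in + d_out)
    total, versus d_in * d_out for the frozen base matrix."""
    return rank * (d_in + d_out)

def adapter_total(hidden, n_layers, targets_per_layer, rank):
    """Rough total when targeting square attention projections only
    (e.g. q/k/v/o), each hidden x hidden."""
    return n_layers * targets_per_layer * lora_params(hidden, hidden, rank)
```

Rank 16 over the four 4096x4096 attention projections of a 32-layer model is about 16.8M trainable parameters, a fraction of a percent of a 7B base, which is why the optimizer state fits alongside a 4-bit-quantized model on one card.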

Private job-search assistant

Homelab

A homelab assistant that helps you tailor resumes, draft cover letters, track applications, and rehearse interview answers — without your career data ever touching a cloud LLM. LM Studio + Llama 3.1 8B for chat, Anything…

Difficulty: Weekend · Services: 8 · Concurrency: 1 user (you)

What a workflow includes

Every workflow page documents the full operational surface, not just the happy-path component list.

  • Service ledger with one-line operator note per service
  • Hardware footprint, concurrency expectation, power note
  • Storage strategy (hot data + backup + retention)
  • Networking (LAN binding, public exposure, mesh / proxy)
  • Observability (metrics that matter, alerts to set)
  • Security (auth, sandboxing, audit, key management)
  • Upgrade path (when to evolve which piece)
  • What breaks first (the operator's day-30 reality)

Why “workflow” not “tutorial”

A tutorial gets you to a happy-path running state and stops. Three weeks later your stack has accumulated entropy: a Docker image bumped, a driver auto-updated, the Caddy cert is about to expire, the SSD where your embeddings live is at 87% full, and you don't remember which port the LLM is listening on. The tutorial isn't wrong; it's just incomplete.

A workflow page describes the full operational reality of a deployment over time. Day 1 (set it up) gets the same weight as day 30 (what broke), day 90 (what you upgraded), and day 365 (when to retire it). The point is to expose what tutorials skip: the boring, durable operational knowledge that determines whether a homelab survives or dies.

When to read a workflow vs a stack vs a path

A path teaches you the order to learn things in. A stack tells you which specific picks compose into a working system. A workflow shows you what operating that system actually looks like end-to-end. The three layers compose: pick a path to orient, pick a stack to assemble, read the workflow to understand what you're committing to operationally.

For most operators, the right reading order is path → workflow → stack. The path tells you which workflow matches your situation; the workflow tells you what operating that shape actually involves; the stack gives you the assembly recipe with concrete versions and commands.