
Build a fully offline coding stack (May 2026)

An autonomous coding agent that runs entirely on a workstation with no outbound network egress. Pre-staged models, audited dependency chain, network-monitored verification — the stack that holds up to real air-gap audits.

By Fredoline Eruo · Last reviewed 2026-05-06
The stack
  1. Tool · Coding agent (offline-verified): openhands

    OpenHands has the cleanest offline path of the autonomous-agent leaders. Docker container + filesystem MCP + provider-abstracted memory all work without internet once dependencies are pre-staged. OpenClaw works too but its faster release cadence makes the dependency-pinning audit harder.

  2. Tool · Inference engine: vllm

    vLLM with pre-pulled Docker image + pre-staged HuggingFace cache runs entirely offline. Continuous batching matters because the agent makes 5-15 tool calls per task. The OpenAI-compatible API plugs into OpenHands with no adapter.

  3. Model · Coding model: qwen-2.5-coder-32b-instruct

    Qwen 2.5 Coder 32B AWQ-INT4 fits 24GB with 32K context — strongest open coding model in the 32B class as of May 2026. Apache 2.0 license: usable in any environment without licensing surprises. Pre-stage the AWQ weights locally before egress lockdown.

  4. Tool · MCP filesystem (strict allowlist): mcp-server-filesystem

    Reference Anthropic filesystem MCP. Strict directory allowlisting limits the agent's blast radius — non-optional for offline deployments where the network can't catch a destructive mistake.

  5. Tool · MCP git (read-side only): mcp-server-git

    Read-side git operations give the agent commit history awareness. Combined with filesystem MCP, full repo grounding without network access.

  6. Tool · Memory (local-only via LanceDB): mem0

    Mem0 with LanceDB backend — no hosted memory service in the loop. All consolidation runs on the local LLM (vLLM endpoint); no third-party API calls. Cross-session memory works fully offline.

  7. Hardware · GPU: rtx-4090

    RTX 4090 24GB is the workstation default. Same hardware constraint as /stacks/local-coding-agent; the offline pivot is software + network, not GPU choice.

Why fully-offline matters

The general /stacks/local-coding-agent recipe is private — all data stays on your machine — but it isn't fully offline. Docker pulls images on first run; the HuggingFace cache downloads model weights on first use; npm install pulls MCP server packages from the registry. Once that initial setup is done the stack runs locally, but in regulated environments the audit trail of “where did these dependencies come from” still matters.

Fully offline means a different threat model: no outbound network calls during operation, with verifiable audit trail of every dependency. The pivot from local-coding-agent to fully-offline is in the dependency staging + network-egress verification, not the model + agent choice.

Industries where this matters: regulated finance (SOX compliance), healthcare (HIPAA-protected codebases), defense / aerospace (classified networks), legal (privileged discovery). Less common but real: export-controlled-research environments, internal company policy, contractual data-residency requirements.

Pre-staging workflow (CRITICAL)

The single most important step. Pre-stage all dependencies on a network-connected machine, then transfer to the air-gapped target. Never download dependencies on the air-gapped machine, even “just once.”

# On a network-connected staging machine, pull everything you need:

# 1. Pull the vLLM Docker image
docker pull vllm/vllm-openai:v0.17.1
docker save vllm/vllm-openai:v0.17.1 | gzip > vllm-image.tar.gz

# 2. Pull the model weights
hf download Qwen/Qwen2.5-Coder-32B-Instruct-AWQ \
  --local-dir ./qwen-coder-32b

# 3. Pull MCP server packages (npm offline cache)
mkdir mcp-cache
cd mcp-cache
npm pack @modelcontextprotocol/server-filesystem
npm pack @modelcontextprotocol/server-git
cd ..

# 4. Pull OpenHands Docker image
docker pull ghcr.io/all-hands-ai/openhands:latest
docker save ghcr.io/all-hands-ai/openhands:latest | gzip > openhands-image.tar.gz

# 5. Pull Open WebUI (frontend)
docker pull ghcr.io/open-webui/open-webui:latest
docker save ghcr.io/open-webui/open-webui:latest | gzip > openwebui-image.tar.gz

# 6. Generate dependency manifest with checksums
sha256sum *.tar.gz qwen-coder-32b/*.safetensors mcp-cache/*.tgz \
  > dependency-manifest.txt

Transfer the resulting bundle (Docker images + model weights + MCP packages + manifest) to the air-gapped machine via USB / one-way file transfer / approved data-diode. Verify checksums on arrival.
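
On the air-gapped side, verify the manifest before loading anything. A minimal sketch, assuming the bundle is unpacked with the same relative layout used during staging (the manifest records relative paths):

# From the bundle root on the air-gapped machine:
sha256sum -c dependency-manifest.txt

# Any line reporting FAILED means the artifact was corrupted or
# altered in transit. Re-transfer it; do not load it.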

Step-by-step setup on the air-gapped machine

1. Block egress before anything else

# Set the iptables firewall to block all outbound traffic except
# loopback and the specific local subnets you allow:
sudo iptables -P OUTPUT DROP
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT  # local LAN
sudo iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT      # private RFC1918

# Verify with packet capture during a normal task — see Network
# egress verification section below.
sudo iptables -L OUTPUT -v -n
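
Two gaps worth closing at this step: the rules above are IPv4-only, and they won't survive a reboot. A hedged sketch, assuming a Debian-family host with the iptables-persistent package pre-staged (package names and rule paths vary by distro):

# Mirror the drop policy for IPv6; otherwise IPv6-capable
# interfaces can still egress around the IPv4 rules:
sudo ip6tables -P OUTPUT DROP
sudo ip6tables -A OUTPUT -o lo -j ACCEPT

# Persist both rule sets across reboots (Debian-family paths):
sudo iptables-save  | sudo tee /etc/iptables/rules.v4 > /dev/null
sudo ip6tables-save | sudo tee /etc/iptables/rules.v6 > /dev/null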

2. Load Docker images from the staging bundle

# Load images locally from the pre-staged tarballs
gunzip -c vllm-image.tar.gz | docker load
gunzip -c openhands-image.tar.gz | docker load
gunzip -c openwebui-image.tar.gz | docker load

# Verify they loaded
docker images | grep -E "vllm|openhands|open-webui"

3. Bring up vLLM with pre-staged model

# Mount the pre-staged model directory; vLLM uses it directly.
# --pull never and the offline env vars enforce the failure-mode
# hardening described below.
docker run --gpus all -d --name vllm \
  --pull never \
  -p 127.0.0.1:8000:8000 \
  -v /opt/models/qwen-coder-32b:/model \
  -e HF_HUB_OFFLINE=1 \
  -e TRANSFORMERS_OFFLINE=1 \
  --restart unless-stopped \
  vllm/vllm-openai:v0.17.1 \
  --model /model \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 \
  --enable-chunked-prefill

# Bind to 127.0.0.1 ONLY — never expose to the LAN unless you've
# audited what hits the endpoint.
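
Before wiring the agent, smoke-test the endpoint from the same host. A minimal check against vLLM's OpenAI-compatible API; the served model name is "/model" because that is the path passed to --model:

# List served models; should return "/model"
curl -s http://127.0.0.1:8000/v1/models

# One-shot completion to confirm the weights actually loaded
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/model", "messages": [{"role": "user", "content": "Say ok"}], "max_tokens": 8}'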

4. Wire OpenHands with offline MCP servers

# Install MCP servers from pre-staged npm cache (NOT npm registry):
mkdir -p ~/.npm-offline
cp mcp-cache/*.tgz ~/.npm-offline/
npm install --offline -g \
  ~/.npm-offline/modelcontextprotocol-server-filesystem-*.tgz \
  ~/.npm-offline/modelcontextprotocol-server-git-*.tgz
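
Worth confirming the offline install actually put the binaries on PATH before touching the OpenHands config; a quick check:

# Both binaries should resolve; if not, the offline install failed
command -v mcp-server-filesystem
command -v mcp-server-git
npm ls -g --depth=0   # should list both @modelcontextprotocol packages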

# OpenHands config — same as /stacks/local-coding-agent but verify
# every URL is local
[llm]
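# "openai//model" is not a typo: it is the LiteLLM-style provider
# prefix "openai/" plus the served model name "/model" (the path
# passed to vLLM's --model flag above)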
model = "openai//model"
api_base = "http://localhost:8000/v1"
api_key = "anything"

[mcp]
servers = [
  { command = "mcp-server-filesystem", args = ["/home/you/projects/active"] },
  { command = "mcp-server-git", args = ["--repository", "/home/you/projects/active"] }
]

[memory]
provider = "mem0"
config = { vector_store = { provider = "lancedb", path = "/home/you/.mem0/lancedb" } }

Network egress verification

The audit step. Every deployment that claims to be offline needs a repeatable verification that produces a network capture during normal operation, and the capture should be empty (no packets to non-loopback destinations).

# Run packet capture during a smoke-test query
sudo tcpdump -i any -w session-capture.pcap \
  'not host 127.0.0.1 and not host ::1 and not net 192.168.0.0/16' &
TCPDUMP_PID=$!

# Run a representative agent task
openhands run --task "Find the bug in tests/auth.test.ts and fix it"

# Stop capture
sudo kill $TCPDUMP_PID

# Inspect the capture — should be empty or near-empty
tcpdump -r session-capture.pcap -nn | head -20

# Expected: no packets, or only DHCP / ARP / multicast-DNS
# (which are local-network only). If you see DNS lookups for
# external domains, telemetry to api-something.com, or HuggingFace
# Hub URLs — the stack is leaking. Find the leak before declaring
# the deployment audit-clean.
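
To make this repeatable (e.g. as a monthly audit job), the capture inspection can be reduced to a pass/fail check. A sketch; the ARP/DHCP/mDNS allowances mirror the expectations above:

# Fail the audit if the capture holds anything beyond local-only
# housekeeping traffic (ARP, DHCP on 67/68, mDNS on 5353)
LEAKS=$(tcpdump -r session-capture.pcap -nn \
  'not arp and not port 67 and not port 68 and not port 5353' \
  2>/dev/null | wc -l)
if [ "$LEAKS" -eq 0 ]; then
  echo "PASS: no unexpected egress"
else
  echo "FAIL: $LEAKS unexpected packets; inspect session-capture.pcap"
  exit 1
fi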

Failure modes you'll hit

  1. Docker DNS lookups for image-pull on first run. Even with images pre-loaded, Docker may attempt registry lookups for tag verification. Set --pull never on every docker run (already shown inline in step 3) and configure the Docker daemon for offline operation.
  2. HuggingFace cache phone-home. Some transformers code paths attempt to verify model hashes against the HF Hub even when the cache is local. Set the HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 environment variables in every container that touches the model.
  3. npm registry lookups during MCP server start. Some MCP servers verify their own version on startup. Pin npm's registry to a dead local address (npm config set registry http://localhost) at the OS level so this fails loudly rather than reaching out. See the consolidated sketch after this list.
  4. Time synchronization drift. Air-gapped machines can't reach public NTP pools; without an internal time source the system clock drifts, and SSL handshakes inside the local network can break. Run a local NTP server.
  5. Container update prompts. Some containers display update-available banners that imply a network check happened. Investigate every banner; some are local comparisons against bundled metadata, some are real network calls.
  6. Mem0 consolidation pass leaks. Mem0 with cloud LLM provider for consolidation = data leaving the network. Always configure Mem0 with the local vLLM endpoint for both inference AND consolidation.
  7. VS Code extension auto-update. If you're using Cline or Continue, IDE extensions will attempt auto-update. Set extensions.autoUpdate: false in VS Code settings.
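
A consolidated hardening sketch for items 2-4 (item 1's --pull never already appears on the docker run in step 3). Paths and service names assume a Debian-family host; ntp.internal is a placeholder for your site's internal time source:

# (2) Force HuggingFace offline mode for host-side tooling too;
#     pass the same vars into containers with -e
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

# (3) Point npm at a dead registry so version checks fail loudly
#     instead of reaching out
npm config set registry http://localhost

# (4) Sync the clock against an internal NTP source via chrony
echo "server ntp.internal iburst" | sudo tee -a /etc/chrony/chrony.conf
sudo systemctl restart chrony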

Variations and alternatives

Apple Silicon variation. Replace vLLM + RTX 4090 with MLX-LM + M3 Max. Same offline discipline applies — pre-stage MLX models, verify no network egress during operation.
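
A hedged sketch of that swap, assuming mlx-lm was pre-staged as an offline pip wheel and an MLX-format Qwen checkpoint was downloaded during staging; mlx_lm.server exposes an OpenAI-compatible endpoint, so the OpenHands config above is unchanged:

# Serve a pre-staged MLX model on the same local-only endpoint
HF_HUB_OFFLINE=1 python -m mlx_lm.server \
  --model /opt/models/qwen-coder-32b-mlx \
  --host 127.0.0.1 --port 8000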

RAG-instead variation. If the workflow is document-search rather than coding, see /stacks/offline-rag-workstation. Same air-gap discipline; different application surface.

OpenClaw alternative. Possible but more difficult to audit because OpenClaw moves faster than OpenHands. The dependency-pinning surface is bigger. OpenHands is the more conservative pick for offline deployments where audit time is finite.

Aider variation. If your workflow is surgical-edit-only (not autonomous), Aider is simpler to audit (smaller dependency surface). Trade autonomous-task quality for fewer audit surfaces.

Who should avoid this stack

  • Anyone whose privacy needs are softer than stated. If “cloud-friendly with reasonable controls” is acceptable, the cloud-API path or the general /stacks/local-coding-agent is faster to set up and operationally cheaper. This stack costs you ergonomics for a guarantee you may not actually need.
  • Anyone unable to allocate audit cycles. Offline stacks need periodic re-verification — dependencies update, OS patches need staging, configurations drift. Without monthly audit cycles, the stack stays nominally offline but slowly accretes unverified dependencies.
  • Anyone who needs reasoning models. Their dependency surfaces (sometimes including external CDNs for tokenizer files) make offline staging harder. Achievable but adds significant audit overhead.
  • Anyone needing IDE integration. VS Code + Cline / Continue + Copilot all phone home in various ways. Disabling all of it produces a degraded developer experience. Acceptable for some teams; deal-breaker for others.

Going deeper