Operator path
Operator-reviewed

Privacy-first: keep every prompt local

For: Operators with privacy or compliance reasons not to send prompts off-machine. By the end: A reproducible local stack with no outbound model calls, audit-grade logging, and a verification recipe.

By Fredoline Eruo · 8 milestones · Last reviewed 2026-05-07

You have a real reason that prompts cannot leave your machine — legal, contractual, ethical, or simply principled. The hard part is not "run a local model"; the hard part is verifying that nothing in your pipeline is quietly phoning home, and keeping that property as you add tools. This path moves you from "I run a local model" to "I have audit-grade evidence that no prompt or completion left this box."

Establish a baseline with packet capture

Before you change anything, capture what your machine currently does. Run a packet capture (tcpdump, Wireshark, Little Snitch on macOS) for 30 minutes while you use your existing AI tooling. Save it. This is your before-state. You will compare against it after every later milestone.

You may be surprised. Editor extensions, telemetry agents, update checkers, completion services — many of these are on by default and reach out without notice. The first principle of privacy work is "measure first."
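
If you are on Linux, a minimal sketch of the baseline capture might look like the following (tcpdump and tshark assumed installed; the duration and file names are illustrative):

    # Capture 30 minutes of normal use across all interfaces
    sudo timeout 1800 tcpdump -i any -w "baseline-$(date +%F).pcap"

    # List every DNS name your stack looked up during the capture
    tshark -r baseline-*.pcap -Y 'dns.flags.response == 0' \
        -T fields -e dns.qry.name | sort -u > contacted-domains.txt

Direct-to-IP connections won't appear as DNS queries, so also skim the destination addresses (tshark -r ... -T fields -e ip.dst | sort -u).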

When this is done you should have
A 30-minute packet capture of normal use saved to disk, and a list of every domain your stack contacts.

Pick a runtime with no telemetry by default

llama.cpp is the cleanest baseline: no analytics, no update probe, no cloud anything. Build it from source, run it on a port, done. Ollama is also clean by default in recent versions, but verify that its telemetry settings are off and confirm it in your packet capture.

Avoid: any runtime that ships a vendor account, a "share anonymous usage data" toggle that defaults on, or a model registry that requires authentication. Those are not privacy-first; those are convenience-first.
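
As a sketch of the llama.cpp route, assuming a recent checkout where the server binary is built as llama-server (paths and model file are illustrative):

    git clone https://github.com/ggerganov/llama.cpp ~/src/llama.cpp
    cmake -S ~/src/llama.cpp -B ~/src/llama.cpp/build
    cmake --build ~/src/llama.cpp/build --config Release

    # Serve on loopback only: no account, no registry login, no telemetry
    ~/src/llama.cpp/build/bin/llama-server \
        -m ~/models/model.gguf --host 127.0.0.1 --port 8080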

When this is done you should have
A running llama.cpp or Ollama instance configured with no analytics, no telemetry, no cloud sync.

Pin a model from a trusted mirror, then go offline

Download the GGUF file once, record its SHA256, and then unplug the network and confirm the runtime can still load and serve it. This is your airgap test. If the runtime phones a license server or a metadata endpoint at startup, you'll find out here.

For maximum hygiene: keep the original download checksum written down somewhere outside the machine. If the file ever changes, you want to know.
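
A sketch of the pin-and-airgap test on Linux, assuming the loopback server from the previous milestone and a network interface named eth0 (substitute your own):

    # Record the checksum at download time; copy it somewhere off-machine too
    sha256sum ~/models/model.gguf | tee ~/models/model.gguf.sha256

    # Later: verify the file has not changed
    sha256sum -c ~/models/model.gguf.sha256

    # Airgap test: take the interface down, then confirm the runtime still serves
    sudo ip link set dev eth0 down
    curl -s http://127.0.0.1:8080/v1/models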

When this is done you should have
The model files saved to a known-good local path, with checksums recorded. Subsequent runs work with the network unplugged.

Pick a UI / client that can be airgapped

Open WebUI is fully self-hosted and works completely offline once installed. LM Studio is a desktop app with an offline mode, but verify that its update check and model browser don't sneak data out. Pick one, not both, and verify it in your packet capture.

Avoid web-based UIs that need to call out to a CDN, font host, or analytics provider. Those are not airgappable even if the model itself is local.
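
One way to run Open WebUI pinned to loopback, as a sketch (the image and the OLLAMA_BASE_URL variable follow Open WebUI's published Docker instructions; adjust the backend URL to whatever serves your model):

    docker run -d --name open-webui \
        -p 127.0.0.1:3000:8080 \
        --add-host=host.docker.internal:host-gateway \
        -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
        -v open-webui:/app/backend/data \
        ghcr.io/open-webui/open-webui:main

Publishing the port as 127.0.0.1:3000 keeps the UI unreachable from other machines; the airgap test and the packet capture still apply.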

When this is done you should have
An Open WebUI or local-only client running, configured with no external integrations, working with the network unplugged.

Implement log discipline you can audit

Privacy-first does not mean no-logs. It means logs you control. Configure your runtime to write prompts and completions to a file you own, rotated and retained according to your policy. If you can't audit your own stack, you can't claim it's private.

Decide retention up front: 30 days, 90 days, or session-only? Document the answer and enforce it with logrotate or similar. Don't purge ad hoc; that's how you accidentally destroy evidence you needed.
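
As a sketch, assuming your runtime (or a logging wrapper in front of it) appends to /var/log/llm/prompts.log (a hypothetical path), a 90-day logrotate policy could be installed like this:

    sudo tee /etc/logrotate.d/llm >/dev/null <<'EOF'
    /var/log/llm/prompts.log {
        daily
        rotate 90
        compress
        delaycompress
        missingok
        notifempty
    }
    EOF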

When this is done you should have
A log file with every prompt and completion, rotated daily, with retention you control. A documented procedure for the rare cases where you must purge.

Block outbound network at the firewall

App configuration is a recommendation; firewall rules are enforcement. Bind your model server to 127.0.0.1, not 0.0.0.0. Add an explicit egress block at the OS firewall for the user account running the inference. If you decide you want a model registry, allowlist exactly that hostname and nothing else.

Run a second packet capture to confirm: the before-state had outbound traffic; the after-state should have none.
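
On Linux, a per-user egress block can be sketched with iptables' owner match, assuming inference runs as a dedicated user named llm (a hypothetical account; rule order matters, since the first match wins):

    # Allow the inference user to talk to loopback only
    sudo iptables -A OUTPUT -m owner --uid-owner llm -o lo -j ACCEPT
    # If you keep an allowlisted registry, insert its ACCEPT rule here
    # Reject everything else that user sends out
    sudo iptables -A OUTPUT -m owner --uid-owner llm -j REJECT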

When this is done you should have
Host firewall configured to allow only the loopback interface for AI services, with explicit allowlist for any external service you decide to keep.

Add agents and tools without breaking the property

The hardest milestone, because every framework you add is a new attack surface for the privacy property. Aider can be configured to use a local OpenAI-compatible endpoint only. Cline can. Continue.dev can. Vector databases can. Verify each one with a packet capture after you add it.

One tool that quietly defaults to a hosted embedding API ruins the entire property. Treat additions as one-at-a-time with capture-before-and-after.
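
For example, pointing Aider at the loopback server from earlier milestones; a sketch, where the env-var names follow the OpenAI-compatible convention Aider documents and the model name is illustrative:

    # Local servers generally accept any API key value
    export OPENAI_API_BASE=http://127.0.0.1:8080/v1
    export OPENAI_API_KEY=local-only
    aider --model openai/local-model

    # Verify in another terminal: anything non-loopback during the session is a leak
    sudo tcpdump -i any not host 127.0.0.1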

When this is done you should have
A coding agent or RAG pipeline running entirely on local services. No prompts, embeddings, or completions cross the firewall.

Document your stack so it's reproducible

Privacy that works once is luck. Privacy that works after a clean install is a property. Write down the packages, the versions, the firewall rules, the bind addresses, and the paths. Test the document by following it on a different machine.
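
A snapshot script is one way to keep that record honest; a sketch, with paths carried over from earlier milestones:

    #!/bin/sh
    # stack-manifest.sh: record components, versions, and config (illustrative)
    {
        echo "# llama.cpp commit";  git -C ~/src/llama.cpp rev-parse HEAD
        echo "# model checksums";   sha256sum ~/models/*.gguf
        echo "# firewall rules";    sudo iptables -S OUTPUT
        echo "# listening sockets"; ss -ltn
    } > "stack-manifest-$(date +%F).txt"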

This last milestone is what separates an opinion ("I think this is private") from a property ("I can prove this is private"). The reproduction discipline matches the way we document our own benchmarks — see the reproduction guide.

When this is done you should have
A written record of every component, version, and config flag, sufficient for you (or an auditor) to rebuild the same private stack on a fresh machine.

Next recommended step

Threat model, audit trail, and the defensive practices that turn a private stack into a defensible one.