Privacy-first: keep every prompt local
For: Operators with privacy or compliance reasons not to send prompts off-machine. By the end: A reproducible local stack with no outbound model calls, audit-grade logging, and a verification recipe.
You have a real reason that prompts cannot leave your machine — legal, contractual, ethical, or simply principled. The hard part is not "run a local model"; the hard part is verifying that nothing in your pipeline is quietly phoning home, and keeping that property as you add tools. This path moves you from "I run a local model" to "I have audit-grade evidence that no prompt or completion left this box."
Establish a baseline with packet capture
Before you change anything, capture what your machine currently does. Run a packet capture (tcpdump, Wireshark, Little Snitch on macOS) for 30 minutes while you use your existing AI tooling. Save it. This is your before-state. You will compare against it after every later milestone.
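A minimal sketch of that baseline capture, assuming a Linux host with tcpdump; the interface name is an assumption (use `tcpdump -D` to list yours), and on macOS you'd substitute your own interface or use Wireshark:

```sh
# Capture 30 minutes of traffic, then exit.
# -G 1800 rotates the file after 1800 seconds; -W 1 exits after one file.
sudo tcpdump -i eth0 -w baseline-%Y%m%d-%H%M.pcap -G 1800 -W 1

# Afterwards, list the unique remote hosts that received a TCP SYN.
tcpdump -nn -r baseline-*.pcap 'tcp[tcpflags] & tcp-syn != 0' \
  | awk '{print $5}' | cut -d. -f1-4 | sort -u
```

That second command is your "who did this machine talk to" summary; every hostname it prints is something you'll need to explain or eliminate.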
You may be surprised. Editor extensions, telemetry agents, update checkers, completion services — many of these are on by default and reach out without notice. The first principle of privacy work is "measure first."
Pick a runtime with no telemetry by default
llama.cpp is the cleanest baseline: no analytics, no update probe, no cloud anything. Build it from source, run it on a port, done. Ollama is also clean by default in recent versions, but verify that its analytics setting is off and confirm it in your packet capture.
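If you take the llama.cpp route, the whole stack is a few commands. A sketch, assuming git, cmake, and a C++ toolchain are installed; the model path is a placeholder:

```sh
# Build llama.cpp from source: no package manager, no vendor account.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Serve a model bound to loopback only; the path is a placeholder.
./build/bin/llama-server -m /models/your-model.gguf --host 127.0.0.1 --port 8080
```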
Avoid: any runtime that ships a vendor account, a "share anonymous usage data" toggle that defaults on, or a model registry that requires authentication. Those are not privacy-first; those are convenience-first.
Pin a model from a trusted mirror, then go offline
Download the GGUF file once, record its SHA256, and then unplug the network and confirm the runtime can still load and serve it. This is your airgap test. If the runtime phones a license server or a metadata endpoint at startup, you'll find out here.
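A sketch of the record-then-verify step plus the offline smoke test; the model path is a placeholder, and /health is llama.cpp's liveness endpoint (other runtimes will differ):

```sh
# Record the checksum once; keep a copy of this file off the machine.
sha256sum /models/your-model.gguf | tee your-model.gguf.sha256

# Any later run should print OK; anything else means the file changed.
sha256sum -c your-model.gguf.sha256

# Airgap test: with the network unplugged, the server must still answer.
curl -s http://127.0.0.1:8080/health
```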
For maximum hygiene: keep the original download checksum written down somewhere outside the machine. If the file ever changes, you want to know.
Pick a UI / client that can be airgapped
Open WebUI is fully self-hosted and works completely offline once installed. LM Studio is a desktop app with an offline mode, but verify that its update check and model browser don't send data out. Pick one, not both, and verify your choice in your packet capture.
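As one example, Open WebUI's container can be pinned to loopback so it is never reachable from outside the box; the image tag and volume name below follow its published quick-start and may change, so treat them as assumptions:

```sh
# Run Open WebUI bound to loopback only; data persists in a named volume.
docker run -d --name open-webui \
  -p 127.0.0.1:3000:8080 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```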
Avoid web-based UIs that need to call out to a CDN, font host, or analytics provider. Those are not airgappable even if the model itself is local.
Implement log discipline you can audit
Privacy-first does not mean no-logs. It means logs you control. Configure your runtime to write prompts and completions to a file you own, rotated and retained according to your policy. If you can't audit your own stack, you can't claim it's private.
Decide retention up front: 30 days, 90 days, or session-only? Document the answer and enforce it with logrotate or similar. Don't purge ad hoc; that's how you accidentally destroy evidence you needed.
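A sketch of one way to wire this up, assuming the runtime writes its request log to stdout and that logs live under /var/log/llm/; the paths and the 90-day figure are assumptions to replace with your own policy:

```sh
# Capture the runtime's output somewhere logrotate can manage it.
sudo mkdir -p /var/log/llm && sudo chown "$USER" /var/log/llm
./build/bin/llama-server -m /models/your-model.gguf \
  --host 127.0.0.1 --port 8080 >> /var/log/llm/server.log 2>&1 &

# Retention policy as code: rotated daily, kept 90 days, then gone.
sudo tee /etc/logrotate.d/llm-audit >/dev/null <<'EOF'
/var/log/llm/*.log {
    daily
    rotate 90
    compress
    missingok
    notifempty
}
EOF
```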
Block outbound network at the firewall
Configuration in apps is a recommendation; firewall rules are enforcement. Bind your model server to 127.0.0.1, not 0.0.0.0. Add an explicit egress block at the OS firewall for the user account running the inference runtime. If you decide you want a model registry, allowlist exactly that hostname and nothing else.
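A sketch using iptables on Linux, assuming the inference runtime runs as a dedicated user named llm; the username is an assumption, and macOS users would reach for pf or Little Snitch instead:

```sh
# Loopback stays open so local clients can still reach the server.
sudo iptables -A OUTPUT -m owner --uid-owner llm -o lo -j ACCEPT

# Everything else outbound from that user is rejected.
sudo iptables -A OUTPUT -m owner --uid-owner llm -j REJECT
```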
Use the second packet capture to confirm. The before-state had outbound traffic; the after-state has none.
Add agents and tools without breaking the property
This is the hardest milestone, because every framework you add is new attack surface for the privacy property. Aider can be configured to use a local OpenAI-compatible endpoint only. Cline can. Continue.dev can. Vector databases can. Verify each one with a packet capture after you add it.
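As an example of the pattern, most OpenAI-compatible clients can be pointed at the local server through environment variables; the exact flag and model names vary by tool and version, so treat these as placeholders and check each tool's docs:

```sh
# Point an OpenAI-compatible client at the loopback server.
export OPENAI_API_BASE="http://127.0.0.1:8080/v1"
export OPENAI_API_KEY="unused-but-required"   # local servers ignore the key

# Then run the tool; the model name here is a placeholder.
aider --model openai/local
```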
One tool that quietly defaults to a hosted embedding API ruins the entire property. Add tools one at a time, with a packet capture before and after each addition.
Document your stack so it's reproducible
Privacy that works once is luck. Privacy that works after a clean install is a property. Write down the packages, the versions, the firewall rules, the bind addresses, and the paths. Test the document by following it on a different machine.
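One low-effort way to keep that document honest is to generate part of it. A sketch of a manifest script, with every path and command an assumption to adapt to your stack:

```sh
# Snapshot everything needed to reproduce (or audit) this stack.
{
  echo "## runtime";   ./build/bin/llama-server --version 2>&1
  echo "## os";        uname -a
  echo "## models";    sha256sum /models/*.gguf
  echo "## firewall";  sudo iptables -S OUTPUT
} > stack-manifest-$(date +%F).txt
```

Regenerate it after every change; a diff between two manifests is also a changelog of your privacy posture.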
This last milestone is what separates an opinion ("I think this is private") from a property ("I can prove this is private"). The reproduction discipline matches the way we document our own benchmarks — see the reproduction guide.
Next recommended step
Threat model, audit trail, and the defensive practices that turn a private stack into a defensible one.