
AI tools that work offline

Runtimes and frontends that run a model without an internet connection: what each tool is, which platforms it supports, and what it can and can't do offline. Honest about the limitations: no live web, no fresh news, no remote APIs.

By Fredoline Eruo · Last reviewed 2026-05-08 · ~1,100 words

Answer first

Most local-AI runtimes work fully offline once the model and the binary are on disk. The short list of tools that pass a real airplane-mode test: Ollama, llama.cpp, LM Studio, Jan, GPT4All, KoboldCpp, text-generation-webui. For the frontend layer: Open WebUI and AnythingLLM both run fully offline if you point them at a local backend and skip the optional cloud features.

Several “privacy-first” products advertise offline operation but actually phone home for telemetry, license checks, or auto-updates that fail loudly when the network is gone. Operator-grade honesty: you have to test, and the test is in the last section of this page.

What “works offline” actually means

There are three honest definitions and they are not interchangeable.

  • Inference offline. The model itself runs without an internet connection. This is the easy bar — every local-AI runtime clears it once the weights are on disk.
  • Application offline. The runtime starts, the UI loads, the chat works, no network calls happen. This is the bar that matters for airplane / travel / air-gapped use. Most pass it; a few don't (telemetry, license checks).
  • Capability offline. Everything you actually do works without internet. Web search, cloud image generation, plugin marketplaces, news fetching: these all need the network. If your workflow depends on any of them, “the runtime works offline” is true but unhelpful.

Most reasonable definitions of “offline AI” mean the second bar with awareness of the third.

Runtimes that work offline

These are the engines that load and run model weights. All clear the offline-application bar once the binary and the model file are on disk.

  • Ollama — macOS, Windows, Linux. Pulls models from a registry over the network, then runs them entirely offline. The ollama serve daemon makes no outbound calls during inference (see the sketch after this list). Auto-update is optional and disabled with a config flag.
  • llama.cpp — macOS, Windows, Linux. The reference CPU/GPU runtime. Single binary, no network calls, ideal for air-gapped use. The CLI is bare; pair with a frontend for daily use.
  • LM Studio — macOS, Windows, Linux. The model-browsing marketplace needs the network; once a model is downloaded, the chat and the local API server run fully offline. Includes a built-in OpenAI-compatible endpoint other apps can hit on localhost.
  • Jan — macOS, Windows, Linux. Privacy-positioned desktop app; chat and local model running are offline-capable. Optional plugins may add network use; the defaults do not.
  • GPT4All — macOS, Windows, Linux. Curated catalog of CPU-friendly models; chat is offline once the model is downloaded.
  • KoboldCpp — macOS, Windows, Linux. llama.cpp wrapper with a writing-focused UI. Runs offline; the writing presets and personas are all bundled.
  • text-generation-webui — Linux, Windows, macOS. The “everything backend” for power users. Offline once installed; loader catalog covers GGUF, GPTQ, AWQ, ExLlamaV2.
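
A concrete way to verify the localhost-only claim: once the daemon is up and a model is on disk, a plain HTTP call to Ollama's local API completes with the network disabled. A minimal sketch, assuming the default port (11434) and a placeholder model name:

    # Minimal offline smoke test against Ollama's local HTTP API.
    # Assumes the daemon is running and a model was pulled while online;
    # "llama3.2" is a placeholder for whatever model you have on disk.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "llama3.2",   # placeholder model name
        "prompt": "Name three uses for a fully offline language model.",
        "stream": False,       # one JSON object instead of a token stream
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default port
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.loads(resp.read())["response"])  # localhost-only round trip

Run it with Wi-Fi off; if the daemon and the weights are local, it completes exactly as it does online.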

Frontends that work offline

Frontends are the chat UI and orchestration layer. Most ride on top of a runtime listening on localhost.

  • Open WebUI — Browser-based, self-hosted via Docker or pip. Connects to Ollama or any OpenAI-compatible local API. Fully offline once the container image is on disk and you skip optional cloud-RAG add-ons.
  • AnythingLLM — Desktop app and self-hostable web app. RAG over your local documents, points at any local backend. The desktop build runs fully offline; the web build does too if you don't enable cloud connectors.
  • LM Studio's built-in chat — Already covered under runtimes; listed again because the chat UI is itself a competent frontend many operators never need to replace, and its OpenAI-compatible server is what the sketch after this list talks to.
  • KoboldCpp's built-in chat — Same situation; the writing-focused UI is offline-native.
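
Because most local servers speak the OpenAI wire format, any OpenAI-compatible client can act as a frontend too. A minimal sketch pointing the official openai Python client at a local server; the port (1234, LM Studio's default), the model name, and the dummy API key are all placeholders:

    # Point the standard openai Python client at a local server instead
    # of the cloud. Port 1234 is LM Studio's default; llama.cpp's server
    # and text-generation-webui expose the same API shape on their own
    # ports. The API key is a dummy value that local servers ignore.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="local")

    reply = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whatever is loaded
        messages=[{"role": "user", "content": "Summarize my last note."}],
    )
    print(reply.choices[0].message.content)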

A pattern worth noting: the most reliable offline setup is one runtime + one frontend that both run on the same machine and only talk over localhost. Adding a remote cluster, a cloud auth provider, or a marketplace plugin is what reintroduces the network dependency.

What you give up when you go fully offline

Operator-grade honesty: the tradeoffs are real. Five capability gaps you should know before you commit.

  • No web search. A local model only knows what was in its training data when it was released. Recent events, current prices, breaking news — all invisible. You can add a self-hosted search index over a local document corpus (Open WebUI + a RAG store does this; a toy sketch follows this list), but you cannot query the live web from a fully offline setup.
  • No remote APIs. Cloud-coding agents that orchestrate “call this API, then summarize the response” do not work offline because the API does not. If the workflow requires reaching a remote service, it's not an offline workflow.
  • No fresh news / live data. Same root cause as no web search. The model's knowledge cutoff is what it is; offline operation cannot extend it.
  • No automatic model updates. A model that gets better next month does not reach your laptop unless you reconnect to download it. This is usually fine — you control the cadence — but it's a difference from cloud assistants that improve passively.
  • No remote tool use. If an agent framework expects to call out to a hosted code interpreter, a hosted browsing tool, or a hosted image generator, those calls fail offline. Local equivalents exist (sandboxed local code interpreter, headless local browser, local Stable Diffusion) but you assemble them yourself.
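
To make the web-search gap concrete, here is the toy sketch of the offline alternative mentioned above: naive keyword retrieval over local text files, with the best matches stuffed into a local model's prompt. The folder name, port, and model name are placeholders, and real setups use embeddings rather than keyword counts, but the offline data flow is the same:

    # Toy "search without the web": rank local .txt files by keyword
    # overlap, then hand the top matches to a local model as context.
    import json
    import pathlib
    import urllib.request

    def top_matches(query, folder, k=3):
        """Score local text files by crude keyword overlap with the query."""
        words = set(query.lower().split())
        scored = []
        for path in pathlib.Path(folder).glob("*.txt"):
            text = path.read_text(errors="ignore")
            scored.append((sum(text.lower().count(w) for w in words), text[:2000]))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    question = "What did we decide about the Q3 roadmap?"
    context = "\n---\n".join(top_matches(question, "./notes"))
    prompt = f"Answer from the context only.\n\nContext:\n{context}\n\nQ: {question}"

    payload = json.dumps({"model": "llama3.2", "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    print(json.loads(urllib.request.urlopen(req, timeout=120).read())["response"])

Nothing in this pipeline touches the network; it also inherits every limitation above, because the index only knows what is in ./notes.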

How to test that yours really is offline

A five-minute test that catches every common “phones home anyway” failure mode. Do it once after install and you will know; a scripted version follows the steps.

  1. Open the runtime once with internet on to download the model and the binary. Confirm chat works.
  2. Disable the network: turn off Wi-Fi and unplug the Ethernet cable. On a laptop, airplane mode is fastest.
  3. Restart the application (don't just keep using the running one — many tools cache the network reachability check).
  4. Open a chat and send a long-ish prompt. If it generates normally, inference is offline.
  5. Watch for popups, errors, or stuck loading states. “Cannot reach license server,” “Failed to fetch model list,” or a hanging splash screen are signs the app expects the network. If everything works silently, you have a real offline setup.
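
For operators who prefer a script to a checklist, the same test in a few lines: prove the outside world is unreachable, then prove the local runtime still answers. The probe address (1.1.1.1:443) and the local port (Ollama's default) are assumptions; swap in your own:

    # Scripted airplane-mode test: both checks should print OK with
    # Wi-Fi off. Probe address and local port are assumptions.
    import socket
    import urllib.request

    def outside_unreachable(host="1.1.1.1", port=443):
        """True if a plain TCP connection to the internet fails."""
        try:
            socket.create_connection((host, port), timeout=3).close()
            return False  # network is still up; you are not offline yet
        except OSError:
            return True

    def local_runtime_alive(url="http://localhost:11434"):
        """True if the local daemon answers on loopback."""
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                return resp.status == 200
        except OSError:
            return False

    print("outside unreachable:", "OK" if outside_unreachable() else "FAIL")
    print("local runtime alive:", "OK" if local_runtime_alive() else "FAIL")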

For a finer-grained check, watch outbound traffic with Little Snitch (macOS), GlassWire (Windows), or tcpdump (Linux) while you use the app for ten minutes. If nothing leaves the laptop, you're truly offline. The current observed-behavior status of each runtime is in the live runtime health dashboard.
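
If you would rather script the traffic check than watch a firewall GUI, a minimal sketch using the third-party psutil package (pip install psutil; needs elevated privileges on some platforms) lists every established connection that is not loopback, with the owning process:

    # Flag any connection leaving the machine, with the process name.
    # Localhost-only traffic (runtime <-> frontend) is expected and skipped.
    import psutil

    for conn in psutil.net_connections(kind="inet"):
        if not conn.raddr:                        # skip listening sockets
            continue
        if conn.raddr.ip in ("127.0.0.1", "::1"):
            continue                              # localhost traffic is fine
        try:
            name = psutil.Process(conn.pid).name() if conn.pid else "?"
        except psutil.NoSuchProcess:
            name = "?"
        print(f"{name:<20} -> {conn.raddr.ip}:{conn.raddr.port} [{conn.status}]")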

Next recommended step

The full tools directory lets you filter by platform, license, and offline-readiness.

Once you commit to a fully offline workflow, the hardware you choose determines which models you can actually run. A thin-and-light laptop from three years ago will choke on anything above 3B parameters, while a machine with unified memory handles 7B and 14B models without breaking a sweat. The difference is not subtle — it is the line between an AI tool that feels responsive and one that sits frozen on a loading spinner.

The hardware decisions that define your offline stack: best Mac for local AI.