Budget laptop: what's actually usable

For: Owners of laptops with an integrated GPU (or a 4-8GB discrete GPU) and 16GB of system RAM. By the end: A 1-3B model running at conversational speed on integrated graphics, used for the things small models actually do well.

By Fredoline Eruo · 6 milestones · Last reviewed 2026-05-07

On a laptop with integrated graphics (or a 4-8GB discrete GPU) and 16GB system RAM, the hard truth is: you cannot run the large modern open-weight models at usable speed. The good news: small models have gotten remarkably capable, and the things you can actually do — drafting, classification, proofreading, code review of small files — are real and useful. This path keeps your expectations honest while you build a working setup.

Inventory your laptop honestly

Integrated graphics (Intel Iris Xe, AMD Radeon 780M, Apple base M-series) work, but they share system RAM and their memory bandwidth is much lower than a discrete GPU's. A 4GB discrete GPU (RTX 3050, GTX 1650) can use that VRAM exclusively but is still small. The right ceiling for most of these is 3B-class models in Q4, with 7B as a slow fallback for non-interactive batch work.

Run the will-it-run tool with your specs. The honest answer it gives is the right starting point.
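
As a sanity check on the tool's answer, here is a minimal sketch of the arithmetic behind a will-it-run estimate. The ~0.56 bytes per parameter for Q4 and the 1.5 GB overhead allowance are rough assumptions, not measured figures from any specific runtime:

    # Rough will-it-run arithmetic: Q4 GGUF stores roughly 4.5 bits
    # per weight; overhead_gb is a guessed allowance for KV cache and
    # runtime buffers, not a measured value.
    def q4_footprint_gb(params_billions: float, overhead_gb: float = 1.5) -> float:
        bytes_per_param = 4.5 / 8      # ~0.56 bytes per weight at Q4
        weights_gb = params_billions * bytes_per_param
        return weights_gb + overhead_gb

    for size_b in (1, 3, 7):
        print(f"{size_b}B at Q4: ~{q4_footprint_gb(size_b):.1f} GB")

On 16GB of shared RAM, the ~5.4 GB a 7B model needs technically fits, but it leaves little headroom once the OS and a browser are counted, which is why 7B is a batch-work fallback rather than a daily driver.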

When this is done you should have
Recorded: GPU type (integrated / discrete + VRAM), system RAM, CPU class, OS. The truthful answer about your ceiling.

Install Ollama or LM Studio and load a 1-3B model

Llama 3.2 1B and 3B, Qwen 2.5 1.5B and 3B, and Phi-3.5 Mini are the right targets. These models are real — they handle simple Q&A, classification, summarization of short documents, and basic chat at usable quality. They will not write a novel or refactor your codebase; that's not their job.

Aim for >10 tok/s on a 3B model. Anything less feels broken. If you can't hit that, drop to a 1B model: the step down in capability is smaller than the speedup you'll feel.
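
To get a number rather than a feel, ask Ollama's local HTTP API directly; the final response from /api/generate includes eval_count (tokens generated) and eval_duration (in nanoseconds). This sketch assumes Ollama on its default port and a model tag you have already pulled:

    # Measure generation speed via Ollama's local API.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",   # swap in whatever tag you pulled
            "prompt": "Summarize in one line: local inference trades speed for privacy.",
            "stream": False,
        },
        timeout=300,
    )
    data = resp.json()
    # eval_count = tokens generated; eval_duration is in nanoseconds.
    print(f"{data['eval_count'] / (data['eval_duration'] / 1e9):.1f} tok/s")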

When this is done you should have
A 1B-3B model loaded and answering. Tokens per second written down. Memory pressure measured.

Identify the use cases small models actually own

What small local models do well: structured-output classification (sentiment, intent, category), short summarization, proofreading, simple Q&A over notes, and offline draft assistance when the network is bad. What they don't do well: long-form generation, complex reasoning, anything requiring coherent multi-turn planning.

Write your list down. Then test each task against your small model. Some will work better than expected; some will fail and you'll move them to "things I do in the cloud."
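
A quick way to run those tests is a small harness per task. This sketch covers the classification case against a local Ollama endpoint; the label set and prompt are illustrative, and the point is to validate the output rather than trust the model's formatting:

    # Test strict single-label classification on a small local model.
    import requests

    LABELS = {"positive", "negative", "neutral"}

    def classify(text: str, model: str = "llama3.2:3b") -> str:
        prompt = (
            "Classify the sentiment of the text as exactly one word: "
            f"positive, negative, or neutral.\n\nText: {text}\nAnswer:"
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        answer = resp.json()["response"].strip().lower().split()[0].strip(".,!")
        # Validate against the label set instead of trusting the model.
        return answer if answer in LABELS else "unparseable"

    print(classify("The battery died an hour into the flight."))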

When this is done you should have
A short, written list of two to four real tasks where a small model on your laptop is genuinely the right tool.

Pick the right runtime for your laptop's GPU

Integrated graphics on Intel: IPEX-LLM is the first-class path. Integrated graphics on AMD: llama.cpp with the Vulkan backend. Apple base M-series: Ollama (or MLX-LM if you want to push). Windows with mixed vendors: DirectML is the cross-vendor option. Don't fight your hardware; pick the runtime that matches it.
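
The mapping above is small enough to encode as a lookup, which makes the decision mechanical when comparing notes across machines. The keys here are illustrative shorthand, not identifiers from any tool:

    # The runtime table, encoded so the choice is a lookup, not a debate.
    RUNTIME_BY_HARDWARE = {
        "intel-integrated": "IPEX-LLM",
        "amd-integrated": "llama.cpp (Vulkan backend)",
        "apple-base-m": "Ollama (or MLX-LM)",
        "windows-mixed": "DirectML",
    }

    def pick_runtime(hardware: str) -> str:
        # CPU llama.cpp runs anywhere, so it is the safe default.
        return RUNTIME_BY_HARDWARE.get(hardware, "llama.cpp (CPU)")

    print(pick_runtime("amd-integrated"))  # llama.cpp (Vulkan backend)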

When this is done you should have
A runtime configured for your specific hardware: IPEX-LLM on Intel integrated graphics, llama.cpp with Vulkan on AMD integrated graphics, DirectML on multi-vendor Windows, or Ollama with Metal on base M-series Macs.

Manage thermals and battery realistically

Inference on a thin-and-light laptop will heat the chassis and pull battery fast. Plug in for any sustained session, unload the model when you're not using it, and don't keep it loaded in the background "just in case." A laptop fan running flat-out for an hour is a laptop fan that fails earlier than it should.
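
Unloading doesn't have to be a manual chore. Ollama's API accepts a keep_alive parameter; setting it to 0 releases the model from memory as soon as the response completes. A minimal sketch, assuming the default local endpoint:

    # Release RAM/VRAM immediately after the response instead of
    # keeping the model resident "just in case".
    import requests

    requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2:3b",
            "prompt": "Proofread: 'Their going to the meating at 3pm.'",
            "stream": False,
            "keep_alive": 0,   # 0 = unload the model right away
        },
        timeout=120,
    )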

When this is done you should have
A tested working pattern: laptop plugged in for sustained inference, model unloaded when not in use, fan profile set to handle it.

Decide what's actually worth upgrading

You now have real data: which tasks your laptop handles, which it doesn't, and what hardware would change the answer. If your small-model use cases are met, stop — you have a useful local AI setup that costs you nothing. If they aren't, the upgrade decision is a clear-eyed one.
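
To make the upgrade math concrete, invert the earlier footprint estimate: given a candidate machine's memory budget, what is the largest Q4 model that plausibly fits? Same assumed constants as before (~0.56 bytes per parameter, ~1.5 GB overhead):

    # Largest Q4 model that fits a given memory budget, roughly.
    def max_q4_params_billions(budget_gb: float, overhead_gb: float = 1.5) -> float:
        return (budget_gb - overhead_gb) / (4.5 / 8)

    for budget in (8, 16, 24):   # candidate upgrade tiers, in GB
        print(f"{budget} GB -> up to ~{max_q4_params_billions(budget):.0f}B at Q4")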

When this is done you should have
A clear answer: stay on the laptop forever, get a discrete GPU box, or invest in a Mac with more unified memory.

Next recommended step

Now that you've validated which workloads your laptop can serve, the chooser walks you through the upgrade decision honestly.