AI agents that navigate and interact with web browsers: Browser-Use, Playwright-based agents, and the BrowserBase pattern.
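Install the framework and pull a vision-language model:

```shell
pip install browser-use      # Browser-Use: open-source browser agent framework
ollama pull qwen2.5vl:7b     # ~5 GB vision-language model for "seeing" web pages
```

A minimal agent:

```python
import asyncio

# Assumption: recent browser-use releases export their own ChatOllama wrapper;
# older releases expected a LangChain model (from langchain_ollama import ChatOllama).
from browser_use import Agent, ChatOllama


async def main():
    agent = Agent(
        task=(
            "Go to wikipedia.org, search for 'artificial intelligence', "
            "and extract the first paragraph of the article."
        ),
        llm=ChatOllama(model="qwen2.5vl:7b"),  # local VLM served by Ollama
    )
    result = await agent.run()  # loops see -> decide -> act until the task is done
    print(result)


asyncio.run(main())
```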
…~/invoices/." The agent handles login, navigation, dropdowns, and file downloads.
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb): runs Browser-Use with Qwen2.5-VL 7B at ~20-40 seconds per agent step (see → decide → act). A 5-step task (search, click, scroll, click, extract) takes 2-4 minutes. For automating daily web tasks (form filling, data extraction, monitoring), a ~$400 build handles 10-30 tasks/day: pair the card with a Ryzen 5 5600, 32 GB DDR4, and a 512 GB NVMe SSD for a total of ~$390-440. The bottlenecks are VLM inference speed (5-10 seconds per screenshot) and reasoning quality.
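The see → decide → act cycle behind each agent step can be sketched as a plain loop. This is a minimal illustration, not Browser-Use's internals: `take_screenshot`, `choose_action`, and `perform` are hypothetical stand-ins for the browser and VLM calls, the action names are made up, and `max_steps` is an assumed safety cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str         # hypothetical action names: "click", "type", "done"
    target: str = ""  # element description, text to type, or extracted result


def run_agent(
    take_screenshot: Callable[[], bytes],             # hypothetical browser call: "see"
    choose_action: Callable[[bytes, list], Action],   # hypothetical VLM call: "decide"
    perform: Callable[[Action], None],                # hypothetical browser call: "act"
    max_steps: int = 10,
) -> list:
    """Run see -> decide -> act until the model returns 'done' or the step cap is hit."""
    history: list = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                # see: capture the current page
        action = choose_action(screenshot, history)   # decide: VLM picks the next action
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                               # act: execute it in the browser
    return history


# Usage with stubs in place of a real browser and model
script = iter([
    Action("click", "search box"),
    Action("type", "artificial intelligence"),
    Action("done", "first paragraph"),
])
steps = run_agent(lambda: b"", lambda shot, hist: next(script), lambda a: None)
print([a.kind for a in steps])  # ['click', 'type', 'done']
```

Every iteration waits on one VLM call, which is why per-step latency dominates total task time.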
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090): runs Browser-Use with Qwen2.5-VL 72B at ~30-60 seconds per agent step. The 72B handles complex UIs (enterprise SaaS dashboards, multi-step forms, CAPTCHA workarounds) that confuse the 7B. Note that at 4-bit quantization the 72B weights alone take roughly 40 GB, so on a single 24 GB card part of the model must be offloaded to system RAM; the 32B variant fits entirely in VRAM. For production browser automation (web scraping at scale, automated testing), Qwen2.5-VL 7B on an RTX 4090 ($2,000, see /hardware/rtx-4090) achieves 10-15 agent steps per minute. Total: ~$1,500-2,500. Browser agents are a VLM throughput problem: faster screenshot analysis means a faster agent.
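Because steps run strictly one after another, per-step latency converts directly into task time. A rough sketch of that arithmetic, using the per-step figures quoted above (the function names are illustrative):

```python
def steps_per_minute(seconds_per_step: float) -> float:
    """Agent steps completed per minute when steps run sequentially."""
    return 60.0 / seconds_per_step


def task_minutes(steps: int, seconds_per_step: float) -> float:
    """Wall-clock minutes for a task made of `steps` sequential agent steps."""
    return steps * seconds_per_step / 60.0


# RTX 3060 tier: ~20-40 s per step, 5-step task
for sec in (20, 40):
    print(f"{sec} s/step -> {steps_per_minute(sec):.1f} steps/min, "
          f"5-step task ~{task_minutes(5, sec):.1f} min")

# RTX 4090 tier: 10-15 steps/min, i.e. 6 s down to 4 s per step
for spm in (10, 15):
    sec = 60 / spm
    print(f"{spm} steps/min -> {sec:.0f} s/step, "
          f"5-step task ~{task_minutes(5, sec):.1f} min")
```

At 20-40 s per step a 5-step task spends about 1.7-3.3 minutes in agent steps, in line with the 2-4 minute figure above; at 10-15 steps per minute the same task takes roughly 20-30 seconds.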
The mistake: Running a browser agent on your personal Chrome profile (with saved passwords, cookies, and banking sessions), which gives the AI access to your entire digital life.
Why it fails: The agent can click anything. It can navigate to your email, forward sensitive messages, and delete the evidence. It can open your bank and initiate transfers. Even if the prompt seems benign ("check my Amazon orders"), the agent might misclick into account settings, change your password, or order 100 copies of a book.
The fix: Always use a dedicated browser profile. Create a separate Chrome/Chromium profile with only the logins the agent needs. For sensitive tasks, use incognito mode and log in manually each session. Never give a browser agent access to your primary browser with saved passwords and active banking/email sessions. The agent is a program executing LLM decisions; it has no judgment about what is safe to click. Sandbox it.
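A dedicated profile limits what the agent is logged into. A complementary guard is to refuse any navigation outside an allowlist of domains. The sketch below is a hypothetical helper built only on Python's standard library, not a Browser-Use API; the idea is to call it before letting the agent act on a URL, and the allowlist contents are an assumption for a Wikipedia-only task.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only the sites this agent's task actually needs.
ALLOWED_DOMAINS = {"wikipedia.org"}


def is_allowed(url: str, allowed: set = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs on an allowed domain or one of its subdomains."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):   # blocks javascript:, file:, data:, etc.
        return False
    host = (parsed.hostname or "").lower()
    # Exact match or a true subdomain; "wikipedia.org.evil.example" does not pass.
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://en.wikipedia.org/wiki/Artificial_intelligence"))  # True
print(is_allowed("https://mail.google.com/"))                               # False
print(is_allowed("https://wikipedia.org.evil.example/"))                    # False
```

Matching on the parsed hostname with a leading dot, rather than a substring search on the whole URL, keeps lookalike hosts from slipping through.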
Local AI workloads have real hardware constraints that vary by task type. The VRAM ceiling decides which models fit; memory bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
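The first two constraints can be turned into a back-of-the-envelope check. This sketch assumes weights dominate memory use (it ignores the KV cache, vision-encoder activations, and runtime overhead, so the fit test is optimistic), assumes roughly 4.5 bits per weight for a typical 4-bit quantization, and uses the common rule of thumb that single-stream decode speed is bounded by memory bandwidth divided by the bytes read per token (about the weight size). Bandwidths are the published spec figures for each card.

```python
# name: (VRAM in GB, memory bandwidth in GB/s), published specs
GPUS = {
    "RTX 3060 12GB": (12, 360),
    "RTX 3090": (24, 936),
    "RTX 4090": (24, 1008),
}


def weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized weights in GB (assumed ~4.5 bits/weight)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s if every weight is read once per token."""
    return bandwidth_gb_s / model_gb


for params in (7, 32, 72):
    size = weight_gb(params)
    for name, (vram, bw) in GPUS.items():
        if size >= vram:
            print(f"{params}B ({size:.1f} GB) on {name}: does not fit in {vram} GB")
        else:
            ceiling = decode_ceiling_tok_s(bw, size)
            print(f"{params}B ({size:.1f} GB) on {name}: decode ceiling ~{ceiling:.0f} tok/s")
```

Real throughput lands below these ceilings, and screenshot-heavy agent steps also pay a compute-bound prefill cost that this estimate does not model.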
Agent workflows run multiple tool calls in sequence, so sustained tok/s matters more than peak tok/s.