Multi-Agent Systems
Coordinated multi-agent workflows — manager+worker, debate, swarm patterns. CrewAI, AutoGen, Swarm.
Setup walkthrough
- Install Ollama, then pull a model: ollama pull llama3.1:8b (~5 GB; works for agent roles that don't need coding).
- Install CrewAI: pip install crewai (role-based multi-agent orchestration).
- Define a multi-agent team:
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Market Researcher", goal="Find data on topic",
                   backstory="Experienced researcher who gathers market data.",
                   llm="ollama/llama3.1:8b")
analyst = Agent(role="Data Analyst", goal="Analyze findings",
                backstory="Specialist at spotting patterns in raw research.",
                llm="ollama/llama3.1:8b")
writer = Agent(role="Report Writer", goal="Write final report",
               backstory="Technical writer who produces concise summaries.",
               llm="ollama/llama3.1:8b")

research_task = Task(description="Research the top 5 local AI hardware trends in 2026",
                     expected_output="A list of 5 trends with supporting data", agent=researcher)
analysis_task = Task(description="Analyze the research findings and identify key patterns",
                     expected_output="Key patterns with brief justification", agent=analyst)
writing_task = Task(description="Write a 500-word executive summary with 3 key recommendations",
                    expected_output="A 500-word executive summary", agent=writer)

crew = Crew(agents=[researcher, analyst, writer],
            tasks=[research_task, analysis_task, writing_task],
            process=Process.sequential)
result = crew.kickoff()
- Each agent runs its task and passes its output to the next agent. A 3-agent pipeline completes in 1-3 minutes on a 12 GB GPU.
- For debate/consensus patterns: CrewAI supports hierarchical (manager delegates) and sequential (pipeline) processes.
- Use cases: content creation pipelines, research synthesis, code review teams (reviewer + security auditor + style checker).
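The hierarchical (manager delegates) process from the bullets above can be sketched as follows. This is a minimal sketch, assuming CrewAI's Process.hierarchical mode and its manager_llm option, with llama3.1:8b pulled locally; in hierarchical mode the task is left unassigned so the manager model routes it to whichever worker agent fits.

```python
from crewai import Agent, Task, Crew, Process

# Manager+worker pattern: a manager model delegates work to specialized
# agents instead of running a fixed sequential pipeline.
reviewer = Agent(role="Code Reviewer", goal="Review code for correctness",
                 backstory="Senior engineer focused on logic bugs.",
                 llm="ollama/llama3.1:8b")
auditor = Agent(role="Security Auditor", goal="Find security issues",
                backstory="Security specialist who audits for vulnerabilities.",
                llm="ollama/llama3.1:8b")

# No agent= here: the manager decides who handles the task.
review_task = Task(description="Review the attached diff for bugs and security issues",
                   expected_output="A prioritized list of findings")

crew = Crew(
    agents=[reviewer, auditor],
    tasks=[review_task],
    process=Process.hierarchical,      # manager delegates instead of fixed order
    manager_llm="ollama/llama3.1:8b",  # model that plays the manager role
)
result = crew.kickoff()
```

The same worker agents can be reused in a sequential crew; only the process and the manager model change.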
The cheap setup
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs CrewAI with 3-5 Llama 3.1 8B agents executing sequentially; a typical 3-step pipeline completes in 1-3 minutes. For small teams automating content creation, research synthesis, or document review, a ~$400 build handles 20-50 pipeline runs/day. Pair with a Ryzen 5 5600, 16 GB DDR4, and a 512 GB NVMe drive. Total: ~$360-405. Multi-agent systems are GPU-efficient: agents run sequentially (one GPU, one at a time) or in parallel on CPU for text-only roles.
The serious setup
Used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs CrewAI with 5-10 agents, using Llama 3.3 70B Q4 for reasoning-heavy roles (analyst, strategist) and 8B models for simpler roles (researcher, formatter). For enterprise multi-agent systems (legal document review with 3 specialized agents, financial analysis with analyst + auditor + writer), the 70B model dramatically improves reasoning quality in complex agent roles. Total: ~$1,800-2,200. Note that 70B Q4 weights (~40 GB) exceed a single 3090's 24 GB, so expect partial CPU offload; 2× RTX 3090 fits the full 70B Q4 model in VRAM, or runs two smaller agents in parallel.
Common beginner mistake
The mistake: building a 10-agent system where every agent is the same model with a different system prompt, then expecting dramatically better output than a single agent.
Why it fails: same model + same knowledge cutoff + different prompt = the same agent in cosplay. The "researcher" and the "analyst" have identical knowledge and reasoning patterns, so the "analyst" can't find insights the "researcher" couldn't. The output is just the same answer rewritten three times.
The fix: use different models for different roles. Researcher = 8B (fast, broad search). Analyst = 32B (deeper reasoning). Writer = 8B (fluent output). Different model architectures bring different strengths. Alternatively, give agents access to different tools and data: the researcher has web search, the analyst has a database, the writer has style guides. Differentiated inputs + differentiated models = true multi-agent value. Same model + same data = one agent with extra steps.
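A differentiated crew along these lines might look like the sketch below. The specific model tags are assumptions (qwen2.5:32b stands in for "any 32B reasoning model you have pulled in Ollama"); the point is that each role binds to a model sized for its cognitive load.

```python
from crewai import Agent

# Different model sizes for different roles (a sketch; the model tags are
# assumptions -- substitute whatever you have pulled locally in Ollama).
researcher = Agent(role="Researcher", goal="Gather raw material quickly",
                   backstory="Fast, broad searcher.",
                   llm="ollama/llama3.1:8b")    # 8B: fast, broad search
analyst = Agent(role="Analyst", goal="Find non-obvious patterns",
                backstory="Careful, methodical reasoner.",
                llm="ollama/qwen2.5:32b")       # 32B: deeper reasoning
writer = Agent(role="Writer", goal="Produce a fluent final report",
               backstory="Concise technical writer.",
               llm="ollama/llama3.1:8b")        # 8B: fluent output
```

Adding per-role tools (web search for the researcher, a database for the analyst) differentiates the agents further, even when two roles share a model.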
Recommended setup for multi-agent systems
Browse all tools for runtimes that fit this workload.
Reality check
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
Common mistakes
- Buying for spec-sheet VRAM without modeling KV cache + activation overhead
- Underestimating quantization quality loss below Q4
- Skipping flash-attention support (real perf gap on long context)
- Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
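The KV-cache overhead from the first bullet can be estimated with a few lines of arithmetic. A sketch using Llama 3.1 8B's published shape (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and fp16 cache entries; adapt the numbers to your model.

```python
# Rough KV-cache sizing: weights alone don't decide whether a model fits.
# Per-token cache = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value.
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value

# Llama 3.1 8B shape: 32 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
per_token = kv_cache_bytes(32, 8, 128, 1)
full_ctx = kv_cache_bytes(32, 8, 128, 8192)

print(per_token)         # 131072 bytes (128 KiB) per token of context
print(full_ctx / 2**30)  # 1.0 GiB at an 8192-token context
```

So an 8B Q4 model (~5 GB of weights) plus one 8K-token context costs roughly 6 GB before activations, which is why three sequential agents sharing one loaded model fit comfortably on a 12 GB card while concurrent long-context requests may not.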
What breaks first
The errors most operators hit when running multi-agent systems locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle multi-agent systems before committing money.