Capability notes
Repo-chat tools let you ask natural-language questions about a codebase — "where is authentication logic," "how does the payment flow work" — and get answers grounded in your actual code via RAG over the repository. Quality depends on three factors: **indexing** (does retrieval find all relevant code), **context assembly** (does the assembled context contain enough to answer), and **model capability** (can the LLM reason about the retrieved code).
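The three stages can be sketched in a few lines. This is a toy illustration, not any tool's implementation: the "embedding" is a bag of tokens standing in for real vectors, and the prompt format is invented for the example.

```python
import re

def embed(text: str) -> set[str]:
    """Stub embedding: a bag of lowercase identifiers (real systems use vectors)."""
    return set(re.findall(r"[a-z_]+", text.lower()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Indexing/retrieval stage: rank code chunks by overlap with the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: len(q & embed(c)), reverse=True)[:k]

def assemble_context(query: str, chunks: list[str]) -> str:
    """Context-assembly stage: build the prompt the answering LLM actually sees."""
    hits = retrieve(query, chunks)
    return "Answer using this code:\n\n" + "\n---\n".join(hits) + f"\n\nQuestion: {query}"
```

The third factor, model capability, is whatever LLM consumes the assembled string; everything upstream of it decides whether the answer can be grounded at all.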
**What repo-chat does well**: (1) Locate functionality — "where is the rate limiter" returns exact file/function. (2) Explain code — "explain the caching layer" line-by-line. (3) Trace execution paths — "trace request lifecycle from router to database." (4) How-do-I questions — "add a new API endpoint" based on existing patterns. (5) Type-aware code generation using project's actual types.
**What repo-chat cannot do**: (1) Understand unwritten conventions — "never import directly from utils/." (2) Navigate highly abstracted code — deep inheritance chains, metaprogramming, dependency injection are invisible to static analysis. (3) Assess performance — "is this query efficient" can't discern query plans. (4) Understand intent — "why was this written this way" requires git history and design docs.
**Type-aware retrieval** is the key differentiator. [Continue.dev](/tools/continue) uses tree-sitter for AST parsing (function boundaries, class hierarchies, import graphs). [Cursor](/tools/cursor) uses proprietary code-optimized embeddings capturing semantic similarity better than general-purpose embeddings ([BGE-M3](/models/bge-m3) does code retrieval but isn't optimized for it). Sourcegraph Cody uses SCIP for precise cross-repository symbol resolution.
**Answer quality vs manual reading**: For straightforward questions ("where is this defined"), repo-chat is faster (seconds vs minutes). For complex questions ("is there a race condition here"), manual reading remains more reliable — the LLM misses cross-file interactions and non-local effects that human experience catches. Repo-chat is a search/explanation accelerator, not a replacement for understanding.
If you just want to try this
Lowest-friction path to a working setup.
Install [Continue.dev](/tools/continue) as a VS Code or JetBrains extension. Free, open-source, 15-minute setup. After installation, open your project. Continue indexes your codebase in the background — 2-5 minutes for a 50,000-line project on a modern laptop. Once indexing completes, open Continue chat (Ctrl+L / Cmd+L) and type `@codebase` followed by your question.
Continue uses [BGE-M3](/models/bge-m3) embeddings by default for code indexing, combined with tree-sitter AST parsing. The answering model is your chosen LLM — configure via [Ollama](/tools/ollama), [LM Studio](/tools/lm-studio), or any API.
For best local setup: Install [Ollama](/tools/ollama), pull [DeepSeek Coder V3](/models/deepseek-coder-v3) (`ollama pull deepseek-coder-v3`) or [CodeGemma 7B](/models/codegemma-7b) if less VRAM. Set model to `ollama/deepseek-coder-v3` in Continue config. Local model handles code explanation and simple generation well; complex multi-file reasoning benefits from API models (Claude).
Expect accurate answers to the first few questions, then diminishing returns on architectural ones. This is normal — repo-chat is a force multiplier for navigation, not an oracle. Use it to find things faster; read code yourself to understand deeply.
For privacy-sensitive codebases: Continue.dev + [Ollama](/tools/ollama) with [DeepSeek Coder V3](/models/deepseek-coder-v3) or [Qwen 3 32B](/models/qwen-3-32b) runs fully local. No code leaves your machine. For strongest local answers, [Llama 3.3 70B](/models/llama-3-3-70b) on [RTX 4090](/hardware/rtx-4090) outperforms 32B code models on architectural reasoning.
For production deployment
Operator-grade recommendation.
Production codebase AI requires indexing freshness, permission scoping, and security — problems absent in single-developer use.
**On-premises indexing**: For proprietary code that cannot leave the network, deploy a self-hosted pipeline. [Continue.dev](/tools/continue) supports a local-only mode with an [Ollama](/tools/ollama) backend and local [BGE-M3](/models/bge-m3) embeddings. The index lives on local disk — no cloud sync. For 10-100 developers: a shared indexing server (one machine continuously indexing via git webhook, incremental re-index of changed files only), with developers connecting their Continue instances to it. Architecture: shared index → IDE instances query only.
**Indexing freshness**: Active repos change every few hours. Stale indexes reference deleted functions and suggest refactored-away patterns. Implement continuous re-index: git webhook on push → incremental re-index (changed files only, 1-5 seconds per commit) → update vector store → notify IDE instances. Full re-index for 100K-file monorepo: 10-30 minutes. Incremental for single commit: 1-5 seconds.
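The webhook-driven incremental path can be sketched as follows. The `git diff --name-only` call and extension filter are the real logic; `index.reindex_file` is a hypothetical method standing in for whatever your vector store exposes for delete-and-re-embed.

```python
import subprocess

# File types worth embedding; everything else (docs, lockfiles) is skipped.
INDEXABLE = {".py", ".ts", ".go", ".rs", ".java"}

def changed_indexable_files(diff_output: str) -> list[str]:
    """Parse `git diff --name-only old..new` output, keeping only source files."""
    files = [line.strip() for line in diff_output.splitlines() if line.strip()]
    return [f for f in files if any(f.endswith(ext) for ext in INDEXABLE)]

def reindex_on_push(repo_dir: str, old_sha: str, new_sha: str, index) -> int:
    """Webhook handler body: re-index only the files a push touched."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", f"{old_sha}..{new_sha}"],
        capture_output=True, text=True, check=True,
    ).stdout
    files = changed_indexable_files(out)
    for path in files:
        index.reindex_file(path)  # hypothetical: drop old chunks, re-chunk, re-embed
    return len(files)
```

Because a typical push touches a handful of files, this is what keeps the incremental path in the 1-5 second range versus a full re-index.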
**Security implications**: Indexing proprietary code creates a searchable database of your codebase — hardcoded secrets, vulnerability patterns, business logic. Mitigations: (1) Secrets scanning before indexing (truffleHog). (2) Access control on index server (same as source code repo). (3) Audit logging on all queries. (4) Encryption at rest for vector DB. (5) Never send proprietary code to third-party LLM APIs.
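A pre-indexing redaction pass might look like the sketch below. The regexes are illustrative only; a real deployment would use truffleHog or gitleaks rule sets, which cover far more credential formats.

```python
import re

# Illustrative patterns only; production scanners ship hundreds of rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key id shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
]

def redact_secrets(chunk: str) -> tuple[str, bool]:
    """Mask secrets in a code chunk before embedding; flag hits for audit logging."""
    found = False
    for pat in SECRET_PATTERNS:
        chunk, n = pat.subn("[REDACTED]", chunk)
        found = found or n > 0
    return chunk, found
```

Redacting before embedding matters because vectors are effectively irreversible storage: once a secret is embedded, you cannot scrub it without re-indexing.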
**Permission-aware retrieval**: Different team members see different code (contractors: UI only, backend engineers: backend only). Tag code chunks with file path and access group; filter results per-user. [pgvector](/tools/pgvector) with row-level security provides straightforward implementation.
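With pgvector, row-level security enforces this in the database; the application-side equivalent is a post-retrieval filter like the sketch below. The `access_group` tag and the path-to-group policy are assumptions you would replace with your repo's actual ACLs.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    path: str
    text: str
    access_group: str  # tagged at indexing time from the file path

def group_for_path(path: str) -> str:
    """Hypothetical policy: backend code under server/, everything else is UI."""
    return "backend" if path.startswith("server/") else "ui"

def filtered_hits(hits: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Drop any retrieved chunk the querying user's groups do not cover."""
    return [c for c in hits if c.access_group in user_groups]
```

Filtering must happen before context assembly, not after answering: an LLM will happily paraphrase code the user was never supposed to see.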
**Model selection**: [DeepSeek Coder V3](/models/deepseek-coder-v3) (~33B active MoE) — best code-specialized open-weight, strongest on code explanation. [Qwen 3 32B](/models/qwen-3-32b) — best balance of code + general reasoning. [Llama 3.3 70B](/models/llama-3-3-70b) — strongest architectural reasoning, weakest specific code knowledge. Pair code model for "what does this do" with reasoning model for "why is this designed this way."
What breaks
Failure modes operators see in the wild.
- **Stale index (references deleted code).** An index from 3 days ago references a `CacheManager` that was renamed to `CacheService` yesterday. Answers are syntactically correct but practically useless — they mention non-existent files and suggest patterns that recent changes removed. Mitigation: incremental re-index on every push. For critical deployments: just-in-time per-file indexing before answering, which adds 1-3s latency but guarantees freshness.
- **Cross-file dependency blindness.** Retrieval returns the file where a function is defined but misses the three files that import and extend it. The LLM sees the function in isolation — missing a subclass override, a decorator that changes behavior, or caller-enforced invariants. Mitigation: type-aware retrieval following import graphs and class hierarchies. [Continue.dev](/tools/continue)'s tree-sitter parsing tracks basic imports; complex graphs need LSP integration.
- **Type information loss.** Embedding-based retrieval indexes text chunks. LLM receives function text but not its type signature context, parent class, or implemented interface. Model misinterprets parameter type because the type definition is in a different unretrieved chunk. Mitigation: enrich chunks with type annotations at indexing time from LSP/type checker.
- **Large repo performance degradation.** Monorepo 500K files → 2-5M code chunks → retrieval latency grows from <100ms to 2-10s. Developers stop using the tool. Mitigation: hierarchical indexing (file-level first, then function-level), ANN indexes with quantization, index partitioning by module.
- **Language-specific parser failures.** Rust macros (token transformations tree-sitter can parse but not expand), C++ templates (instantiation creates code absent from source), Python decorators (runtime behavior invisible to static analysis). Retrieval sees incomplete structure. Mitigation: supplement tree-sitter with LSP-based indexing — the LSP understands code as the compiler sees it, after macro expansion and template instantiation.
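The type-information mitigation above (enriching chunks at indexing time) can be sketched for Python with the stdlib `ast` module; tree-sitter or an LSP plays the same role across other languages. The `# signature:` header format is an invention for this example.

```python
import ast

def enrich_with_signature(source: str) -> list[str]:
    """Prefix each function chunk with its signature so the retrieved text
    carries the type context the raw body alone would lose."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            sig = ast.unparse(node.args)
            ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            header = f"# signature: {node.name}({sig}){ret}"
            chunks.append(header + "\n" + ast.get_source_segment(source, node))
    return chunks
```

The point is that the embedded text, not just the metadata, now contains parameter and return types, so a chunk retrieved in isolation still tells the model what `user_id` is.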
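The hierarchical-indexing mitigation for large repos is a two-stage ranking: score whole files first, then score function chunks only within the winning files. A toy sketch, with a lexical overlap score standing in for embedding similarity and an invented `{"summary", "chunks"}` record shape:

```python
def overlap_score(query: str, text: str) -> int:
    """Toy lexical scorer standing in for vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def hierarchical_retrieve(query: str, files: list[dict],
                          top_files: int = 2, top_chunks: int = 3) -> list[str]:
    """Stage 1: rank file-level summaries. Stage 2: rank function chunks
    only inside the winning files, never across the full corpus."""
    ranked = sorted(files, key=lambda f: overlap_score(query, f["summary"]),
                    reverse=True)
    candidates = [c for f in ranked[:top_files] for c in f["chunks"]]
    return sorted(candidates, key=lambda c: overlap_score(query, c),
                  reverse=True)[:top_chunks]
```

With 2-5M chunks, stage 2 touches only a few hundred candidates instead of millions, which is where the sub-second latency comes back from.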
Hardware guidance
Repo-chat splits into **indexing** (CPU + RAM + storage bound) and **inference** (GPU bound). These scale independently.
**Hobbyist (single dev, <50K lines)**: Any modern laptop with 16+ GB RAM and an [RTX 3060 12GB](/hardware/rtx-3060-12gb) or better. Indexing: 2-5 minutes. Inference: [CodeGemma 7B](/models/codegemma-7b) fits comfortably in 12 GB; [DeepSeek Coder V3](/models/deepseek-coder-v3) needs more VRAM or aggressive quantization. [MacBook Pro 16 M4 Max 36GB](/hardware/macbook-pro-16-m4-max) handles indexing + inference on one machine — unified memory means no VRAM/RAM split. [Snapdragon X Elite](/hardware/snapdragon-x-elite) 32 GB runs CodeGemma 7B on CPU at 10-20 tok/s.
**SMB (small team, 50K-500K lines)**: Dedicated indexing server: 32+ GB RAM, NVMe (2+ GB/s), 8-16 cores. Indexing 500K lines: 10-30 minutes. Inference: [RTX 4090 24GB](/hardware/rtx-4090) runs [Qwen 3 32B](/models/qwen-3-32b) or [DeepSeek Coder V3](/models/deepseek-coder-v3) — 5-10 concurrent queries at <10s latency.
**Enterprise (10-100 devs, 500K-5M lines)**: Indexing server: 64-128 GB RAM, NVMe RAID (5+ GB/s), 16-32 cores. Inference: [RTX A6000](/hardware/rtx-a6000) 48 GB or [NVIDIA L40S](/hardware/nvidia-l40s) 48 GB runs [Llama 3.3 70B](/models/llama-3-3-70b) Q4 for strongest reasoning. 2× L40S handles 20-50 concurrent queries. Separate indexing and inference — indexing is bursty (CPU spikes on push), inference is continuous.
**Frontier (100+ devs, 5M+ lines)**: Multi-node indexing with [pgvector](/tools/pgvector) sharded by repo/module. Inference: [NVIDIA H100 PCIe](/hardware/nvidia-h100-pcie) for [DeepSeek V4](/models/deepseek-v4) or [Qwen 3 235B](/models/qwen-3-235b-a22b). Multi-GPU [vLLM](/tools/vllm) for 100+ concurrent queries.
**Storage**: 100K-line codebase → ~5K-10K function-level chunks × 4 KB each (1024-dim FP32 vector) = 20-40 MB of vectors. 10M-line codebase → ~0.5-1M chunks = 2-4 GB. ANN indexes add 20-50% overhead.
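The sizing arithmetic is simple enough to keep as a helper. A minimal sketch; the ANN overhead is a parameter because it varies by index type, and the 35% default is just the midpoint of the 20-50% range.

```python
def index_size_bytes(num_chunks: int, dim: int = 1024,
                     bytes_per_float: int = 4,
                     ann_overhead: float = 0.35) -> int:
    """Raw vector storage (chunks x dim x float width) plus ANN-index overhead."""
    raw = num_chunks * dim * bytes_per_float
    return int(raw * (1 + ann_overhead))
```

At the defaults, each vector is 1024 × 4 = 4096 bytes, so a million chunks is ~4 GB raw before overhead. Quantization (e.g. FP16 or INT8 vectors) divides the float-width term accordingly.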
Runtime guidance
**If individual developer wanting IDE codebase Q&A** → [Continue.dev](/tools/continue) as VS Code/JetBrains extension. Most mature open-source IDE code-chat. Configure local via [Ollama](/tools/ollama) for privacy, or API for quality. `@codebase` context provider indexes project and retrieves relevant code. Also supports `@file`, `@folder`, `@docs`, custom providers.
**If wanting highest-quality answers from code model** → [Cursor](/tools/cursor) with proprietary code-optimized embeddings + Claude/GPT backend. Industry-leading retrieval quality. VS Code fork (same UX). Tradeoff: codebase indexing sends embeddings to Cursor servers — privacy consideration for proprietary code.
**If needing on-premises code AI for team** → [Continue.dev](/tools/continue) + shared indexing server + [Ollama](/tools/ollama) or [vLLM](/tools/vllm) backend. Configure remote embedding server and LLM endpoint. All within your network — no code leaves.
**If needing cross-repository intelligence** → [Sourcegraph Cody](https://sourcegraph.com) (hosted or self-hosted). SCIP index provides precise cross-repo code navigation — knows `User` type in repo A is the same as `User` imported by repo B. For organizations with internal package ecosystems. Self-hosted: $0-19/user/month.
**If building custom repo-chat** → [LlamaIndex](https://www.llamaindex.ai/) or [LangChain](https://www.langchain.com/) with tree-sitter-based code chunking. [BGE-M3](/models/bge-m3) embeddings. Store in [pgvector](/tools/pgvector) or [Qdrant](/tools/qdrant). [vLLM](/tools/vllm) for answering LLM. 1-2 months engineering for production quality.
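The chunking step is where custom builds usually start. A minimal sketch of function-boundary chunking using Python's stdlib `ast` as a single-language stand-in for tree-sitter, which applies the same idea across languages; the output record shape is an invention for this example.

```python
import ast

def chunk_by_function(source: str) -> list[dict]:
    """Split a Python module at top-level function/class boundaries,
    keeping line ranges for citation back to the source file."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:  # top-level units only; nested defs stay with their parent
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

Chunking at syntactic boundaries rather than fixed character windows is what keeps a retrieved chunk self-contained: a half-function chunk embeds poorly and reads worse.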
**Quick comparison**: Continue.dev (free, local, good) for privacy/cost. Cursor ($20/mo, best quality) for answer quality. Sourcegraph Cody (self-hosted option, best multi-repo). Custom (full control, $5K-50K+ engineering) for enterprise compliance.
**Model selection**: [DeepSeek Coder V3](/models/deepseek-coder-v3) — best code-specialized. [CodeGemma 7B](/models/codegemma-7b) — best small, fits 12 GB. [Qwen 3 32B](/models/qwen-3-32b) — best code+reasoning balance. [Llama 3.3 70B](/models/llama-3-3-70b) — best architectural reasoning.