Codebase indexing is how AI coding tools build a searchable map of your project so they can find relevant code without reading everything from scratch. Cursor builds vector embeddings. Cline and Claude Code skip indexing entirely. Augment indexes everything including commit history and docs. The approach each tool takes determines its speed, accuracy, cost, and failure modes.
What Is Codebase Indexing
Codebase indexing is the process of building a searchable representation of your source code so an AI model can find relevant files, functions, and context quickly. Instead of reading your entire project on every query, the tool pre-processes your code into a format optimized for retrieval.
The simplest version is a text index, like what grep searches against. More sophisticated approaches use vector embeddings that capture the semantic meaning of code chunks, enabling queries like "where does authentication happen" to return results even when no file contains that exact string.
There are three main approaches in 2026:
Embedding-Based Indexing
Chunk code with AST-aware splitters, compute vector embeddings, store in a vector database. Query by semantic similarity. Used by Cursor, Windsurf, and Copilot Enterprise.
Agentic Search
No pre-built index. The model runs grep, glob, and file reads at runtime, iteratively refining its search. Used by Claude Code, Cline, and several open source agents.
Hybrid / MCP-Based
Plug semantic indexing into any agent via MCP. Augment Context Engine, CodeGrok, and Serena add structured search without replacing the core agent architecture.
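All three approaches ultimately rank candidate code against a query. The embedding-based variant can be sketched in a few lines. This is a toy: real tools use learned code embeddings from a transformer, while the sketch below fakes them with bag-of-words vectors and cosine similarity so only the retrieval loop is visible. File paths and contents are hypothetical.

```python
# Toy sketch of embedding-based retrieval. A real pipeline would call
# an embedding model; bag-of-words counts stand in for those vectors.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: token counts.
    return Counter(text.lower().replace("_", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pre-chunked, pre-"embedded" index.
chunks = {
    "src/auth/session.py": "def validate token session user login auth",
    "src/billing/invoice.py": "def compute invoice total tax line items",
}

def search(query: str, k: int = 1):
    q = embed(query)
    ranked = sorted(chunks, key=lambda p: cosine(q, embed(chunks[p])),
                    reverse=True)
    return ranked[:k]

print(search("validate user login token"))  # → ['src/auth/session.py']
```

The interesting property is that ranking happens against a pre-built index, not the live filesystem, which is exactly where the staleness tradeoff discussed later comes from.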
How Cursor Indexes Your Code
Cursor's indexing pipeline is the most documented embedding-based approach. When you open a project, Cursor scans the folder and computes a Merkle tree of file hashes, which it syncs to Cursor's servers.
Cursor's indexing pipeline
1. Scan project → compute Merkle tree of file hashes
2. Chunk code using AST-aware splitters (tree-sitter)
3. Compute vector embeddings per chunk
4. Store in Turbopuffer (vector + full-text search)
5. Cache embeddings by content hash (skip unchanged files)
6. Re-sync every ~5 minutes to detect changes
The chunking step matters most. Naive text splitting tears apart function definitions, separating a function call from its implementation. Cursor uses tree-sitter for AST-aware splitting, traversing the syntax tree depth-first to produce chunks that respect code structure. Embeddings are cached by content hash on AWS, so unchanged code reuses previous embeddings across indexing runs.
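AST-aware chunking can be illustrated with Python's stdlib `ast` module as a simplified stand-in for tree-sitter: each top-level function or class becomes one chunk, so a definition is never torn in half the way fixed-size text splitting would tear it. The content hash mirrors Cursor's cache-by-hash step; the sample source is hypothetical.

```python
# Simplified AST-aware chunking sketch (stdlib ast standing in for
# tree-sitter). One chunk per top-level definition.
import ast
import hashlib

SOURCE = '''
def validate_token(token):
    return token == "secret"

class SessionStore:
    def get(self, key):
        return self._data.get(key)
'''

def chunk(source: str):
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            # Content hash lets unchanged chunks reuse cached embeddings.
            yield hashlib.sha256(text.encode()).hexdigest()[:12], text

for digest, text in chunk(SOURCE):
    print(digest, text.splitlines()[0])
```

A production chunker also has to handle oversized definitions and nested scopes, but the principle is the same: split at syntax-tree boundaries, never at arbitrary byte offsets.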
Cursor's shared team indexing (launched 2026) means new team members can reuse existing indexes instead of waiting hours for initial indexing. For large monorepos, this cuts onboarding from hours to seconds.
Windsurf takes a similar approach
Windsurf uses its own RAG pipeline optimized for enterprise-scale repos with 100M+ lines of code. It pre-computes code snippet indexes and retrieves relevant context during generation. Remote indexing supports multiple repositories and maintains architectural awareness across distributed systems.
The Staleness Problem
Code indexes are snapshots frozen in time. Cursor re-syncs every five minutes, but a developer on a hot streak can rename functions, move files, and restructure modules in that window. Forge Code's research found that indexed agents are 22% faster on average, but produce incorrect suggestions when code has changed since the last sync. Non-indexed agents are slower but never reference phantom APIs or deleted functions.
| Factor | Indexed (Cursor, Windsurf) | Non-Indexed (Claude Code, Cline) |
|---|---|---|
| Semantic search | Yes, via embeddings | No (text matching only) |
| Staleness risk | 5-minute sync gap | None (reads live filesystem) |
| Token cost per query | Lower (pre-filtered results) | Higher (iterative search) |
| Setup overhead | Initial indexing (minutes to hours) | None |
| Security surface | Code sent to embedding service | Code stays local |
| Large repo performance | Fast after indexing | Scales with search complexity |
The Agentic Search Alternative
Claude Code, Cline, and several open source agents take the opposite approach: no index at all. The model explores the codebase at runtime using grep, glob, and file reads, with each search step informing the next.
Agentic search in practice
# The model iterates, refining each query based on results:
Turn 1: glob "**/*auth*" → 5 matching files
Turn 2: grep "validateToken" → 3 matches in src/middleware/
Turn 3: read src/middleware/auth.ts → found the handler
Turn 4: grep "auth.ts" imports → traced the call chain
# Contrast with embedding search:
Query: "where does authentication happen"
→ Returns top-5 nearest embeddings (one shot, no refinement)Boris Cherny, who created Claude Code, explained the reasoning: "Early versions of Claude Code used RAG + a local vector DB, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."
Cline's team makes a sharper argument against chunking: "When you chunk code for embeddings, you're literally tearing apart its logic. A function call might be in one chunk, its definition in another, and the critical context scattered across other fragments." Cline instead reads files connection by connection, following imports and tracing dependencies to build understanding incrementally.
The tradeoff is real
Agentic search works well for targeted queries where the model knows roughly what to look for. It struggles with "I don't know the name" queries where the implementation uses different terminology than the spec. If you ask "where does rate limiting happen" but the code calls it throttleMiddleware, grep won't find it. Semantic search will.
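The vocabulary-gap failure is easy to reproduce. The sketch below shows exact text search missing throttleMiddleware for the query "rate limiting", then bridging the gap with a hand-written synonym map, a crude stand-in for what learned embeddings do implicitly. The file and synonym table are hypothetical.

```python
# Toy demo of the "I don't know the name" failure mode.
import re

FILES = {"src/middleware/throttle.ts":
         "export const throttleMiddleware = rateLimiterFactory()"}

def grep(term):
    return [p for p, src in FILES.items() if re.search(term, src, re.I)]

# Stand-in for semantic matching: explicit synonym expansion.
SYNONYMS = {"rate limiting": ["throttle", "rate.?limit", "backoff"]}

def semantic_ish(query):
    hits = grep(query)
    for alt in SYNONYMS.get(query, []):
        hits += grep(alt)
    return sorted(set(hits))

print(grep("rate limiting"))          # → [] — exact match fails
print(semantic_ish("rate limiting"))  # → ['src/middleware/throttle.ts']
```

A real embedding model needs no synonym table, which is precisely why semantic search wins on this class of query.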
Augment Context Engine MCP
Augment's Context Engine, released as an MCP server in February 2026, represents the hybrid approach. It ships semantic indexing as a pluggable component that works with any MCP-compatible agent, including Claude Code and Cursor.
The Context Engine goes beyond code structure. It indexes commit history, codebase patterns, external documentation, and what Augment calls tribal knowledge, the implicit conventions that live in a team's collective memory but nowhere in the code itself.
You can run the Auggie CLI locally as an MCP server that indexes in real-time as you edit, or connect to Augment's hosted service over HTTP. The key insight: the performance gain comes from better retrieval, not a better underlying model. The same Claude or GPT model performs dramatically better when it receives more relevant context.
The Token Cost Debate
The cost argument is the sharpest point of contention in the indexing debate. Milvus published a detailed critique of Claude Code's grep-only approach, arguing it burns tokens by dumping irrelevant matches into the context window. Their data shows vector search reduces token usage by 40% compared to raw grep.
| Approach | Token Efficiency | Accuracy Tradeoff |
|---|---|---|
| Raw grep (no filtering) | Worst: dumps all matches | High recall, low precision |
| Embedding search | 40% less than grep | Semantic matching, may miss exact code |
| AST-based retrieval | 74% less than raw reads | Structural accuracy, language-dependent |
| RL-trained search agent | 15.6% cheaper than self-search | Iterative refinement, high precision |
But the critique misses a nuance. Claude Code doesn't just run a single grep and dump results. It runs an agentic loop: grep, analyze results, refine the query, grep again. Each turn produces better-targeted results. The question is whether the cumulative cost of multiple refined searches exceeds the cost of a single embedding lookup.
For small-to-medium repos (under 50k lines), the difference is marginal. For monorepos with millions of lines, indexed search pulls ahead on cost because the search space is too large for iterative grep to navigate efficiently.
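The crossover can be made concrete with a back-of-envelope cost model. The key asymmetry: in an agentic loop each turn's results stay in context, so later turns pay to re-process everything accumulated so far, while an indexed lookup pays for its top-k chunks once. All token counts below are illustrative assumptions, not measurements.

```python
# Back-of-envelope token-cost model for the debate in the text.
def agentic_cost(turns: int, tokens_per_turn: int) -> int:
    # Results accumulate; every turn re-reads the growing context.
    total, context = 0, 0
    for _ in range(turns):
        context += tokens_per_turn
        total += context
    return total

def indexed_cost(topk_chunks: int, tokens_per_chunk: int) -> int:
    # One retrieval: top-k chunks enter the context once.
    return topk_chunks * tokens_per_chunk

print(agentic_cost(turns=4, tokens_per_turn=800))        # → 8000
print(indexed_cost(topk_chunks=5, tokens_per_chunk=400)) # → 2000
```

Under these (assumed) parameters indexed search wins 4x, but shrink the repo so two grep turns suffice and the gap nearly closes, which matches the small-repo observation above.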
Open Source Code Indexing Tools
The MCP ecosystem has produced several open source tools that add codebase indexing to any compatible agent:
CodeGrok MCP
Combines tree-sitter AST parsing with vector embeddings. Indexes code once, then AI queries semantically and receives only the 5-10 most relevant snippets. Claims 10-100x token reduction.
Serena MCP
Uses Language Server Protocol for symbol-level navigation across 30+ languages. Provides IDE-like capabilities (go-to-definition, find-references) to any MCP-compatible agent.
Claude Context (Zilliz)
Open source MCP plugin that adds semantic code search to Claude Code using Milvus/Zilliz vector database. Brings embedding-based retrieval to agents that don't natively support it.
CodeGrok is particularly interesting because it combines two indexing strategies: AST parsing for structural understanding and vector embeddings for semantic search. When an agent asks about a function, CodeGrok returns the function definition, its callers, and related types as a coherent unit rather than fragmented text chunks.
Serena takes a different angle. Instead of embeddings, it wraps the Language Server Protocol, giving AI agents the same go-to-definition, find-all-references, and symbol-search capabilities that IDEs provide. This means precise, always-current navigation without any embedding pipeline.
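Under the hood, an LSP-backed tool sends JSON-RPC 2.0 requests with the protocol's Content-Length framing. The sketch below only builds a `textDocument/definition` request; a real client would write it to a language server's stdin and parse the response. The file URI and position are hypothetical.

```python
# Constructing an LSP textDocument/definition request per the spec:
# Content-Length header, blank line, then a JSON-RPC 2.0 body.
import json

def lsp_frame(method: str, params: dict, msg_id: int = 1) -> bytes:
    body = json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "method": method, "params": params}).encode()
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

req = lsp_frame("textDocument/definition", {
    "textDocument": {"uri": "file:///repo/src/middleware/auth.ts"},
    "position": {"line": 41, "character": 17},  # 0-based, per the spec
})
print(req.decode())
```

Because the language server answers from its live analysis of the workspace, the result is always current, with no embedding pipeline to rebuild.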
RL-Trained Search: A Third Path
Both indexing and agentic search have a shared weakness: they load search results directly into the coding model's context window. Whether those results come from an embedding query or a grep command, the coding model processes them alongside everything else it's tracking. This is where context rot sets in.
WarpGrep isolates search into a separate context window. It's an RL-trained search agent that explores your codebase, filters results, and returns only the relevant file and line ranges. The parent model never sees the 15 files that were explored and rejected.
How WarpGrep differs from indexing and agentic search
# Indexed search (Cursor-style):
1. Pre-build embeddings of all code chunks
2. Query → nearest-neighbor lookup → return top-k chunks
3. Results go into coding model's context
Problem: stale index, chunking artifacts, semantic-only matching
# Agentic search (Claude Code-style):
1. Coding model runs grep/glob/read in its own context
2. Each search result stays in the context window
3. Irrelevant results accumulate as context noise
Problem: token waste, context rot in the coding model
# RL-trained search agent (WarpGrep):
1. Search agent explores in its own isolated context window
2. Agent iteratively searches, reads, filters, backtracks
3. Returns only: "src/api/webhooks.ts, lines 47-89"
4. Coding model receives 150 tokens of precise context
Advantage: no index, no staleness, no context rot
The cost savings come from an unintuitive place. Adding a second model to the pipeline makes the overall system cheaper because the expensive reasoning model stops wasting tokens on search. It sees fewer irrelevant files, generates fewer wasted tokens, and finishes tasks sooner.
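The context-isolation pattern itself is simple to sketch: the search agent's exploration lives in its own transcript, and only a small pointer crosses into the coding model's context. Every function and path below is a hypothetical stand-in, not WarpGrep's actual interface.

```python
# Sketch of context isolation between a search agent and a coding model.
def search_agent(query: str) -> str:
    # Explores, reads, and rejects files inside its OWN context window;
    # none of that transcript is ever returned to the caller.
    explored = ["src/api/index.ts", "src/api/routes.ts",
                "src/api/webhooks.ts"]
    _ = explored  # rejected candidates never leave this function
    return "src/api/webhooks.ts:47-89"  # only a precise pointer escapes

def coding_model(task: str, pointer: str) -> str:
    # Receives roughly a sentence of context instead of whole files.
    return f"Editing {pointer} to {task}"

pointer = search_agent("where are webhook signatures verified")
print(coding_model("fix signature check", pointer))
```

The function boundary is the isolation mechanism in miniature: the parent never sees the intermediate state, only the return value.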
Choosing the Right Approach
The indexing debate isn't a binary choice. Each approach excels in specific scenarios:
| Scenario | Best Approach | Why |
|---|---|---|
| Large stable monorepo | Embedding index (Cursor) | Semantic search across millions of lines |
| Fast-moving startup codebase | Agentic search (Claude Code) | No stale index, always reads current code |
| Enterprise multi-repo | Augment Context Engine | Cross-repo awareness, team knowledge |
| Cost-sensitive long tasks | RL search agent (WarpGrep) | Context isolation prevents token waste |
| Need semantic + exact match | Hybrid (CodeGrok, Claude Context) | Combines AST parsing with embeddings |
| Symbol-level navigation | Serena MCP | LSP integration, always current |
The tools that will matter most are the ones that let you compose these approaches. MCP makes this possible: you can run Claude Code for agentic search, add Augment's Context Engine for semantic queries, and use WarpGrep for context-isolated search, all in the same workflow. The goal isn't to pick one approach forever. It's to match the retrieval strategy to the task.
Frequently Asked Questions
What is codebase indexing?
Codebase indexing is the process of building a searchable representation of your source code so an AI model can find relevant files, functions, and context without reading everything from scratch. Tools like Cursor use vector embeddings and AST-aware chunking, while others like Claude Code skip indexing and use agentic search at runtime.
How does Cursor index your codebase?
Cursor scans your project, computes a Merkle tree of file hashes, chunks code using tree-sitter AST-aware splitters, and stores vector embeddings in Turbopuffer. Unchanged files are cached by content hash. The index re-syncs approximately every 5 minutes to pick up changes.
Why doesn't Claude Code use codebase indexing?
Boris Cherny, Claude Code's creator, explained that early versions used RAG with a local vector DB, but the team found agentic search generally works better. The model runs grep, glob, and file reads at runtime. This avoids stale indexes, security concerns from duplicating code into vector stores, and the complexity of maintaining embedding pipelines.
What is the Augment Context Engine MCP?
The Augment Context Engine is an MCP server released in February 2026 that provides semantic codebase indexing to any MCP-compatible agent. It indexes code structure, commit history, external docs, and team conventions. Adding it improved agent performance by 70% across Claude Code, Cursor, and Codex.
Does codebase indexing waste tokens?
It depends on the approach. Embedding search reduces token usage by 40% compared to raw grep. AST-based retrieval can cut tokens by 74%. But agentic search advocates argue that iterative refinement produces more precise results. RL-trained search agents like WarpGrep reduce costs by 15.6% through context isolation.
What is agentic search for code?
Agentic search is a retrieval approach where the AI model explores the codebase at runtime using tools like grep, glob, and file reads rather than querying a pre-built index. Each step informs the next, allowing iterative refinement. Claude Code and Cline use this approach. The tradeoff is higher per-query token usage in exchange for always-current results.
What are the best open source codebase indexing tools?
Notable options include CodeGrok MCP (AST parsing + vector embeddings, claimed 10-100x token reduction), Serena MCP (LSP integration for symbol navigation across 30+ languages), and Augment Context Engine MCP. All work as MCP servers compatible with Claude Code, Cursor, and other agents.
How does WarpGrep compare to codebase indexing?
WarpGrep is an RL-trained search agent that runs in its own context window. Unlike static indexes, it searches the live filesystem and returns only relevant file and line ranges to the parent model. On SWE-Bench Pro, it lifts every frontier model to #1 while being 15.6% cheaper and 28% faster. No index to maintain means no staleness.
Search Without Indexing Overhead
WarpGrep is an RL-trained search agent that finds code in its own context window and returns only what matters. No embeddings to maintain, no index to go stale. 15.6% cheaper, 28% faster, and #1 on SWE-Bench Pro with every frontier model.