Grep Isn't Enough: Why Agents Need Semantic Search Too

How combining semantic search with grep improved agent success rates by 31% on average, and by 56% on large codebases.

Tejas Bhakta
November 6, 2025 · 5 min read

Coding agents fail when they can't find the right code. Grep works for exact matches. But when your agent needs to understand "where do we handle authentication?", grep returns nothing.

When to Use Which

Use grep when the agent knows exactly what to look for.
The model has high confidence. It wants getUserById, stripe.charges.create, or files matching *Controller.ts. Grep finds it instantly.

Use semantic search when the agent is exploring.
The model doesn't know what code it needs. It asks "where do we handle rate limiting?" or "how does JWT validation work?" Semantic search understands intent and finds relevant code without exact keywords.
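
To make the routing concrete, here is a tiny heuristic of the kind an agent harness could use. The chooseTool helper is hypothetical and not part of any SDK; in practice the model usually makes this call itself, but the sketch captures the rule of thumb.

```typescript
// Hypothetical routing heuristic; not part of any SDK.
type SearchTool = "grep" | "semantic";

function chooseTool(query: string): SearchTool {
  const looksExact =
    /^[\w.]+$/.test(query) || // bare identifier or dotted path, e.g. stripe.charges.create
    /[*[\]]/.test(query);     // glob pattern, e.g. *Controller.ts
  return looksExact ? "grep" : "semantic";
}

console.log(chooseTool("getUserById"));                       // "grep"
console.log(chooseTool("*Controller.ts"));                    // "grep"
console.log(chooseTool("where do we handle rate limiting?")); // "semantic"
```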

Our evals show combining both is state of the art across all frontier models. Grep for precision. Semantic search for discovery. Together, they increase success rates by 31% on average and 56% on large codebases.


Cursor recently published research showing semantic search produces 12.5% higher accuracy on average and increases code retention by 2.6% on large codebases. Their findings match ours: grep alone isn't enough.

Morph SDK gives you turnkey semantic search with one import. All the complexity — Merkle trees, AST-aware chunking, custom embeddings, GPU reranking — handled for you.

How It Works

We trained a custom embedding model and built a two-stage retrieval system:

Stage 1: Vector search (~50ms)
HNSW index retrieves 50 candidates from the full codebase.

Stage 2: GPU rerank (~150ms)
morph-rerank-v4 scores candidates for precision, returns top 10.

Total latency: ~230ms across 1,000+ file codebases.

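To make the shape concrete, here is a minimal, self-contained sketch of that pipeline. Everything in it is illustrative: brute-force cosine similarity stands in for the HNSW index, and naive term overlap stands in for morph-rerank-v4, which are approximate and learned in production.

```typescript
// Illustrative two-stage retrieval. Stand-ins only: brute-force cosine
// similarity instead of HNSW, term overlap instead of a GPU reranker.

interface Chunk {
  file: string;
  content: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Stage 1: wide-recall vector search. Retrieve 50 candidates.
function vectorSearch(query: number[], index: Chunk[], k = 50): Chunk[] {
  return [...index]
    .sort((a, b) => cosine(query, b.embedding) - cosine(query, a.embedding))
    .slice(0, k);
}

// Stage 2: precision rerank. Score the candidates against the query
// text and keep the top 10.
function rerank(query: string, candidates: Chunk[], n = 10): Chunk[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const overlap = (c: Chunk) =>
    terms.filter((t) => c.content.toLowerCase().includes(t)).length;
  return [...candidates].sort((a, b) => overlap(b) - overlap(a)).slice(0, n);
}
```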

Why Implementation Matters

People who claim "semantic search is dead" are usually running lazy implementations.

Lazy approach:

  • A generic OpenAI text-embedding model
  • Fixed 4k-token chunks that ignore code structure

Proper approach:

  • Merkle trees to monitor file changes (only re-embed what changed; sketched after this list)
  • AST-aware chunking that respects function/class boundaries
  • Custom code embeddings model trained on code-specific patterns
  • Fast database lookup with HNSW indexing
  • Global caching
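
To illustrate the first item, here is a minimal content-hash sketch of change detection. It uses a flat hash map rather than a full Merkle tree (which generalizes the same idea to directories), and hashFile and changedFiles are hypothetical helpers:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

type HashIndex = Map<string, string>; // path -> sha256 of contents

function hashFile(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Compare the previous index against current file contents and return
// only the paths that changed, so only those get re-embedded.
function changedFiles(paths: string[], previous: HashIndex): string[] {
  return paths.filter((p) => previous.get(p) !== hashFile(p));
}
```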

The difference isn't semantic search vs grep. It's good semantic search vs bad semantic search. Lazy implementations add latency without improving accuracy. Proper implementations make agents faster and more accurate.

Results

We ran agents with three configurations on a benchmark of 500 coding tasks across 50 repositories (200–5,000 files):

| Configuration        | Success Rate | Avg. Time to Completion |
| -------------------- | ------------ | ----------------------- |
| Grep only            | 64.2%        | 38.4s                   |
| Semantic search only | 71.8%        | 35.7s                   |
| Semantic + Grep      | 84.1%        | 21.2s                   |

Semantic search alone beat grep by 7.6 percentage points. But combining both increased success rate by 19.9 points over grep alone — a 31% improvement.

Why Both Matter

Semantic search narrows down to the right files. Grep pinpoints exact locations within those files.

Example workflow (sketched in code after the list):

  1. Agent asks: "Where do we validate JWT tokens?" (semantic search)
  2. Returns auth/middleware.ts, utils/jwt.ts
  3. Agent searches: function.*verifyToken (grep)
  4. Finds exact implementation
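
A minimal sketch of that workflow, where semanticSearch and grep are hypothetical tool wrappers supplied by the agent harness:

```typescript
async function findJwtValidation(
  semanticSearch: (query: string) => Promise<string[]>,         // candidate file paths
  grep: (pattern: string, files: string[]) => Promise<string[]> // matching lines
): Promise<string[]> {
  // Stage 1: discovery. Narrow the whole codebase to a few relevant files.
  const files = await semanticSearch("Where do we validate JWT tokens?");
  // e.g. ["auth/middleware.ts", "utils/jwt.ts"]

  // Stage 2: precision. Pin down the exact symbol inside those files.
  return grep("function.*verifyToken", files);
}
```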

This two-stage approach produced the best results across all frontier models in our evals.

Large Codebases See Bigger Gains

On repos with 1,000+ files, the improvement was more dramatic:

| Configuration        | Success Rate (1,000+ files) |
| -------------------- | --------------------------- |
| Grep only            | 52.3%                       |
| Semantic search only | 66.1%                       |
| Semantic + Grep      | 81.7%                       |

That's a 56% increase in success rate. The bigger the codebase, the more semantic search helps. Grep scales poorly with codebase size because it relies on knowing what to look for. Semantic search lets agents explore conceptually.

How Morph SDK Works

1. Push triggers automatic indexing:

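As an illustration of the shape only (MorphClient, its constructor options, and index() are assumptions, not the documented @morphllm/morphsdk surface):

```typescript
// Assumed API shape, not the documented SDK surface.
import { MorphClient } from "@morphllm/morphsdk";

const morph = new MorphClient({ apiKey: process.env.MORPH_API_KEY! });

// Run from CI or a git webhook after each push; only changed files
// (per the Merkle diff) get re-embedded.
export async function onPush(repo: string, commit: string): Promise<void> {
  await morph.index({ repo, commit });
}
```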

2. Search with your preferred SDK:

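Continuing the sketch above, with the same caveat that search() and the result shape are assumed rather than documented:

```typescript
// Assumed API shape, reusing the morph client from the previous sketch.
const results = await morph.search({
  repo: "acme/api",
  query: "where do we handle rate limiting?",
  limit: 10,
});

for (const r of results) {
  console.log(r.file, r.score); // assumed result shape: { file, score, snippet }
}
```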

3. Grep for follow-up precision:

Once semantic search finds the right file, use grep to find exact symbols, imports, or call sites.
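
One way to wire that follow-up step is to shell out to ripgrep over the files semantic search returned; the grepFiles helper here is illustrative:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// -n prefixes each match with its line number. Note that rg exits
// non-zero when there are no matches, which rejects the promise.
async function grepFiles(pattern: string, files: string[]): Promise<string[]> {
  const { stdout } = await run("rg", ["-n", pattern, ...files]);
  return stdout.trim().split("\n");
}

// await grepFiles("function.*verifyToken", ["auth/middleware.ts", "utils/jwt.ts"]);
```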

Training on Real Agent Behavior

We trained our embedding and reranking models on real agent sessions — using agent traces to learn what "relevant" means in practice. This mirrors Cursor's approach: train on actual agent behavior, not generic code similarity. Standard embedding models optimize for similarity. Ours optimizes for task completion.

Lesson

Grep works when models know what to find. Semantic search works when they need to explore. Together, they improve agent success rates by 31% on average and 56% on large codebases.

Morph SDK handles all the complexity with one import.

npm install @morphllm/morphsdk