Warp Grep Benchmarks

Agentic code search performance on real-world repositories.

SWE-bench Pro

[Leaderboard chart: green bars use Warp Grep, outlined bars are baseline]

Headline improvements with Warp Grep vs. baseline:
- 15% average cost reduction
- 19% average time reduction
- 28 turns saved on average

Detailed Performance Improvement

We ran the official SWE-bench Pro benchmark with MiniMax, comparing a baseline configuration against one using Warp Grep.

Warp Grep can make an open-source model outperform frontier models.

Sweep: MiniMax 2.5
| Metric | Baseline | Warp Grep | Delta |
| --- | --- | --- | --- |
| Avg events/instance | 157 | 135 | -14% |
| Avg prompt tokens | 2,926,502 | 2,461,973 | -16% |
| Avg completion tokens | 17,190 | 15,222 | -11% |
| Avg reasoning tokens | 7,347 | 6,835 | -7% |
| Avg cost/instance | $0.18 | $0.15 | -17% |
| Total cost (18 instances) | $3.26 | $2.77 | -15% |
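
The Delta column can be reproduced from the Baseline and Warp Grep columns as a rounded percentage change. A minimal sketch (the `delta_pct` helper is illustrative, not part of any Warp Grep API):

```python
def delta_pct(baseline: float, treatment: float) -> int:
    """Percentage change from baseline, rounded to the nearest whole percent."""
    return round((treatment - baseline) / baseline * 100)

# Figures from the MiniMax 2.5 table above
rows = {
    "events/instance": (157, 135),
    "prompt tokens": (2_926_502, 2_461_973),
    "cost/instance": (0.18, 0.15),
}
for metric, (base, warp_grep) in rows.items():
    print(f"{metric}: {delta_pct(base, warp_grep)}%")
# → events/instance: -14%, prompt tokens: -16%, cost/instance: -17%
```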

Agent Capabilities Improvement

SWE-bench evaluation with Claude Opus 4.5, run with and without Warp Grep as the code search tool. Better search directly improves agent effectiveness.

[Chart: agent metrics with vs. without Warp Grep]
- Input tokens: 39% fewer (14K without Warp Grep → 9K with)
- Agent turns: 26% fewer (35.0 → 26.0)
- Tasks solved: 10% more (74.4% → 81.9%)
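
The relative improvements above follow from the raw figures; a quick check (note the token counts in the chart are rounded to the nearest thousand, so the 39% figure presumably comes from unrounded counts):

```python
def pct_change(without: float, with_warp_grep: float) -> float:
    """Relative change, in percent, going from the baseline to the Warp Grep run."""
    return (with_warp_grep - without) / without * 100

# Agent turns: 35.0 without Warp Grep -> 26.0 with
print(round(pct_change(35.0, 26.0)))  # → -26 (26% fewer turns)
# Tasks solved: 74.4% -> 81.9% is a ~10% relative gain
print(round(pct_change(74.4, 81.9)))  # → 10 (10% more tasks solved)
```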

Build better coding agents

Warp Grep is available as an API and as an SDK component. Join 500+ teams using Morph.