SWE-bench Pro
[Chart: SWE-bench Pro Improvement. Leaderboard view; green bars use Warp Grep, outlined bars are baseline.]

- 15% average cost reduction
- 19% average time reduction
- 28 turns saved on average
Detailed Performance Improvement
We ran the official SWE-bench Pro benchmark with MiniMax 2.5, once at baseline and once with Warp Grep.
Warp Grep can make an open-source model outperform frontier models.
Sweep: MiniMax 2.5
| Metric | Baseline | Warp Grep | Delta |
|---|---|---|---|
| Avg events/instance | 157 | 135 | -14% |
| Avg prompt tokens | 2,926,502 | 2,461,973 | -16% |
| Avg completion tokens | 17,190 | 15,222 | -11% |
| Avg reasoning tokens | 7,347 | 6,835 | -7% |
| Avg cost/instance | $0.18 | $0.15 | -17% |
| Total cost (18 inst) | $3.26 | $2.77 | -15% |
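The Delta column follows directly from the two averages in each row. A quick sketch of the arithmetic, using the table's own numbers (percentages rounded to the nearest whole percent):

```python
# Recompute the Delta column from the (baseline, Warp Grep) averages above.
rows = {
    "Avg events/instance": (157, 135),
    "Avg prompt tokens": (2_926_502, 2_461_973),
    "Avg completion tokens": (17_190, 15_222),
    "Avg reasoning tokens": (7_347, 6_835),
    "Avg cost/instance": (0.18, 0.15),
    "Total cost (18 inst)": (3.26, 2.77),
}
deltas = {m: round((warp - base) / base * 100) for m, (base, warp) in rows.items()}
for metric, pct in deltas.items():
    print(f"{metric}: {pct:+d}%")
```

Every recomputed value matches the table, so the reported deltas are consistent with the raw averages.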
Agent Capabilities Improvement
SWE-bench evaluation with Claude Opus 4.5, run with WarpGrep as the code-search tool and without it. Better search directly improves agent effectiveness.
| Metric | Without WarpGrep | With WarpGrep | Delta |
|---|---|---|---|
| Input tokens | 14K | 9K | 39% fewer |
| Agent turns | 35.0 | 26.0 | 26% fewer |
| Tasks solved | 74.4% | 81.9% | 10% more |
Build better coding agents
WarpGrep is available as an API and SDK component. Join 500+ teams using Morph.
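As a rough illustration of what wiring a code-search API into an agent might look like, here is a minimal request-builder sketch. The endpoint URL, field names, and response shape below are placeholders invented for this example; they are not the actual WarpGrep API, so consult the official SDK docs for the real interface.

```python
import json

def build_search_request(query: str, repo: str, api_key: str) -> dict:
    """Assemble the pieces an HTTP client would need for one hypothetical
    code-search call. All names here are illustrative assumptions."""
    return {
        "url": "https://api.example.com/v1/search",  # placeholder endpoint
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"query": query, "repo": repo}),
    }

req = build_search_request("where is the retry logic?", "acme/backend", "sk-demo")
print(req["url"])
```

An agent loop would send this request each time the model asks to search, and feed the returned snippets back as tool output instead of raw grep results.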