Morph Gets Faster: 4500+ Tokens Per Second

We were already the fastest way to apply AI code edits. Now we're 2.25x faster—and 3x cheaper than search-and-replace approaches.

Tejas Bhakta
June 15, 2025 · 3 min read


Speed comparison showing Morph v3-fast at 4500+ tokens per second

We were already the fastest way to apply AI-generated code edits—processing at 2500 tokens per second while traditional approaches struggled at 200-400 tok/sec.

Now Morph v3 hits 4500 tokens per second. That's 2.25x faster than our previous model, 3x faster than search-and-replace approaches, and 3x cheaper than forcing frontier models into diff formats. We're launching with three model options:

| Model | Speed | Best For |
| --- | --- | --- |
| morph-v3-fast | 4500+ tok/sec | Most coding agents and files |
| morph-v3-large | 2500+ tok/sec | Complex edits requiring maximum accuracy |
| auto | Variable | Automatically routes to the best model based on complexity; requests billed by the model used |
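Picking a model is just a string in the request. Here is a minimal sketch of assembling an apply request, assuming an OpenAI-compatible chat-completions payload; the `<code>`/`<update>` message format and the `build_apply_request` helper are illustrative, not the documented API contract.

```python
# Hypothetical sketch: choosing a Morph model for an apply request.
# The payload shape mimics an OpenAI-style chat completion; consult
# the actual Morph docs for the real message format.

def build_apply_request(model: str, original_code: str, edit_snippet: str) -> dict:
    """Assemble a request that asks the model to merge an edit snippet
    into the original file."""
    return {
        "model": model,  # "morph-v3-fast", "morph-v3-large", or "auto"
        "messages": [
            {
                "role": "user",
                "content": (
                    f"<code>{original_code}</code>\n"
                    f"<update>{edit_snippet}</update>"
                ),
            }
        ],
    }

request = build_apply_request(
    model="morph-v3-fast",
    original_code="def greet():\n    print('hi')\n",
    edit_snippet="def greet():\n    print('hello')\n",
)
```

Swapping `"morph-v3-fast"` for `"auto"` is the only change needed to opt into routing.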

Why Speed Matters for Coding Agents

Speed and reliability are fundamental to every coding agent worth building. Without both, you're left with an unpredictable agent and sluggish tools.

Every successful coding agent obsesses over latency because cognitive flow state has a half-life measured in seconds. When Claude suggests a perfect refactor but it takes 15 seconds to apply, you've lost context.

Traditional diff approaches fail 9-25% of the time on real-world complex edits. Morph combines 4500 tok/sec speed with ~2% failure rates. That reliability multiplier separates production agents from demos.

Technical Improvements

Advanced Speculative Decoding

Our first-generation decoder used unchanged code portions as drafts. The new version also speculates on semantic patterns within the changes themselves:

  • Function signatures often remain unchanged when editing bodies
  • Adding functionality follows predictable import patterns
  • Code style creates strong priors for formatting
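The core draft-and-verify idea can be shown with a toy example: the unchanged original file acts as the draft, and the model only pays full decoding cost where the edit diverges. This is a simplified sketch (one word per token, prefix-only acceptance); real speculative decoding operates on subword tokens and resynchronizes after mismatches.

```python
# Toy illustration of draft-and-verify speculative decoding for code
# edits. The original file is the draft; matching tokens are accepted
# "for free" and only the divergent span needs normal decoding.

def speculative_merge(original_tokens, target_tokens):
    """Count target tokens accepted from the draft vs. generated
    one-by-one. Simplification: stop at the first mismatch; real
    systems re-anchor and keep speculating afterward."""
    accepted = 0
    for draft, target in zip(original_tokens, target_tokens):
        if draft != target:
            break  # first divergence: fall back to normal decoding
        accepted += 1
    generated = len(target_tokens) - accepted
    return accepted, generated

orig = "def greet ( ) : print ( 'hi' )".split()
new = "def greet ( ) : print ( 'hello' )".split()
accepted, generated = speculative_merge(orig, new)
# accepted == 7, generated == 2: most of the file costs nothing.
```

The bullets above are what make the draft so accurate for code: signatures, imports, and formatting rarely change, so the acceptance rate stays high.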

Infrastructure Optimizations

  • Fused attention operations that minimize GPU memory transfers
  • Dynamic batching that adapts to real-time request patterns
  • Custom CUDA kernels optimized for code transformation
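Of these, dynamic batching is the easiest to illustrate. The sketch below groups requests that arrive within a short window into one GPU batch, up to a size cap; the window and cap values are invented for the example, not Morph's actual settings.

```python
# Hypothetical sketch of dynamic batching: requests arriving close
# together share a batch; a batch closes when its time window elapses
# or it hits the size cap. Values are illustrative only.

def dynamic_batches(arrival_times_ms, window_ms=5, max_batch=8):
    """Group request arrival timestamps (ms) into batches."""
    batches, current = [], []
    for t in sorted(arrival_times_ms):
        if current and (t - current[0] > window_ms or len(current) == max_batch):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches

print(dynamic_batches([0, 1, 2, 40, 41, 100]))
# → [[0, 1, 2], [40, 41], [100]]
```

Clustered requests amortize one forward pass; stragglers still get served without waiting on a full batch.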

Intelligent Model Routing

Our auto model analyzes each request in real time to determine optimal routing:

  • Simple edits (variable renames, imports, minor fixes) → morph-v3-fast
  • Complex refactors (architectural changes, multi-function edits) → morph-v3-large
  • Contextual analysis considers file size, change complexity, and historical patterns
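A crude version of that routing decision can be written as a heuristic. The real router is a contextual system using the signals above; the thresholds and feature names here are invented for the sketch.

```python
# Illustrative routing heuristic for the `auto` model. The actual
# router considers richer context (file size, historical patterns);
# these signals and thresholds are made up for demonstration.

def route(edit_lines: int, files_touched: int, is_refactor: bool) -> str:
    """Pick a model from crude complexity signals."""
    if is_refactor or files_touched > 1 or edit_lines > 200:
        return "morph-v3-large"
    return "morph-v3-fast"

route(edit_lines=3, files_touched=1, is_refactor=False)    # simple rename → fast
route(edit_lines=350, files_touched=4, is_refactor=True)   # architectural change → large
```

Because billing follows the model actually used, simple edits stay cheap even under `auto`.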

Real Performance Impact

Model Selection Guidelines:

  • Use morph-v3-fast for most coding tasks, agent workflows, and standard file edits
  • Use morph-v3-large for complex refactors, architectural changes, or when maximum precision matters
  • Use auto to let our system intelligently route based on edit complexity

Enterprise benefits:

  • 2x cost reduction and speed improvement vs forcing Claude into diff formats
  • Batch processing feels instant
  • Real-time streaming—see results before completion
  • Intelligent model routing optimizes both speed and cost

New Possibilities

At 4500 tok/sec, new workflows become possible:

  • Speculative applies: Process changes before user clicks "apply"
  • Multi-file refactors: Coordinate dozens of files in seconds
  • Interactive editing: Real-time feedback as models generate suggestions
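The first of these, speculative applies, amounts to caching: run the merge in the background while the suggestion is on screen, so clicking "apply" only swaps in a precomputed result. A minimal sketch, where `merge_edit` is a stand-in for the actual model-backed apply call:

```python
# Sketch of a "speculative apply": precompute the merged file before
# the user clicks, then serve it instantly from cache. `merge_edit`
# is a placeholder, not the real Morph apply call.

def merge_edit(original: str, edit: str) -> str:
    # Stand-in for the model-backed apply; trivially returns the edit.
    return edit

class SpeculativeApplier:
    def __init__(self):
        self._cache = {}

    def precompute(self, file_id: str, original: str, edit: str) -> None:
        """Run the apply as soon as the suggestion renders."""
        self._cache[file_id] = merge_edit(original, edit)

    def apply(self, file_id: str) -> str:
        """Instant on click: return the precomputed result."""
        return self._cache.pop(file_id)
```

At 4500 tok/sec the background pass typically finishes before the user decides, making the click itself feel free.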

Speed Thresholds

  • Below 1000 tok/sec: Breaks flow state
  • 1000-2000 tok/sec: Good
  • 4500+ tok/sec: Infrastructure becomes invisible
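The thresholds are easy to sanity-check with arithmetic. Taking a hypothetical 2,000-token file rewrite and the throughput figures from this post:

```python
# Back-of-envelope latency for applying a 2,000-token rewrite at the
# throughput tiers mentioned above. Pure arithmetic; the 2,000-token
# file size is an assumed example.

def apply_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

for rate in (400, 2500, 4500):
    print(f"{rate:>5} tok/sec -> {apply_seconds(2000, rate):.2f}s")
# →   400 tok/sec -> 5.00s
# →  2500 tok/sec -> 0.80s
# →  4500 tok/sec -> 0.44s
```

Five seconds is a visible stall; under half a second reads as instant, which is the "invisible infrastructure" threshold.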

Ready for 4500 tok/sec edits? Get your Morph API key

Building a coding agent? Contact us about dedicated instances.


Speed isn't everything. It's the only thing that makes everything else possible.