Morph Gets Faster: 4500+ Tokens Per Second
We were already the fastest way to apply AI-generated code edits—processing at 2500 tokens per second while traditional approaches struggled at 200-400 tok/sec.
Now Morph v3 hits 4500 tokens per second. That's 1.8x faster than our previous model, 3x faster than search-and-replace approaches, and 3x cheaper than forcing frontier models into diff formats. We're launching with three model options:
| Model | Speed | Best For |
|---|---|---|
| morph-v3-fast | 4500+ tok/sec | Most coding agents and files |
| morph-v3-large | 2500+ tok/sec | Complex edits requiring maximum accuracy |
| auto | Variable | Routes each request to the best model for its complexity; billed by the model used |
Why Speed Matters for Coding Agents
Speed and reliability are fundamental to every coding agent worth building. Without speed, the tools feel sluggish; without reliability, the agent becomes unpredictable.
Every successful coding agent obsesses over latency because cognitive flow state has a half-life measured in seconds. When Claude suggests a perfect refactor but it takes 15 seconds to apply, you've lost context.
Traditional diff approaches fail 9-25% of the time on real-world complex edits. Morph combines 4500 tok/sec speed with ~2% failure rates. That reliability gap separates production agents from demos.
Technical Improvements
Advanced Speculative Decoding
Our earlier releases used unchanged code portions as drafts. v3 also speculates on semantic patterns within the changes themselves:
- Function signatures often remain unchanged when editing bodies
- Adding functionality follows predictable import patterns
- Code style creates strong priors for formatting
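The core mechanic can be shown with a toy draft-and-verify loop. This sketch is illustrative only (Morph's actual decoder is internal): it uses the unchanged original file as the "draft model," so untouched regions stream through in multi-token chunks instead of one token at a time.

```python
# Toy speculative decoding. The original file supplies draft tokens on
# the guess that code is unchanged; a verifier accepts the matching
# prefix plus one corrected token per round. Edits that leave most of
# the file intact finish in far fewer rounds than token-by-token decoding.

def apply_with_speculation(original, target, draft_len=8):
    out = []
    rounds = 0
    while len(out) < len(target):
        pos = len(out)
        draft = original[pos:pos + draft_len]   # draft: "this span is unchanged"
        accepted = 0
        for d, t in zip(draft, target[pos:]):
            if d != t:
                break
            accepted += 1
        # accepted draft tokens, plus one token from the verifier itself
        take = min(accepted + 1, len(target) - pos)
        out.extend(target[pos:pos + take])
        rounds += 1
    return out, rounds

original = list("def add(a,b): return a+b")
target   = list("def add(a,b): return a+b  # noqa")
result, rounds = apply_with_speculation(original, target)
```

Here the unchanged prefix is accepted in chunks of up to nine characters per round, so `rounds` is far smaller than the output length.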
Infrastructure Optimizations
- Fused attention operations that minimize GPU memory transfers
- Dynamic batching that adapts to real-time request patterns
- Custom CUDA kernels optimized for code transformation
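Of these, dynamic batching is the easiest to sketch. The version below is a toy, not Morph's implementation (batch size and wait thresholds are invented): it flushes when a batch fills up or the oldest request has waited too long, so batch size adapts to traffic instead of being fixed.

```python
import time
from collections import deque

class DynamicBatcher:
    """Toy dynamic batcher (illustrative only): flush when the batch is
    full or the oldest pending request has waited longer than max_wait_ms."""

    def __init__(self, max_batch=8, max_wait_ms=5):
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0
        self.pending = deque()

    def submit(self, request, now=None):
        """Enqueue a request; return a flushed batch, or None if still waiting."""
        now = time.monotonic() if now is None else now
        self.pending.append((request, now))
        return self._maybe_flush(now)

    def _maybe_flush(self, now):
        full = len(self.pending) >= self.max_batch
        stale = now - self.pending[0][1] >= self.max_wait
        if full or stale:
            batch = [req for req, _ in self.pending]
            self.pending.clear()
            return batch
        return None

b = DynamicBatcher(max_batch=2, max_wait_ms=1000)
first = b.submit("edit-1", now=0.0)            # waits for more traffic
second = b.submit("edit-2", now=0.0)           # full batch flushes
```

Under heavy load batches fill instantly; under light load the wait deadline keeps tail latency bounded.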
Intelligent Model Routing
Our auto model analyzes each request in real-time to determine optimal routing:
- Simple edits (variable renames, imports, minor fixes) → morph-v3-fast
- Complex refactors (architectural changes, multi-function edits) → morph-v3-large
- Contextual analysis considers file size, change complexity, and historical patterns
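A routing heuristic along these lines might look like the following. The real `auto` router is internal to Morph; the features and thresholds here are invented for illustration.

```python
# Hypothetical router: simple edits go to the fast model, structural
# changes to the large one. Thresholds are illustrative, not Morph's.

def route(edit_request):
    file_size = len(edit_request["file"])
    n_hunks = edit_request["hunks"]
    changes_signatures = edit_request.get("changes_signatures", False)
    if changes_signatures or n_hunks > 3 or file_size > 20_000:
        return "morph-v3-large"   # architectural / multi-function edits
    return "morph-v3-fast"        # renames, imports, minor fixes

simple_choice = route({"file": "x" * 500, "hunks": 1})
complex_choice = route({"file": "x" * 500, "hunks": 5})
```

In practice a production router would also weigh historical accept/reject patterns, as noted above.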
Real Performance Impact
Model Selection Guidelines:
- Use morph-v3-fast for most coding tasks, agent workflows, and standard file edits
- Use morph-v3-large for complex refactors, architectural changes, or when maximum precision matters
- Use auto to let our system intelligently route based on edit complexity
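In code, picking a model is just a field on the request. The payload below is a sketch assuming an OpenAI-style chat schema; the exact field names and message format are defined in Morph's API docs, not here.

```python
# Illustrative request construction only -- consult Morph's API docs
# for the real schema. Model names match the table above.

def build_request(instruction, original_code, update, complex_edit=False):
    model = "morph-v3-large" if complex_edit else "morph-v3-fast"
    return {
        "model": model,  # or "auto" to let Morph route the request
        "messages": [{
            "role": "user",
            "content": f"{instruction}\n{original_code}\n{update}",
        }],
    }

req = build_request("Rename foo to bar", "def foo(): ...", "def bar(): ...")
```

Passing `"auto"` instead defers the fast-vs-large decision to the server-side router.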
Enterprise benefits:
- 2x cheaper and faster than forcing Claude into diff formats
- Batch processing feels instant
- Real-time streaming—see results before completion
- Intelligent model routing optimizes both speed and cost
New Possibilities
At 4500 tok/sec, new workflows become possible:
- Speculative applies: Process changes before user clicks "apply"
- Multi-file refactors: Coordinate dozens of files in seconds
- Interactive editing: Real-time feedback as models generate suggestions
Speed Thresholds
- Below 1000 tok/sec: Breaks flow state
- 1000-2000 tok/sec: Workable, but you still notice the wait
- 4500+ tok/sec: Infrastructure becomes invisible
Ready for 4500 tok/sec edits? Get your Morph API key
Building a coding agent? Contact us about dedicated instances.
Speed isn't everything. It's the only thing that makes everything else possible.