Morph Breaks 10,500 Tokens Per Second

Morph v3-fast now applies code edits at 10,500+ tok/sec.

Tejas Bhakta
Tejas Bhakta
September 15, 20253 min read

Pushing the Limits of Nvidia GPUs

Performance graph showing Morph v3-fast breaking through 10,500 tokens per second

1 month ago, Morph v3 hit 4,500 tokens/sec—already the fastest way to apply AI-generated code edits.

Today, Morph v3-fast shatters the 10,000 barrier: 10,500+ tokens/sec per request.

No gimmicks. No batching tricks. Try it yourself with our API.
👉 See full benchmarks

That’s 2.3x faster than our previous best, 5x faster than search-and-replace, and 175x faster than frontier models.


The Fastest Model Ever on Nvidia Hardware

ModelSpeedBest For
morph-v3-fast10,500+ tok/secComplex edits with speed + accuracy balance
morph-v3-large2,500 tok/secMaximum precision for high complexity

Why 10,500 Tok/Sec Matters

At this speed, AI code application becomes invisible infrastructure.

  • Before: Claude suggests a refactor → 8s delay → context lost

  • Now: Refactor applied before you finish reading

  • Before: Multi-file changes = batching + prayer

  • Now: 50+ files in under 2s

  • Before: Agents pause, users wait, flow breaks

  • Now: Continuous flow—nothing to notice

A team at a Fortune 100 fintech just applied a 15k-token multi-file refactor in under 400ms. The same job took ~20 seconds with traditional tooling.


Technical Breakthroughs

Speculative Architecture

  • Semantic speculation → predicts patterns inside logical blocks
  • Structural speculation → anticipates formatting + indentation
  • Context speculation → pre-generates likely completions

GPU Optimizations

  • Fused transformer ops eliminate intermediate memory writes
  • Dynamic attention patterns tuned for code structure
  • Custom kernels optimized for Hopper/Blackwell

The Invisible Threshold

Human perception studies show:

  • sub 500ms: Feels instant
  • 500–1000ms: Noticeable but fine
  • 1s+: Breaks flow

At 10,500 tok/sec, even 5k-token files process in sub 300ms.
That puts almost every coding task in the “invisible” range.


Workflows Unlocked

  • Speculative Editing: Apply changes as models generate suggestions
  • Real-Time Collaboration: Multiple developers accept AI diffs instantly
  • Agent Swarms: Dozens of agents coordinate edits in parallel
  • Interactive Refactoring: Restructure entire codebases with real-time feedback

Benchmarks

Single file edits (1k–3k tokens):

  • morph-v3-fast: 500ms
  • Traditional: 2,500–7,500ms

Multi-file refactors (10k+ tokens):

  • morph-v3-fast: ~1000ms
  • Traditional: 25,000ms+

Architectural changes (25k+ tokens):

  • morph-v3-large: ~2,500ms
  • Traditional: 60,000ms+

👉 Full benchmark suite here


What’s Next

  • 15,000+ tok/sec with next-gen kernels
  • Sub-100ms latency for most common edits
  • Batch operations across hundreds of files in real time

If your coding agents aren’t doing edits at 10,500 tok/sec, they’re already behind.
Get your Morph API key or contact us for dedicated throughput instances.