
How Cursor Composer and Apply Work

An analysis of Cursor's breakthrough in achieving 1000 tokens per second for code edits through specialized models and inference methods.

Posted by Morph AI Team

2 minute read


Analyzing Cursor Composer and Apply

Cursor recently published a fascinating technical deep-dive into their code editing technology, revealing how they handle large-scale code edits at speeds of around 1000 tokens per second.

[Figure: Llama 3 speculative speed performance chart, showing Cursor's speculative edits achieving significant speed improvements over standard inference]

The Problem: Large Code Edit Challenges

According to Cursor's research, frontier models like GPT-4o and Claude struggle with large code edits in three key areas:

  1. Latency: Traditional token-by-token generation is too slow for real-time code editing
  2. Accuracy: Models often make mistakes on complex edits, especially with large files
  3. Consistency: Multiple model calls can lead to infinite loops or inconsistent results

"Even small, isolated edits are plagued with bugs [...] SWE-Agent attempts a simple edit seven times before giving up due to a consistent syntactic error."

The hard part of apply is making inference fast while keeping it reliable at scale.

Cursor's Context-Aware Architecture

Cursor's approach to code editing relies on a sophisticated context retrieval system with four key components:

1. Embedding and Retrieval

  • Queries are embedded using a specialized embedding model
  • Fetches the top-k relevant syntax chunks from the codebase
  • Provides focused context without overwhelming the model

2. Reranking Process

  • Retrieved chunks undergo reranking for relevance
  • Ensures the most pertinent context appears first
  • Filters out less useful code snippets

3. Prompt Framework

  • Specialized prompting framework for context prioritization
  • Structures information for maximum model understanding
  • Ensures critical context appears in optimal positions

4. Apply Phase

  • Uses a specialized fast-apply model (Llama-3-70b-ft)
  • Executes planned changes quickly and accurately
  • Leverages context gathered in the previous steps
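Conceptually, the first three stages compose into a single retrieve-rerank-prompt pipeline feeding the apply model. The sketch below is purely illustrative: `Chunk`, `embed`, and `rerank` are hypothetical stand-ins, not Cursor's actual types or APIs.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:                          # illustrative stand-in for a code chunk
    path: str
    text: str
    score: float = 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_apply_context(query, codebase, embed, rerank, k=20, keep=5):
    """Retrieve top-k chunks by embedding similarity, rerank them for
    relevance, and order the survivors so the most pertinent context
    sits last, closest to the instruction in the final prompt."""
    q = embed(query)
    retrieved = sorted(codebase, key=lambda c: -cosine(q, embed(c.text)))[:k]
    for c in retrieved:               # reranking pass over retrieved chunks
        c.score = rerank(query, c.text)
    top = sorted(retrieved, key=lambda c: -c.score)[:keep]
    return "\n\n".join(c.text for c in reversed(top))
```

The ordering choice in the last step reflects the common observation that models attend most reliably to context nearest the end of the prompt.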

Technical Implementation Details

Full File Rewriting vs. Diffs

Cursor chose full file rewriting over diffs for three strategic reasons:

  1. Token Context: More output tokens give the model more forward passes to determine the correct solution
  2. Training Distribution: Models have seen more complete files than diffs during training
  3. Line Number Challenges: Models struggle with accurate line number counting in diffs
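The full-file strategy is easy to illustrate. The prompt format below is hypothetical (Cursor has not published theirs); the point is that the model emits the whole updated file rather than a diff, sidestepping line-number arithmetic entirely.

```python
def make_apply_prompt(original_file: str, edit_instruction: str) -> str:
    """Illustrative full-file-rewrite prompt. The tags and wording are
    assumptions for demonstration, not Cursor's actual format."""
    return (
        "Rewrite the entire file below, applying the requested change.\n"
        "Output the complete updated file and nothing else.\n\n"
        f"<file>\n{original_file}\n</file>\n\n"
        f"<change>\n{edit_instruction}\n</change>\n"
    )
```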

Model Architecture

According to their technical blog post:

  • Base Model: Llama-3-70b with fine-tuning
  • Performance: ~13x speedup over vanilla inference
  • Comparison: ~9x speedup over previous GPT-4 speculative edits deployment

Speculative Edits Innovation

One of Cursor's most interesting innovations is "speculative edits", which they describe as:

"With code edits, we have a strong prior on the draft tokens at any point in time, so we can speculate on future tokens using a deterministic algorithm rather than a draft model."

This approach yields remarkable improvements:

  • 4-5x speedup over traditional methods
  • 100% equivalent accuracy to full-file rewrites
  • Faster than GPT-4 with speculative decoding
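Unlike standard speculative decoding, no draft model is needed: the original file itself serves as the draft, since most of it survives an edit unchanged. A toy sketch of the idea, where `verify` stands in for one batched forward pass of the editing model (both names and the replacement semantics are illustrative assumptions):

```python
def speculative_edit(original, verify, k=8):
    """Deterministic speculative edits, sketched.

    `original` is the pre-edit file as a token list. `verify(prefix, draft)`
    models one batched forward pass: it returns how many draft tokens the
    model accepts, plus a correction token where it diverges (or None).
    """
    out, pos = [], 0
    while pos < len(original):
        draft = original[pos:pos + k]      # speculate: code stays unchanged
        accepted, correction = verify(out, draft)
        out.extend(draft[:accepted])
        pos += accepted
        if accepted < len(draft):          # model diverged: a real edit here
            if correction is not None:
                out.append(correction)
            pos += 1                       # treat as replacing one source token
    return out
```

In the common case the model accepts whole runs of unchanged code per pass, which is where the reported 4-5x speedup comes from.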

Performance Metrics Comparison

Cursor's evaluation methodology is particularly rigorous, using the formula:

speed = Num_Rewritten_Chars / Latency_for_Rewrite_in_seconds
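In concrete terms (the file size and latency below are made-up numbers chosen to match the reported throughput):

```python
def rewrite_speed(num_rewritten_chars: int, latency_seconds: float) -> float:
    # speed = Num_Rewritten_Chars / Latency_for_Rewrite_in_seconds
    return num_rewritten_chars / latency_seconds

# A 7,000-character file rewritten in 2 seconds:
print(rewrite_speed(7000, 2))  # 3500.0 chars/s
```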

Cursor's Results

  • ~1000 tokens/s processing speed
  • 3500 chars/s throughput
  • Consistent performance up to 400 lines

Morph's Superior Results

  • 1600 tokens/s processing speed (60% faster)
  • Consistent performance up to 1500 lines (275% more capacity)

Model Training Insights

Cursor's blog reveals interesting training decisions that contributed to their success:

Data Preparation

Data Preparation

  • Downsampled files under 100 LOC for balanced training
  • Balanced training examples per filename
  • Filtered out no-op transformations
  • Curated high-quality code transformation examples

Model Selection

  • Tested Deepseek Coder Instruct vs. Llama 3
  • Found Llama-3-70b-ft performed best overall
  • Outperformed GPT-4 Turbo in evaluations
  • Optimized for code-specific tasks
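The data-preparation steps translate into a simple filter over (before, after) file pairs. The 100-line threshold comes from the post; the 30% keep rate and the helper itself are illustrative assumptions.

```python
import random

def filter_training_examples(examples, small_loc=100, keep_small=0.3, seed=0):
    """Drop no-op transformations and downsample files under `small_loc`
    lines, keeping roughly `keep_small` of them. A hedged sketch of the
    data prep Cursor describes, not their actual pipeline."""
    rng = random.Random(seed)
    kept = []
    for before, after in examples:
        if before == after:                    # filter out no-op transformations
            continue
        lines = before.count("\n") + 1
        if lines < small_loc and rng.random() > keep_small:
            continue                           # downsample short files
        kept.append((before, after))
    return kept
```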

Future Challenges and Improvements

Cursor identifies several areas for continued improvement:

Long Context Handling

Working on handling files up to 2500 lines while maintaining performance

Model Distillation

Exploring distillation to llama-3-8b for improved efficiency

Accuracy Improvements

Investigating on-policy RL for better performance

Implications for Morph

Cursor's research validates several key principles we've been exploring at Morph:

  • The importance of specialized models for code editing
  • The benefits of full-file context over diff-based approaches
  • The potential of speculative decoding in code transformation
  • The critical role of context retrieval and reranking

Their work demonstrates the viability of high-speed code transformation while highlighting the challenges and trade-offs involved in building such systems.

Comparison with Morph v0

We've been working on an approach similar to Cursor's Composer and Apply, with a focus on both speed and accuracy: one specialized model plans the changes, and a second specialized model applies them.

Morph's Achievements

  • 1600 tokens/s processing speed
  • 60% faster than Cursor
  • Consistent performance up to 1500 lines
  • 275% more capacity

Technical Observations

Based on our research and development, several technical insights emerge:

RoPE Scaling Challenges

Linearly scaling RoPE position ids does not extrapolate well on its own. This task needs to be trained on a large dataset of code edits with long input and output sequence lengths.
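For context, linear position interpolation simply divides RoPE position ids by a scale factor so a longer sequence reuses the positional range seen in training. A minimal sketch (dimension handling simplified):

```python
def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotation angles for one position under linear position-id scaling.
    Dividing `pos` by `scale` squeezes long sequences into the trained
    range, but as noted above this tends to degrade quality unless the
    model is also fine-tuned on long code-edit sequences."""
    return [(pos / scale) / base ** (2 * i / dim) for i in range(dim // 2)]
```

With `scale=2.0`, position 8 produces the same angles as position 4 did during training, which is exactly why unseen long-range patterns remain a problem.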

Memory and Compute Requirements

For large files this is incredibly memory intensive and requires significant compute resources. Optimization is crucial for practical deployment.

Process Reward Modeling

Recent process reward modeling shows promise, but it would slow applies down. The reward model would need to output diffs and prove more useful than the original code context.

Conclusion and Key Takeaways

Key Insights

  • 📊 Rigorous Analysis: Cursor's deep-dive is highly technical, detailing performance metrics (~1000 tokens/s, 3500 chars/s) and comparing different model architectures.
  • 🏗️ Clear Structure: By dividing code editing into planning and apply phases, they simplify a complex problem into manageable components.
  • ⚡ Innovative Approach: Their use of speculative edits as a deterministic mechanism to forecast future tokens is a notable innovation yielding significant speedups.
  • 🔮 Transparency and Future Focus: Discussing challenges like long-context training and model distillation offers insight into ongoing research directions.

Overall, Cursor's technical blog sets a high standard for detailed and insightful analysis in the AI-driven code editing space. However, Morph's approach demonstrates that even better performance is achievable with focused optimization and innovative architectural choices.

Experience the Fastest Code Apply

Try Morph's lightning-fast code transformation with 1600+ tokens/second and see the difference for yourself.

Get Started with Morph