
How Cursor Composer and Apply Work

An analysis of Cursor's breakthrough in achieving 1000 tokens per second for code edits through specialized models and inference methods.

Posted by Morph AI Team

2 minute read


Analyzing Cursor Composer and Apply

Cursor recently published a fascinating technical deep-dive into their code editing technology, revealing how they handle large-scale code edits at speeds of around 1000 tokens per second.

[Figure: Llama 3 speculative speed performance chart, showing Cursor's speculative edits achieving significant speed improvements over standard inference]

The Problem: Large Code Edit Challenges

According to Cursor's research, frontier models like GPT-4o and Claude struggle with large code edits in three key areas:

  1. Latency: Traditional token-by-token generation is too slow for real-time code editing
  2. Accuracy: Models often make mistakes on complex edits, especially with large files
  3. Consistency: Multiple model calls can lead to infinite loops or inconsistent results

"Even small, isolated edits are plagued with bugs [...] SWE-Agent attempts a simple edit seven times before giving up due to a consistent syntactic error."

The hard part of apply is making inference fast while keeping it reliable at scale.

Cursor's Context-Aware Architecture

Cursor's approach to code editing relies on a sophisticated context retrieval system with four key components:

1. Embedding and Retrieval

  • Queries are embedded using a specialized embedding model
  • Fetches the top-k relevant syntax chunks from the codebase
  • Provides focused context without overwhelming the model

2. Reranking Process

  • Retrieved chunks undergo reranking for relevance
  • Ensures the most pertinent context appears first
  • Filters out less useful code snippets

3. Prompt Framework

  • Specialized prompting framework for context prioritization
  • Structures information for maximum model understanding
  • Ensures critical context appears in optimal positions

4. Apply Phase

  • Uses a specialized fast-apply model (Llama-3-70b-ft)
  • Executes planned changes quickly and accurately
  • Leverages context gathered in the previous steps
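Conceptually, the first three stages compose into a single retrieve-rerank-prompt pipeline feeding the apply model. The sketch below is purely illustrative: `Chunk`, `embed`, and `rerank` are hypothetical stand-ins, not Cursor's actual types or APIs.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:                          # illustrative stand-in for a code chunk
    path: str
    text: str
    score: float = 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_apply_context(query, codebase, embed, rerank, k=20, keep=5):
    """Retrieve top-k chunks by embedding similarity, rerank them for
    relevance, and order the survivors so the most pertinent context
    sits last, closest to the instruction in the final prompt."""
    q = embed(query)
    retrieved = sorted(codebase, key=lambda c: -cosine(q, embed(c.text)))[:k]
    for c in retrieved:               # reranking pass over retrieved chunks
        c.score = rerank(query, c.text)
    top = sorted(retrieved, key=lambda c: -c.score)[:keep]
    return "\n\n".join(c.text for c in reversed(top))
```

The ordering choice in the last step reflects the common observation that models attend most reliably to context nearest the end of the prompt.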

Technical Implementation Details

Full File Rewriting vs. Diffs

Cursor chose full file rewriting over diffs for three strategic reasons:

  1. Token Context: More output tokens give the model more forward passes to determine the correct solution
  2. Training Distribution: Models have seen more complete files than diffs during training
  3. Line Number Challenges: Models struggle with accurate line number counting in diffs
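The full-file strategy is easy to illustrate. The prompt format below is hypothetical (Cursor has not published theirs); the point is that the model emits the whole updated file rather than a diff, sidestepping line-number arithmetic entirely.

```python
def make_apply_prompt(original_file: str, edit_instruction: str) -> str:
    """Illustrative full-file-rewrite prompt. The tags and wording are
    assumptions for demonstration, not Cursor's actual format."""
    return (
        "Rewrite the entire file below, applying the requested change.\n"
        "Output the complete updated file and nothing else.\n\n"
        f"<file>\n{original_file}\n</file>\n\n"
        f"<change>\n{edit_instruction}\n</change>\n"
    )
```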

Model Architecture

According to their technical blog post:

  • Base Model: Llama-3-70b with fine-tuning
  • Performance: ~13x speedup over vanilla inference
  • Comparison: ~9x speedup over previous GPT-4 speculative edits deployment

Speculative Edits Innovation

One of Cursor's most interesting innovations is "speculative edits", which they describe as:

"With code edits, we have a strong prior on the draft tokens at any point in time, so we can speculate on future tokens using a deterministic algorithm rather than a draft model."

This approach yields remarkable improvements:

  • 4-5x speedup over traditional methods
  • 100% equivalent accuracy to full-file rewrites
  • Faster than GPT-4 with speculative decoding
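Unlike standard speculative decoding, no draft model is needed: the original file itself serves as the draft, since most of it survives an edit unchanged. A toy sketch of the idea, where `verify` stands in for one batched forward pass of the editing model (both names and the replacement semantics are illustrative assumptions):

```python
def speculative_edit(original, verify, k=8):
    """Deterministic speculative edits, sketched.

    `original` is the pre-edit file as a token list. `verify(prefix, draft)`
    models one batched forward pass: it returns how many draft tokens the
    model accepts, plus a correction token where it diverges (or None).
    """
    out, pos = [], 0
    while pos < len(original):
        draft = original[pos:pos + k]      # speculate: code stays unchanged
        accepted, correction = verify(out, draft)
        out.extend(draft[:accepted])
        pos += accepted
        if accepted < len(draft):          # model diverged: a real edit here
            if correction is not None:
                out.append(correction)
            pos += 1                       # treat as replacing one source token
    return out
```

In the common case the model accepts whole runs of unchanged code per pass, which is where the reported 4-5x speedup comes from.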

Performance Metrics Comparison

Cursor's evaluation methodology is particularly rigorous, using the formula:

speed = Num_Rewritten_Chars / Latency_for_Rewrite_in_seconds
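In concrete terms (the file size and latency below are made-up numbers chosen to match the reported throughput):

```python
def rewrite_speed(num_rewritten_chars: int, latency_seconds: float) -> float:
    # speed = Num_Rewritten_Chars / Latency_for_Rewrite_in_seconds
    return num_rewritten_chars / latency_seconds

# A 7,000-character file rewritten in 2 seconds:
print(rewrite_speed(7000, 2))  # 3500.0 chars/s
```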

Cursor's Results

  • ~1000 tokens/s processing speed
  • 3500 chars/s throughput
  • Consistent performance up to 400 lines

Morph's Superior Results

  • 1600 tokens/s processing speed (60% faster)
  • Consistent performance up to 1500 lines (275% more capacity)

Model Training Insights

Cursor's blog reveals interesting training decisions that contributed to their success:

Data Preparation

Data Preparation

  • Downsampled files under 100 LOC for balanced training
  • Balanced training examples per filename
  • Filtered out no-op transformations
  • Curated high-quality code transformation examples

Model Selection

  • Tested Deepseek Coder Instruct vs. Llama 3
  • Found Llama-3-70b-ft performed best overall
  • Outperformed GPT-4 Turbo in evaluations
  • Optimized for code-specific tasks
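The data-preparation steps translate into a simple filter over (before, after) file pairs. The 100-line threshold comes from the post; the 30% keep rate and the helper itself are illustrative assumptions.

```python
import random

def filter_training_examples(examples, small_loc=100, keep_small=0.3, seed=0):
    """Drop no-op transformations and downsample files under `small_loc`
    lines, keeping roughly `keep_small` of them. A hedged sketch of the
    data prep Cursor describes, not their actual pipeline."""
    rng = random.Random(seed)
    kept = []
    for before, after in examples:
        if before == after:                    # filter out no-op transformations
            continue
        lines = before.count("\n") + 1
        if lines < small_loc and rng.random() > keep_small:
            continue                           # downsample short files
        kept.append((before, after))
    return kept
```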

Future Challenges and Improvements

Cursor identifies several areas for continued improvement:

Long Context Handling

Working on handling files up to 2500 lines while maintaining performance

Model Distillation

Exploring distillation to llama-3-8b for improved efficiency

Accuracy Improvements

Investigating on-policy RL for better performance

Implications for Morph

Cursor's research validates several key principles we've been exploring at Morph:

  • The importance of specialized models for code editing
  • The benefits of full-file context over diff-based approaches
  • The potential of speculative decoding in code transformation
  • The critical role of context retrieval and reranking

Their work demonstrates the viability of high-speed code transformation while highlighting the challenges and trade-offs involved in building such systems.

Comparison with Morph v0

We've been working on an approach similar to Cursor's Composer and Apply, with a focus on both speed and accuracy: one specialized model plans the changes, and a second specialized model applies them.

Morph's Achievements

  • 1600 tokens/s processing speed
  • 60% faster than Cursor
  • Consistent performance up to 1500 lines
  • 275% more capacity

Technical Observations

Based on our research and development, several technical insights emerge:

RoPE Scaling Challenges

Linearly scaling RoPE position ids does not extrapolate well on its own. This task needs to be trained on a large dataset of code edits with long input and output sequence lengths.
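For context, linear position interpolation simply divides RoPE position ids by a scale factor so a longer sequence reuses the positional range seen in training. A minimal sketch (dimension handling simplified):

```python
def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotation angles for one position under linear position-id scaling.
    Dividing `pos` by `scale` squeezes long sequences into the trained
    range, but as noted above this tends to degrade quality unless the
    model is also fine-tuned on long code-edit sequences."""
    return [(pos / scale) / base ** (2 * i / dim) for i in range(dim // 2)]
```

With `scale=2.0`, position 8 produces the same angles as position 4 did during training, which is exactly why unseen long-range patterns remain a problem.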

Memory and Compute Requirements

For large files this is incredibly memory intensive and requires significant compute resources. Optimization is crucial for practical deployment.

Process Reward Modeling

Recent process reward modeling shows promise, but it would slow applies down. The reward model would need to output diffs and prove more useful than the original code context.

Conclusion and Key Takeaways

Key Insights

  • 📊 Rigorous Analysis: Cursor's deep-dive is highly technical, detailing performance metrics (~1000 tokens/s, 3500 chars/s) and comparing different model architectures.
  • 🏗️ Clear Structure: By dividing code editing into planning and apply phases, they simplify a complex problem into manageable components.
  • ⚡ Innovative Approach: Their use of speculative edits as a deterministic mechanism to forecast future tokens is a notable innovation yielding significant speedups.
  • 🔮 Transparency and Future Focus: Discussing challenges like long-context training and model distillation offers insight into ongoing research directions.

Overall, Cursor's technical blog sets a high standard for detailed and insightful analysis in the AI-driven code editing space. However, Morph's approach demonstrates that even better performance is achievable with focused optimization and innovative architectural choices.

Experience the Fastest Code Apply

Try Morph's lightning-fast code transformation with 1600+ tokens/second and see the difference for yourself.

Get Started with Morph