How Cursor Composer and Apply Work
An analysis of Cursor's breakthrough in achieving 1000 tokens per second for code edits through specialized models and inference methods.
Analyzing Cursor Composer and Apply
Cursor recently published a fascinating technical deep-dive into their code editing technology. The post details how they handle large-scale code edits at speeds of roughly 1000 tokens per second.
The Problem: Large Code Edit Challenges
According to Cursor's research, frontier models like GPT-4o and Claude struggle with large code edits in three key areas:
- Latency: Traditional token-by-token generation is too slow
- Accuracy: Models often make mistakes on complex edits
- Consistency: Multiple model calls can lead to infinite loops or inconsistent results

The hard part of applying is making the inference fast and reliable at scale.
As they demonstrate in their blog post:
"Even small, isolated edits are plagued with bugs [...] SWE-Agent attempts a simple edit seven times before giving up due to a consistent syntactic error."
Cursor's Two-Stage Solution
Cursor breaks down code editing into two distinct phases:
- Planning Phase:
  - Uses Priompt, a prompting framework for prioritizing context
  - Uses a frontier model for reasoning about changes
  - Handles the high-level understanding of what needs to change
  - Takes place in their chat interface
- Apply Phase:
  - Uses their specialized fast-apply model, a fine-tuned Llama-3-70b (Llama-3-70b-ft)
  - Focuses on executing the planned changes quickly and accurately
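Conceptually, the split might look something like the sketch below. The model interface, function names, and prompt formats here are assumptions for illustration, not Cursor's actual API:

```python
from typing import Protocol

# A minimal sketch of the plan/apply split, assuming a generic
# complete(prompt) -> str model interface. Prompts are placeholders.
class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

def plan_edit(frontier: TextModel, file_text: str, request: str) -> str:
    """Planning phase: a frontier model reasons about the change and
    proposes a rough edit (what the chat interface produces)."""
    return frontier.complete(
        f"File:\n{file_text}\n\nRequest: {request}\n\nProposed edit:"
    )

def apply_edit(fast_apply: TextModel, file_text: str, plan: str) -> str:
    """Apply phase: a specialized model rewrites the full file to
    incorporate the planned edit as quickly as possible."""
    return fast_apply.complete(
        f"Original file:\n{file_text}\n\nEdit to apply:\n{plan}\n\nRewritten file:"
    )

def edit(file_text: str, request: str,
         frontier: TextModel, fast_apply: TextModel) -> str:
    return apply_edit(fast_apply, file_text, plan_edit(frontier, file_text, request))
```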
The Technical Implementation
Cursor's blog reveals several key technical decisions:
Full File Rewriting vs. Diffs
They chose full file rewriting over diffs for three reasons:
- Token Context: More output tokens give the model more forward passes to determine the correct solution
- Training Distribution: Models have seen more complete files than diffs during training
- Line Number Challenges: Models struggle with accurate line number counting in diffs
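To make the line-number problem concrete, here is an illustrative (hypothetical) comparison of the two output formats a model could be asked to produce:

```python
# Illustrative only: in the diff format the model must emit exact hunk
# headers like "@@ -142,7 +142,7 @@"; miscounting a single line
# corrupts the patch.
diff_style_output = """\
@@ -142,7 +142,7 @@ def total(items):
-    return sum(i.price for i in items)
+    return sum(i.price * i.qty for i in items)
"""

# Full-file rewriting sidesteps line counting entirely: the model emits
# the whole updated file, which also matches its training distribution.
full_rewrite_output = "<entire updated file contents>"
```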
Model Architecture
According to their blog post:
- Base Model: Llama-3-70b
- Performance: "~13x speedup over vanilla inference"
- Comparison: "~9x speedup over previous GPT-4 speculative edits deployment"
Speculative Edits
One of their most interesting innovations is "speculative edits", which they describe as:
"With code edits, we have a strong prior on the draft tokens at any point in time, so we can speculate on future tokens using a deterministic algorithm rather than a draft model."
This approach yields:
- 4-5x speedup over traditional methods
- Equivalent accuracy to full-file rewrites
- Significantly faster than GPT-4 with speculative decoding
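The core idea can be sketched in a few lines. The sketch below assumes a greedy decoder exposed as a single `greedy_next` function; it illustrates the technique, not Cursor's implementation:

```python
from typing import Callable, List

# Minimal sketch of speculative edits: the draft tokens come from the
# original file itself (a deterministic prior) rather than from a
# smaller draft model. greedy_next(tokens) is assumed to return, for
# each position i, the argmax next token after tokens[:i+1].
def speculative_edit(
    greedy_next: Callable[[List[int]], List[int]],
    prompt: List[int],    # instruction + original file, tokenized
    original: List[int],  # tokens of the file being rewritten (the draft)
    eos: int,
    chunk: int = 16,
) -> List[int]:
    out: List[int] = []
    cursor = 0  # position in `original` we are speculating from
    while True:
        draft = original[cursor:cursor + chunk]
        # One forward pass verifies the whole draft chunk in parallel.
        preds = greedy_next(prompt + out + draft)
        base = len(prompt) + len(out)
        n_accept = 0
        for i, tok in enumerate(draft):
            if preds[base + i - 1] != tok:
                break  # the model wants to edit here; stop accepting
            n_accept += 1
        out.extend(draft[:n_accept])
        cursor += n_accept
        if draft and n_accept == len(draft):
            continue  # entire chunk unchanged; keep speculating
        # Divergence (a real edit) or end of file: take the model's token.
        next_tok = preds[len(prompt) + len(out) - 1]
        if next_tok == eos:
            return out
        out.append(next_tok)
        # A production system would re-align `cursor` with the original
        # (e.g., resync on the next unchanged line); here verification
        # simply rejects stale draft tokens instead.
```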
Performance Metrics
Cursor's evaluation methodology is particularly rigorous:
speed = Num_Rewritten_Chars / Latency_for_Rewrite_in_seconds
Their metrics show:
- ~1000 tokens/s processing speed
- ~3500 chars/s throughput
- Consistent performance across file sizes up to 400 lines
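As a quick sanity check of how these numbers relate (assuming roughly 3.5 characters per token, which their two figures imply), the metric can be computed directly:

```python
# Small worked example of the speed metric above; values are illustrative.
def rewrite_speed(num_rewritten_chars: int, latency_s: float) -> float:
    return num_rewritten_chars / latency_s

print(rewrite_speed(17_500, 5.0))  # 3500.0 chars/s ~ 1000 tokens/s
```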
Model Training Insights
Their blog reveals interesting training decisions:
- Data Preparation (see the filtering sketch after this list):
  - Downsampled files under 100 LOC
  - Balanced training examples per filename
  - Filtered out no-op transformations
- Model Selection:
  - Tested both Deepseek Coder Instruct and Llama 3
  - Found Llama-3-70b-ft performed best
  - Outperformed GPT-4 Turbo in evaluations
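A minimal sketch of what those data-preparation filters could look like, assuming a simple before/after example schema of our own invention (not Cursor's actual pipeline):

```python
import random
from collections import Counter

# Hypothetical filters matching the three decisions described above:
# downsample very short files, cap examples per filename, drop no-ops.
def filter_examples(examples, keep_short_prob=0.2, max_per_file=5):
    counts = Counter()
    kept = []
    for ex in examples:
        if ex["before"] == ex["after"]:
            continue  # filter out no-op transformations
        if ex["before"].count("\n") < 100 and random.random() > keep_short_prob:
            continue  # downsample files under 100 LOC
        if counts[ex["filename"]] >= max_per_file:
            continue  # balance training examples per filename
        counts[ex["filename"]] += 1
        kept.append(ex)
    return kept
```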
Future Challenges
Cursor identifies several areas for improvement:
- Long Context: Working on handling files up to 2500 lines
- Model Size: Exploring distillation to Llama-3-8b
- Accuracy: Investigating on-policy RL for better performance
Implications for Morph
Cursor's research validates several key principles we've been exploring:
- The importance of specialized models for code editing
- The benefits of full-file context over diff-based approaches
- The potential of speculative decoding in code transformation
Their work demonstrates the viability of high-speed code transformation while highlighting the challenges and trade-offs involved in building such systems.
Conclusion and Key Takeaways
- Rigorous Analysis: Cursor's deep-dive is highly technical, detailing performance metrics (e.g., ~1000 tokens/s, 3500 char/s) and comparing different model architectures.
- Clear Structure: By dividing code editing into planning and apply phases, they simplify a complex problem into manageable components.
- Innovative Approach: Their use of speculative edits as a deterministic mechanism to forecast future tokens is a notable innovation that yields significant speedups.
- Transparency and Future Focus: Discussing challenges like long-context training and model distillation offers readers insight into ongoing research directions.

Overall, Cursor's technical blog sets a high standard for detailed and insightful analysis in the AI-driven code editing space.
Comparison with morph-v0
We've been working on a similar approach to Cursor's Composer and Apply, with a focus on speed and accuracy: one specialized model plans the changes, and a second specialized model applies them.
We've achieved:
- 1000 tokens/s processing speed
- Consistent performance across file sizes up to 1500 lines
Observations
- Linear scaling of RoPE position ids does not generalize well here. This task needs to be trained on a large dataset of code edits with long input and output sequence lengths (see the sketch after this list).
- For large files this is extremely memory-intensive and requires significant compute.
- Recent process reward modeling shows promise, but it would slow applies down: the reward model would need to output diffs, and the rewritten content would have to prove more useful as context for it than the original code.
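For reference, here is a minimal sketch of the linear RoPE position-id scaling mentioned in the first observation. Parameter names are ours; this is the generic position-interpolation technique, not any particular codebase's implementation:

```python
import numpy as np

# Linear RoPE scaling ("position interpolation"): position ids are
# compressed by `factor` so longer sequences fit inside the trained
# context window. In our experience this alone did not hold up for
# long apply sequences without further training on long code edits.
def rope_angles(position_ids: np.ndarray, head_dim: int,
                base: float = 10000.0, factor: float = 1.0) -> np.ndarray:
    scaled = position_ids / factor  # linear scaling of position ids
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    return np.outer(scaled, inv_freq)  # angles fed to the sin/cos rotation
```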