Composer 2 Technical Report

📅 2026-03-25
🤖 AI Summary
This work proposes a training paradigm for a specialized model targeting real-world software development, addressing the limitations of current agents in planning and coding on complex, long-horizon software engineering tasks. Training proceeds in two stages: first, continued pretraining improves domain knowledge and latent coding ability; second, large-scale reinforcement learning improves end-to-end multi-step reasoning, accurate execution, and coherence over long tasks. Training runs in the same Cursor harness used by the deployed model, and the model is evaluated on CursorBench, a new benchmark derived from real software engineering problems in large codebases. The model scores 61.3 on CursorBench, a major improvement over previous Composer models, and scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual, comparable to state-of-the-art systems.

📝 Abstract
Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.
Problem

Research questions and friction points this paper is trying to address.

agentic software engineering
long-horizon coding problems
domain-specialized model
realistic software engineering tasks
large codebases
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic software engineering
two-phase training
reinforcement learning for coding
realistic coding benchmarks
domain-specialized LLM
Authors

Aaron Chan, Sahara AI (Machine Learning, Large Language Models, AI Agents, Decentralized AI)
Ahmed Shalaby, Cursor Research
Alexander Wettig, Princeton University (Natural Language Processing)
Aman Sanger, Cursor Research
Andrew Zhai, Cursor Research
Anurag Ajay, Google DeepMind (Machine Learning, Artificial Intelligence)
Ashvin Nair, OpenAI (Artificial Intelligence, Robotics, Machine Learning)
Charlie Snell, UC Berkeley (Artificial Intelligence, Machine Learning, Natural Language Processing, Reinforcement Learning)
Chen Lu, Cursor Research
Chen Shen, Cursor Research
Emily Jia, Cursor Research
Federico Cassano, Northeastern University (Artificial Intelligence, Programming Languages, Supply Chain Security)
Hanpeng Liu, Cursor Research
Haoyu Chen, Cursor Research
Henry Wildermuth, Cursor Research
Jacob Jackson, Cursor Research
Janet Li, Cursor Research
Jediah Katz, Cursor Research
Jiajun Yao, Cursor Research
Joey Hejna, Stanford University (Reinforcement Learning, Machine Learning)
Josh Warner, Cursor Research
Julius Vering, Cursor Research
Kevin Frans, Cursor Research