Beyond Quantity: Trajectory Diversity Scaling for Code Agents

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of code LLMs acting as tool-interacting agents, which the authors attribute to low-quality synthetic data, diminishing returns from quantity-driven data scaling, and underutilization of trajectory data. To overcome these challenges, they propose TDScaling, a framework that improves the performance–cost trade-off under a fixed training budget by increasing trajectory diversity rather than data volume. TDScaling introduces four key innovations: a Business Cluster mechanism that captures real-service logical dependencies; blueprint-driven multi-agent collaborative generation that enforces trajectory coherence; an adaptive evolution strategy guided by multidimensional entropy (over domains and reasoning modes) that steers synthesis toward long-tail scenarios and prevents mode collapse; and a sandboxed code tool that preserves foundational coding capabilities. Extensive experiments show that TDScaling significantly improves both tool-use generalization and intrinsic coding proficiency across multiple benchmarks, including BFCL, tau²-Bench, RebenchT, CodeCI, and BIRD, validating the efficacy of diversity-driven scaling.

📝 Abstract
As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Moreover, quantity-centric scaling exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. Under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, tau^2-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. We plan to release the full codebase and the synthesized dataset (including 30,000+ tool clusters) upon publication.
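As a rough illustration of the diversity-first selection the abstract describes, the sketch below greedily picks trajectories whose domain labels maximize Shannon entropy gain. This is a minimal sketch, not the paper's actual algorithm: the `domain` labels, the single-entropy objective, and the greedy loop are illustrative assumptions (the paper combines Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity).

```python
import math
from collections import Counter

def domain_entropy(domains):
    """Shannon entropy (in bits) of the domain-label distribution."""
    counts = Counter(domains)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_diverse(pool, budget):
    """Greedily pick trajectories whose domain labels maximize entropy gain."""
    selected = []
    remaining = list(pool)
    while remaining and len(selected) < budget:
        best = max(
            remaining,
            key=lambda t: domain_entropy(
                [s["domain"] for s in selected] + [t["domain"]]
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy candidate pool: many "web" trajectories, a few long-tail domains.
pool = [{"domain": d} for d in ["web", "web", "db", "fs", "web", "db", "net"]]
picked = select_diverse(pool, budget=4)
print(sorted(t["domain"] for t in picked))  # → ['db', 'fs', 'net', 'web']
```

Under a fixed budget of 4, the entropy objective prefers one trajectory per domain over three more "web" samples, which is the intuition behind scaling diversity instead of volume.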
Problem

Research questions and friction points this paper is trying to address.

trajectory diversity
code agents
data scaling
tool-use generalization
synthetic data quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory Diversity Scaling
Business Cluster
Blueprint-driven Multi-agent
Adaptive Evolution
Sandboxed Code Tool
Guhong Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
LLM, NLP
Chenghao Sun
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Cheng Fu
Institute of Software, Chinese Academy of Sciences
Entity Resolution, Knowledge Graph, Natural Language Processing
Qiyao Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Natural Language Processing, Large Language Models, Agentic AI, Patent Processing, AI for IP
Zhihong Huang
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Tongyi Laboratory
Chaopeng Wei
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Guangxu Chen
Xiamen University (Ph.D.); Stanford University (PostDoc); School of Environment and Energy (SCUT)
Biomass Valorization, CO2RR, Catalytic Dehydrogenation, Plastic Recycling, AI4Science
Feiteng Fang
University of Science and Technology of China
LLM, NLP
A. Argha
UNSW Sydney
Bing Zhao
SRI International
Natural Language Processing, Machine Learning, Optimization
Xander Xu
Alibaba Group
Qi Han
StepFun
Vision Foundation Models, Large Language Models
Hamid Alinejad-Rokny
ARC DECRA & UNSW Scientia Fellow, Head of BioMedical Machine Learning Lab
Biomedical Machine Learning, Machine Learning for Health, Medical Artificial Intelligence, LLMs
Qiang Qu
Professor, Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology
Blockchain, Data Intelligence, Data-Intensive Systems, Data Mining
Binhua Li
Alibaba Group
Shiwen Ni
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SUAT
Min Yang
Bytedance
Vision Language Models, Computer Vision, Video Understanding
Hu Wei
Alibaba Group
Yongbin Li
Tongyi Laboratory