Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing reinforcement learning with verifiable rewards (RLVR) approaches for code generation are limited by a scarcity of challenging and diverse verifiable tasks, which keeps large language models from reaching their full potential. This work proposes the Atomic Decomposition and Recombination (ADR) framework, presented as a new compositional synthesis paradigm: it decomposes coding tasks into atomic components and recombines them in a controlled manner to generate verifiable tasks that are novel, difficult, and diverse. Compared with conventional heuristic seed expansion, ADR yields larger gains when its tasks are used for RLVR in downstream domains such as algorithmic programming, tool usage, and data science. The generated tasks also surpass existing baselines in originality, difficulty, diversity, and test quality.
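The decompose-and-recombine idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the atom categories (`structure`, `technique`, `constraint`), the `Task` class, and the rule "keep only combinations absent from the seeds" are all assumptions made for illustration, since the summary does not specify what ADR's atoms are or how recombination is controlled.

```python
import itertools
import random
from dataclasses import dataclass


# Hypothetical atom categories; the source does not define ADR's atoms.
@dataclass(frozen=True)
class Task:
    structure: str   # e.g. the data structure involved
    technique: str   # e.g. the algorithmic technique
    constraint: str  # e.g. an input-size constraint


def decompose(tasks):
    """Collect the distinct atoms of each category from the seed tasks."""
    pool = {"structure": set(), "technique": set(), "constraint": set()}
    for t in tasks:
        pool["structure"].add(t.structure)
        pool["technique"].add(t.technique)
        pool["constraint"].add(t.constraint)
    # Sort so that sampling is reproducible for a fixed seed.
    return {name: sorted(atoms) for name, atoms in pool.items()}


def recombine(pool, seeds, k, rng):
    """Sample up to k atom combinations that do not appear among the seeds."""
    seen = set(seeds)
    candidates = [
        Task(s, t, c)
        for s, t, c in itertools.product(
            pool["structure"], pool["technique"], pool["constraint"]
        )
        if Task(s, t, c) not in seen
    ]
    return rng.sample(candidates, min(k, len(candidates)))


seeds = [
    Task("array", "two pointers", "n <= 10^5"),
    Task("tree", "dynamic programming", "n <= 10^3"),
]
pool = decompose(seeds)
new_tasks = recombine(pool, seeds, k=3, rng=random.Random(0))
```

Here two seed tasks give eight possible atom combinations, six of which are new. A real pipeline would still need to turn each combination into a task statement with test cases, a step this sketch leaves out.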

📝 Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks that target the edge of the model's competence. Prior studies often rely on heuristic seed expansion for data synthesis, which severely limits both novelty and difficulty. Consequently, the training value of such data fails to scale proportionally with the amount of data synthesized. To address this, we propose Atomic Decomposition and Recombination (ADR), a novel framework that decomposes code tasks into atomic elements and recombines them in a controlled manner, thereby generating genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and consistently delivers greater improvements in coding ability when used for RLVR across diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.
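To make "verifiable" concrete, the sketch below shows one common way a code task can yield a reward signal: run the candidate solution against test cases and return a binary score. The function name `verifiable_reward`, the all-or-nothing scoring, and the `(args, expected)` test format are assumptions for illustration; the abstract does not describe the reward design actually used.

```python
def verifiable_reward(solution_src, func_name, tests):
    """Return 1.0 if the candidate passes every test case, else 0.0.

    `tests` is a list of (args, expected) pairs (a hypothetical format).
    """
    namespace = {}
    try:
        # NOTE: exec runs untrusted code without sandboxing; illustration only.
        exec(solution_src, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score 0.
        return 0.0
    return 1.0


tests = [((1, 2), 3), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

reward_good = verifiable_reward(good, "add", tests)
reward_bad = verifiable_reward(bad, "add", tests)
```

Under this setup a task is only as useful as its tests are discriminating, which is one reason test quality appears among the evaluation criteria.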

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning with Verifiable Rewards

Code Task Synthesis

Scalability

Verifiable Code Tasks

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Atomic Decomposition and Recombination

Reinforcement Learning with Verifiable Rewards

Code Task Synthesis