AI Summary
This work addresses a limitation of existing code large language models: they treat an entire program as a single optimization target and therefore do not explicitly model the structural factors that influence execution efficiency. To overcome this, the authors propose SkelDPO (Skeleton-Guided Direct Preference Optimization), a framework that introduces the concept of code skeletons into preference optimization for the first time. By extracting and contrasting skeletons from efficient and inefficient implementations, the method constructs fine-grained skeleton alignment signals and defines a joint preference loss over both code and skeletons. This goes beyond conventional preference optimization, which relies solely on correctness or whole-program efficiency preferences. Compared with the prior state-of-the-art method, SkelDPO improves Pass@1, Beyond@1, and Effi@1 by 3-6%, 3-7%, and 2-5%, respectively, with larger gains on complex tasks.
Abstract
With the remarkable progress of Code Large Language Models (Code LLMs) in achieving semantic correctness, execution efficiency has become an increasingly important dimension for evaluating their practical utility. However, existing approaches typically treat full programs as a single optimization target during training, without explicitly modeling the structural factors that influence efficiency. As a result, although these models can generate semantically correct code, they fail to learn, at a fine-grained level, the underlying skeleton features that lead to efficient implementations. To address this limitation, we propose SkelDPO (Skeleton-Guided Direct Preference Optimization), a skeleton-guided preference optimization framework that systematically enhances the efficiency of code generation. SkelDPO first identifies efficient and inefficient implementations in the code dataset and, through comparative analysis, locates their efficiency-prone and inefficiency-prone points, forming alignment signals between efficient and inefficient skeletons. During training, a joint code and skeleton preference loss is introduced, enabling the model to learn semantic correctness while reinforcing its understanding of efficiency-critical components in code. Results show that SkelDPO consistently surpasses existing methods: compared with the SOTA method, which relies solely on preference optimization over efficient and inefficient code, it improves Pass@1, Beyond@1, and Effi@1 by 3-6%, 3-7%, and 2-5%, respectively, with greater improvements observed on complex tasks. Overall, SkelDPO provides a new perspective on skeleton-level efficiency alignment, breaking the limitation of conventional preference optimization that relies solely on correctness or efficiency pairs. All datasets and source code are publicly available at: https://github.com/icpcSkelDPO/SkelDPO.
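The abstract does not state the exact form of the joint code and skeleton preference loss. As a rough illustration only, the sketch below assumes it is a weighted sum of two standard DPO terms, one computed on the (efficient, inefficient) code pair and one on the corresponding skeleton pair. The function names, the `skel_weight` parameter, and the default `beta` are hypothetical and are not taken from the paper or its repository.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Arguments are sequence log-probabilities of the preferred ("chosen")
    and dispreferred ("rejected") outputs under the trained policy and
    under the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def joint_loss(code_logps: tuple, skel_logps: tuple,
               beta: float = 0.1, skel_weight: float = 0.5) -> float:
    """Assumed joint objective: code-level DPO plus weighted skeleton-level DPO.

    Each tuple is (policy_chosen, policy_rejected, ref_chosen, ref_rejected).
    The weighted-sum combination is an assumption made for illustration.
    """
    code_term = dpo_loss(*code_logps, beta=beta)
    skel_term = dpo_loss(*skel_logps, beta=beta)
    return code_term + skel_weight * skel_term


# Example: the policy already prefers the efficient code and skeleton
# slightly more than the reference model does.
example = joint_loss(
    code_logps=(-10.0, -14.0, -11.0, -13.0),
    skel_logps=(-4.0, -6.0, -4.5, -5.5),
)
```

Under this assumed form, setting `skel_weight` to 0 recovers plain code-level DPO, which corresponds to the baseline the abstract describes as relying solely on efficient and inefficient code preferences.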