🤖 AI Summary
This work investigates how incorporating runtime execution information -- line executions, line coverage, branch coverage, and variable states -- into large language models for code (code LLMs) affects automated code optimization. Building on CodeT5+, we design three execution-aware pretraining strategies and quantitatively evaluate, within a unified framework, the individual and joint contributions of these four execution dimensions to efficiency-oriented optimization. The results show only marginal gains from execution-aware modeling; several configurations even underperform the baseline significantly, challenging the widely held implicit assumption that execution signals are inherently beneficial. The study questions the prevailing consensus on execution-augmented modeling and provides empirical evidence and reflective insights for jointly advancing interpretability and practical utility in code LLMs.
📝 Abstract
Code optimization is the process of enhancing code efficiency while preserving its intended functionality. It often requires a deep understanding of a program's run-time execution behavior to identify and address inefficiencies effectively. Recent studies have shown that language models can play a significant role in automating code optimization. However, these models may have insufficient knowledge of how code executes at run-time. To address this limitation, researchers have developed strategies that integrate code execution information into language models. These strategies have shown promise, enhancing the effectiveness of language models in various software engineering tasks. However, despite the close relationship between code execution behavior and efficiency, the specific impact of these strategies on code optimization remains largely unexplored. This study investigates how incorporating code execution information into language models affects their ability to optimize code. Specifically, we apply three different training strategies to incorporate four code execution aspects -- line executions, line coverage, branch coverage, and variable states -- into CodeT5+, a well-known language model for code. Our results indicate that execution-aware models provide limited benefits compared to the standard CodeT5+ model in optimizing code.
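To make the four execution aspects concrete, the sketch below shows one way such signals could be collected in Python using the standard `sys.settrace` hook. This is an illustrative assumption about how traces might be gathered, not the paper's actual instrumentation pipeline: it records line execution counts (line executions), the set of executed lines (line coverage), and local-variable snapshots (variable states) for a toy function; branch coverage can be read off from which conditional lines and bodies appear in the trace.

```python
import sys
from collections import Counter

def trace_execution(func, *args):
    """Run `func` under a line tracer and collect execution signals:
    per-line execution counts, covered lines, and variable states."""
    line_counts = Counter()  # line executions: how often each line ran
    var_states = []          # variable states: (line, locals) snapshots

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            # Use line offsets relative to the function's first line
            offset = frame.f_lineno - func.__code__.co_firstlineno
            line_counts[offset] += 1
            var_states.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)

    covered = sorted(line_counts)  # line coverage: which lines ran at all
    return result, line_counts, covered, var_states

def sample(n):                     # offset 0
    total = 0                      # offset 1
    for i in range(n):             # offset 2
        if i % 2 == 0:             # offset 3 (branch condition)
            total += i             # offset 4 (taken branch body)
    return total                   # offset 5

result, counts, covered, states = trace_execution(sample, 4)
```

Signals of this kind (counts, coverage sets, variable snapshots) are what execution-aware pretraining strategies serialize into the model's training input alongside the source code.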