🤖 AI Summary
This study addresses the common practice in automated program repair of discarding code comments during preprocessing, which may inadvertently impair the performance of large language models (LLMs). Through systematic empirical evaluation, the work demonstrates that preserving comments—particularly those describing implementation details—can improve repair accuracy by up to threefold. The authors further propose enhancing training data by using LLMs to automatically generate missing comments. Ablation studies and interpretability analyses confirm that incorporating comments during both training and inference significantly boosts repair effectiveness, while models trained with comments maintain robust performance on comment-free test instances, showing no degradation in generalization.
📝 Abstract
Large Language Models (LLMs) are increasingly relevant in Software Engineering research and practice, with Automated Bug Fixing (ABF) being one of their key applications. ABF involves transforming a buggy method into its fixed equivalent. A common preprocessing step in ABF is to remove comments from code prior to training. However, we hypothesize that comments may play a critical role in fixing certain types of bugs by providing valuable design and implementation insights. In this study, we investigate how the presence or absence of comments, both during training and at inference time, affects the bug-fixing capabilities of LLMs. We conduct an empirical evaluation comparing two model families, each evaluated under all combinations of training and inference conditions (with and without comments), thereby revisiting the common practice of removing comments during training. To address the limited availability of comments in state-of-the-art datasets, we use an LLM to automatically generate comments for methods lacking them. Our findings show that comments improve ABF accuracy by up to threefold when present in both phases, while training with comments does not degrade performance on instances that lack them. Additionally, an interpretability analysis reveals that comments detailing a method's implementation are particularly effective in helping LLMs fix bugs accurately.
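The comment-removal preprocessing the abstract refers to is typically a simple lexical pass over the source. A minimal sketch for Java-like methods is shown below; the function name and the regex-based approach are illustrative assumptions, not the authors' actual pipeline, and the sketch deliberately ignores edge cases such as comment markers inside string literals:

```python
import re

def strip_comments(code: str) -> str:
    """Illustrative comment-removal pass for Java-like source.
    Simplified sketch: does not handle comment markers that
    appear inside string literals."""
    # Remove /* ... */ block comments (DOTALL lets '.' span newlines)
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    # Remove // line comments up to the end of the line
    code = re.sub(r"//[^\n]*", "", code)
    return code

buggy = """\
int max(int a, int b) {
    // return the larger of the two arguments
    return a < b ? a : b;  /* note: comparison is inverted */
}
"""
print(strip_comments(buggy))
```

Under the paper's hypothesis, a pass like this discards exactly the design hints (here, "return the larger of the two arguments") that could help an LLM recognize and fix the inverted comparison.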