CCISolver: End-to-End Detection and Repair of Method-Level Code-Comment Inconsistency

📅 2025-06-25

🤖 AI Summary

Code-comment inconsistency (CCI) undermines software maintainability, yet existing approaches suffer from poor data quality and limited detection-and-repair capability. This paper proposes CCISolver, an end-to-end CCI detection and repair framework. First, the authors construct CCIBench, a high-quality, manually curated benchmark dataset for CCI. Second, they design a large language model (LLM) architecture specifically optimized for CCI tasks, integrating semantic alignment and context-aware mechanisms. Third, they adopt a dual-metric evaluation combining F1-score (for detection) and GLEU (for repair), supplemented by human evaluation. Experiments show that CCISolver reaches an F1-score of 89.54% in detection, an 18.84% relative improvement in GLEU over the strongest baseline for repair, a human-evaluated fixing success rate of 0.6533, and approximately 36% faster end-to-end inference than the baseline model.

📝 Abstract

Comments within code serve as a crucial foundation for software documentation, helping developers communicate about and understand the code effectively. However, code-comment inconsistency (CCI) can negatively affect software development, testing, and maintenance. Recent efforts to mitigate this issue have emerged, but existing studies often suffer from inaccurate datasets and inadequate solutions, weakening their practical effectiveness. In this study, we first conduct a quantitative analysis of existing datasets, revealing that a substantial portion of the sampled data is mislabeled. To address these data limitations, we introduce CCIBench, a refined dataset comprising high-quality data, to support the training and evaluation of method-level CCI approaches. Furthermore, we present an innovative end-to-end LLM-based framework, CCISolver, designed to improve code quality by identifying and rectifying CCIs. Comprehensive evaluations demonstrate CCISolver's superior performance. For detection, it establishes a new state-of-the-art with an F1-score of 89.54%. In the fixing task, it achieves a remarkable 18.84% relative improvement in GLEU score over the strongest baseline. This superiority is confirmed by human evaluation, where CCISolver's fixing success rate of 0.6533 significantly surpasses existing methods. Critically, in a practical end-to-end setting, CCISolver's innovative architecture is approximately 36% faster for inference than the baseline model, underscoring its scalability and real-world applicability.
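The abstract reports a detection F1-score and a relative GLEU improvement. The sketch below shows how such figures are conventionally computed. The function names and every input number are hypothetical and chosen only for illustration; they are not the paper's confusion counts or baseline scores, and the paper's own evaluation scripts may differ.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, false negatives.

    Equivalent to the harmonic mean of precision and recall.
    Returns 0.0 when there are no positives at all (undefined case).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper.
    print(f"F1 = {f1_score(tp=80, fp=10, fn=10):.4f}")
    # Hypothetical scores, NOT taken from the paper.
    print(f"relative gain = {relative_improvement(60.0, 50.0):.2f}%")
```

Note that a relative improvement is measured against the baseline's score, so an 18.84% relative gain is not the same as an 18.84-point absolute gain in GLEU.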

Problem

Research questions and friction points this paper is trying to address.

Detect and repair method-level code-comment inconsistency

Address inaccuracies in existing CCI datasets

Improve code quality via an LLM-based end-to-end framework
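To make the problem concrete, here is a minimal, hypothetical example of a method-level code-comment inconsistency and one plausible repair that updates the comment to match the code. The functions and data are invented for illustration and do not come from CCIBench or from the paper.

```python
def find_user_inconsistent(users, name):
    """Return the matching user, or None if no user has this name."""
    for user in users:
        if user["name"] == name:
            return user
    # The code was changed to raise, but the docstring above still
    # promises None: this mismatch is the inconsistency.
    raise KeyError(name)


def find_user_repaired(users, name):
    """Return the matching user; raise KeyError if no user has this name."""
    for user in users:
        if user["name"] == name:
            return user
    raise KeyError(name)


if __name__ == "__main__":
    users = [{"name": "ada"}, {"name": "linus"}]  # hypothetical data
    print(find_user_repaired(users, "ada"))
    try:
        find_user_repaired(users, "grace")
    except KeyError as err:
        print("missing user:", err)
```

Detection corresponds to flagging the first function; repair corresponds to producing a comment like the one on the second function.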

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces CCIBench, a refined, high-quality dataset for method-level CCI

Proposes end-to-end LLM-based CCISolver framework

Achieves approximately 36% faster end-to-end inference than the baseline model
