Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework

📅 2025-10-01

🤖 AI Summary

Current learning-based automated vulnerability repair (AVR) methods suffer from three key limitations: poor cross-repository generalization, inadequate modeling of long-range dependencies, and low robustness to syntactic perturbations such as variable renaming. To address these, we propose SeCuRepair, a semantics-aligned, curriculum-driven, and reasoning-enhanced repair framework that adopts a "reason-then-edit" paradigm. The approach combines explicit reasoning generation, semantics-aware reinforcement learning rewards, and difficulty-aware curriculum training. Together, these steer the model toward deep repair logic rather than superficial lexical patterns. On repository-level splits of BigVul and PrimeVul_AVR, SeCuRepair improves CodeBLEU over the best-performing prior baselines by 34.52% and 31.52%, respectively. Ablation studies confirm that semantic alignment, reasoning generation, and the curriculum strategy each contribute to overall performance.
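To make the reason-then-edit idea concrete, the sketch below shows one way a training or evaluation harness might split a model completion into its reasoning and its patch. The `<reason>`/`<patch>` tag format, the `ReasonedRepair` type, and `parse_reason_then_edit` are hypothetical illustrations. The summary does not specify SeCuRepair's actual output template.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical completion template (an assumption): reasoning first, then the patch.
_TEMPLATE = re.compile(
    r"<reason>(?P<reason>.*?)</reason>\s*<patch>(?P<patch>.*?)</patch>",
    re.DOTALL,
)


@dataclass
class ReasonedRepair:
    reason: str  # why and how the vulnerability should be fixed
    patch: str   # the edit that would actually be applied and scored


def parse_reason_then_edit(completion: str) -> Optional[ReasonedRepair]:
    """Split a completion into reasoning and patch.

    Returns None if the completion does not follow the reason-then-edit
    template, e.g. a patch with no reasoning, or an empty section.
    """
    match = _TEMPLATE.search(completion)
    if match is None:
        return None
    reason = match.group("reason").strip()
    patch = match.group("patch").strip()
    if not reason or not patch:
        return None
    return ReasonedRepair(reason=reason, patch=patch)


completion = (
    "<reason>len is never checked against sizeof(buf) before memcpy, "
    "so the copy can overflow buf; bound len first.</reason>\n"
    "<patch>if (len > sizeof(buf)) return -1;</patch>"
)
repair = parse_reason_then_edit(completion)
print(repair.patch)  # if (len > sizeof(buf)) return -1;
```

Returning `None` for completions that skip the reasoning step is one simple way to enforce the paradigm, for example by giving such outputs zero reward during training. Whether SeCuRepair handles malformed outputs this way is an assumption.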

📝 Abstract

Current learning-based Automated Vulnerability Repair (AVR) approaches, while promising, often fail to generalize in real-world scenarios. Our diagnostic analysis reveals three fundamental weaknesses in state-of-the-art AVR approaches: (1) limited cross-repository generalization, with performance drops on unseen codebases; (2) an inability to capture long-range dependencies, which degrades performance on complex, multi-hunk repairs; and (3) over-reliance on superficial lexical patterns, leading to significant performance drops on vulnerabilities with minor syntactic variations such as variable renaming. To address these limitations, we propose SeCuRepair, a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed before generating the patch. This explicit reasoning enforces a genuine understanding of repair logic rather than superficial memorization of lexical patterns. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning, rewarding patches for their syntactic and semantic alignment with the oracle patch rather than mere token overlap. Complementing this, a difficulty-aware curriculum trains the model progressively, starting with simple fixes and advancing to complex, multi-hunk coordinated edits. We evaluate SeCuRepair on strict, repository-level splits of BigVul and the newly crafted PrimeVul_AVR dataset. SeCuRepair significantly outperforms all baselines, surpassing the best-performing baselines by 34.52% on BigVul and 31.52% on PrimeVul_AVR in terms of CodeBLEU. Comprehensive ablation studies further confirm that each component of the framework contributes to its final performance.
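A deliberately simplified reward can illustrate the contrast between token overlap and syntactic/semantic alignment. The sketch below is not SeCuRepair's reward function. It scores a candidate patch against the oracle in two ways: a lexical bigram F1, and the same F1 computed after renaming identifiers to canonical placeholders. The second score is a cheap stand-in for structural alignment that is invariant to variable renaming. The function names, the keyword list, and the 0.3/0.7 weights are assumptions. A faithful implementation would more likely use AST and data-flow matching, such as the syntactic and semantic components of CodeBLEU.

```python
import re
from collections import Counter

# Small illustrative subset of C keywords/types that are never renamed (assumption).
C_KEYWORDS = {
    "if", "else", "for", "while", "return", "sizeof", "int", "char",
    "unsigned", "size_t", "void", "struct", "break", "continue", "NULL",
}
IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|->|==|!=|<=|>=|&&|\|\||\S")


def tokenize(code: str) -> list[str]:
    return TOKEN.findall(code)


def canonicalize(tokens: list[str]) -> list[str]:
    """Rename non-keyword identifiers to VAR0, VAR1, ... by first use."""
    names: dict[str, str] = {}
    out = []
    for tok in tokens:
        if IDENT.fullmatch(tok) and tok not in C_KEYWORDS:
            out.append(names.setdefault(tok, f"VAR{len(names)}"))
        else:
            out.append(tok)
    return out


def ngram_f1(cand: list[str], ref: list[str], n: int = 2) -> float:
    """F1 of clipped n-gram overlap between two token sequences."""
    c = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def semantics_aware_reward(candidate: str, oracle: str,
                           w_lex: float = 0.3, w_struct: float = 0.7) -> float:
    """Hypothetical reward: lexical overlap plus rename-invariant overlap."""
    cand, ref = tokenize(candidate), tokenize(oracle)
    lexical = ngram_f1(cand, ref)
    structural = ngram_f1(canonicalize(cand), canonicalize(ref))
    return w_lex * lexical + w_struct * structural


oracle = "if (len > sizeof(buf)) return -1;"
renamed = "if (n > sizeof(dst)) return -1;"
print(round(semantics_aware_reward(renamed, oracle), 3))  # 0.9
```

For this pair, the lexical bigram F1 is 2/3 while the canonicalized score is 1.0, so the combined reward is 0.9. A purely token-based reward would score the harmless renaming at only 2/3.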

Problem

Research questions and friction points this paper is trying to address.

Addresses limited cross-repository generalization in vulnerability repair

Addresses the inability to capture long-range dependencies in complex, multi-hunk repairs

Reduces over-reliance on superficial lexical patterns when generating patches

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reason-then-edit paradigm for explicit vulnerability repair reasoning

Semantics-aware reinforcement learning for patch alignment

Difficulty-aware curriculum training for progressive complexity
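A difficulty-aware curriculum can be sketched as ordering training samples by estimated repair difficulty and feeding them to the model in stages. Below, difficulty is approximated by the number of hunks in the oracle patch, with the number of changed lines as a tie-breaker. Each stage adds the next-hardest bucket to the samples already seen. The `RepairSample` fields, the difficulty key, the three-stage split, and the cumulative schedule are all assumptions for illustration, not the paper's documented procedure.

```python
from dataclasses import dataclass


@dataclass
class RepairSample:
    sample_id: str
    num_hunks: int      # separate edit regions in the oracle patch
    lines_changed: int  # added plus removed lines in the oracle patch


def difficulty(sample: RepairSample) -> tuple[int, int]:
    # Hypothetical key: multi-hunk coordination dominates, patch size breaks ties.
    return (sample.num_hunks, sample.lines_changed)


def curriculum_stages(samples: list[RepairSample],
                      num_stages: int = 3) -> list[list[RepairSample]]:
    """Sort easiest-first, split into buckets, and return cumulative stages.

    Stage k contains buckets 0..k, so easy fixes stay in the mix while
    harder, multi-hunk samples are added.
    """
    if not samples:
        return []
    ordered = sorted(samples, key=difficulty)
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[:min(end, len(ordered))]
            for end in range(size, len(ordered) + size, size)]


data = [
    RepairSample("a", 3, 40), RepairSample("b", 1, 2), RepairSample("c", 2, 10),
    RepairSample("d", 1, 5), RepairSample("e", 1, 1), RepairSample("f", 4, 60),
]
for k, stage in enumerate(curriculum_stages(data)):
    print(k, [s.sample_id for s in stage])
# 0 ['e', 'b']
# 1 ['e', 'b', 'd', 'c']
# 2 ['e', 'b', 'd', 'c', 'a', 'f']
```

The cumulative schedule keeps simple, single-hunk fixes in view while multi-hunk coordinated edits are introduced, which is meant to reduce forgetting. A disjoint schedule, which trains on each bucket alone, is an equally plausible alternative.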
