Repairing Regex Vulnerabilities via Localization-Guided Instructions

📅 2025-10-10
🤖 AI Summary
Automated repair of Regular Expression Denial-of-Service (ReDoS) vulnerabilities faces a trade-off between precision and generalization. This paper proposes a localization-guided hybrid repair framework that decouples vulnerability localization from patch generation. A deterministic symbolic module first identifies the precise vulnerable subpattern; a large language model (LLM) is then asked to synthesize a semantically equivalent repair for that isolated segment only. By combining the reliability of deterministic analysis with the generalization capability of LLMs, the approach handles complex and previously unseen patterns that rule-based repair cannot, while avoiding the semantic errors of LLM-only repair. The reported repair success rate is 15.4 percentage points higher than the state of the art.

📝 Abstract
Regular expressions (regexes) are foundational to modern computing for critical tasks like input validation and data parsing, yet their ubiquity exposes systems to regular expression denial of service (ReDoS), a vulnerability requiring automated repair methods. Current approaches, however, are hampered by a trade-off. Symbolic, rule-based systems are precise but fail to repair unseen or complex vulnerability patterns. Conversely, large language models (LLMs) possess the necessary generalizability but are unreliable for tasks demanding strict syntactic and semantic correctness. We resolve this impasse by introducing a hybrid framework, localized regex repair (LRR), designed to harness LLM generalization while enforcing reliability. Our core insight is to decouple problem identification from the repair process. First, a deterministic, symbolic module localizes the precise vulnerable subpattern, creating a constrained and tractable problem space. Then, the LLM is invoked to generate a semantically equivalent fix for this isolated segment. This combined architecture successfully resolves complex repair cases that are intractable for rule-based repair while avoiding the semantic errors of LLM-only approaches. Our work provides a validated methodology for solving such problems in automated repair, improving the repair rate by 15.4 percentage points over the state-of-the-art. Our code is available at https://github.com/cdltlehf/LRR.
Problem

Research questions and friction points this paper is trying to address.

Automated repair of regex vulnerabilities to prevent ReDoS attacks
Overcoming limitations of rule-based systems and LLMs in regex repair
Localizing vulnerable subpatterns to guide precise semantic-preserving fixes
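
To make "semantic-preserving" concrete, here is a minimal Python sketch that compares an original and a repaired pattern on every string up to a small length. The function name `bounded_equivalent`, the two-letter alphabet, and the length bound are illustrative assumptions, not part of LRR; a bounded check can only find counterexamples and does not prove equivalence.

```python
import itertools
import re


def bounded_equivalent(original: str, repaired: str,
                       alphabet: str = "ab", max_len: int = 6) -> bool:
    """Return True if both patterns accept the same strings up to max_len.

    Hypothetical helper for illustration: it enumerates every string over
    `alphabet` with length 0..max_len and compares full-match results.
    """
    orig_re = re.compile(original)
    fixed_re = re.compile(repaired)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            # Any disagreement is a counterexample to equivalence.
            if (orig_re.fullmatch(s) is None) != (fixed_re.fullmatch(s) is None):
                return False
    return True


# (a+)+b has a nested quantifier, the classic ReDoS shape; a+b accepts
# the same language without the ambiguity.
same = bounded_equivalent(r"(a+)+b", r"a+b")
# a*b is NOT an equivalent fix: it also accepts the bare string "b".
different = bounded_equivalent(r"(a+)+b", r"a*b")
print(same, different)
```

The second call shows why an unconstrained rewrite is risky: a pattern can look like a plausible fix and still change which inputs are accepted.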
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining symbolic analysis with LLMs
Localizes vulnerable subpatterns before generating repairs
Generates semantically equivalent fixes for isolated segments
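
The localize-then-repair idea in the points above can be sketched in Python as follows. Everything here is a simplified stand-in: `localize` uses a naive textual heuristic for nested quantifiers rather than the paper's symbolic module, and `repair_fragment` is a fixed rewrite rule standing in for the LLM call. All function names are hypothetical.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed heuristic: a single (optionally escaped) atom with a quantifier,
# wrapped in a group that is itself quantified, e.g. (a+)+ or (\d*)+.
NESTED_QUANTIFIER = re.compile(r"\((\\?[^()\\])([+*])\)([+*])")


def localize(pattern: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first suspicious subpattern."""
    m = NESTED_QUANTIFIER.search(pattern)
    return m.span() if m else None


def repair_fragment(fragment: str) -> str:
    """Stand-in for the LLM: collapse the nested quantifier.

    (x+)+ accepts the same strings as x+; any combination involving *
    accepts the same strings as x*. The capture group is dropped, so this
    is unsuitable for patterns that rely on that group.
    """
    m = NESTED_QUANTIFIER.fullmatch(fragment)
    if m is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    atom, inner, outer = m.groups()
    quantifier = "+" if inner == outer == "+" else "*"
    return atom + quantifier


def localized_repair(pattern: str,
                     repair: Callable[[str], str] = repair_fragment) -> str:
    """Rewrite only the localized span; leave the rest of the pattern as-is."""
    span = localize(pattern)
    if span is None:
        return pattern  # nothing flagged, nothing changed
    start, end = span
    return pattern[:start] + repair(pattern[start:end]) + pattern[end:]


fixed = localized_repair(r"^id:(\d+)+;$")
print(fixed)
```

Because only the flagged span is handed to the repair step, the surrounding anchors and literals cannot be altered by it, which is the isolation property the framework relies on.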