🤖 AI Summary
This work addresses the critical lack of real-world, production-grade defect data for AI-driven automated testing. To this end, we introduce PyResBugs, the first natural-language-annotated dataset targeting residual bugs in Python: defects that evade conventional testing and manifest only in production environments. Methodologically, we systematically collect defect pairs (faulty/patched versions) from mainstream Python frameworks, applying rigorous human annotation, version alignment, and multi-level validation. Each defect is accompanied by fine-grained natural language descriptions covering its root cause, triggering conditions, and observable exception behavior. Our core contribution is the first precise mapping from natural language specifications to executable faults, bridging the long-standing gap between NL-driven fault injection and production-representative defects. Empirical evaluation demonstrates that PyResBugs significantly improves the generalizability and practical utility of AI-based testing tools in both real-defect detection and controllable fault injection tasks.
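To make the described pairing concrete, the sketch below models one dataset entry as a Python dataclass. This is an illustrative schema only: the field names, the example bug, and the `ResidualBugEntry` type are assumptions for exposition, not the actual PyResBugs format.

```python
from dataclasses import dataclass

# Hypothetical sketch of a single PyResBugs-style entry; field names
# and example content are assumptions, not the dataset's real schema.
@dataclass
class ResidualBugEntry:
    project: str             # source framework the pair was mined from
    faulty_code: str         # buggy (pre-fix) version of the snippet
    patched_code: str        # corresponding fault-free (fixed) version
    root_cause: str          # NL description: why the bug occurs
    trigger_condition: str   # NL description: when it manifests
    exception_behavior: str  # NL description: the observable failure

entry = ResidualBugEntry(
    project="example-framework",
    faulty_code="def lookup(d, k):\n    return d[k]\n",
    patched_code="def lookup(d, k):\n    return d.get(k)\n",
    root_cause="Unguarded dictionary access assumes the key exists.",
    trigger_condition="A caller passes a key absent from the mapping.",
    exception_behavior="KeyError propagates to the caller at runtime.",
)
```

Keeping the faulty and patched versions side by side with the three NL levels is what lets a tool move in either direction: from code to description, or from description to an executable fault.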
📄 Abstract
This paper presents PyResBugs, a curated dataset of residual bugs, i.e., defects that persist undetected during traditional testing but later surface in production, collected from major Python frameworks. Each bug in the dataset is paired with its corresponding fault-free (fixed) version and annotated with multi-level natural language (NL) descriptions. These NL descriptions enable natural language-driven fault injection, offering a novel approach to simulating real-world faults in software systems. By bridging the gap between software fault injection techniques and real-world representativeness, PyResBugs provides researchers with a high-quality resource for advancing AI-driven automated testing in Python systems.
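One way such faulty/fixed pairs can drive fault injection is by reversing the fix: given a source file containing the patched snippet, swap it for the faulty counterpart to reintroduce the defect. The helper below is a minimal sketch of that idea; `inject_fault` and the example snippets are illustrative, not part of PyResBugs itself.

```python
# Hypothetical sketch of fault injection from a faulty/patched pair:
# reintroduce the residual bug by replacing the fixed snippet with its
# buggy version. The function name and snippets are illustrative.
def inject_fault(source: str, patched_code: str, faulty_code: str) -> str:
    """Swap the fixed snippet for its buggy counterpart in the source text."""
    if patched_code not in source:
        raise ValueError("patched snippet not found in source")
    return source.replace(patched_code, faulty_code, 1)

fixed_module = "def lookup(d, k):\n    return d.get(k)\n"
buggy_module = inject_fault(
    fixed_module,
    patched_code="return d.get(k)",
    faulty_code="return d[k]",
)
# buggy_module now contains the unguarded access that raises KeyError
# when the key is missing, matching the entry's NL exception description.
```

The NL annotations then serve as the selection interface: a tester can ask for, say, "bugs triggered by missing keys that raise KeyError" and obtain the matching faulty variants to inject.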