WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs

📅 2024-05-02
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses semantic-level Entity-Inconsistency Bugs (EIBs): syntactically valid but semantically erroneous code tokens, such as misnamed variables or functions. The authors propose WitheredLeaf, a cascaded detection framework that uses a lightweight, code-specific model for efficient negative-sample pre-filtering, combined with context-aware prompt engineering and a multi-stage filtering mechanism. By pairing large language models (e.g., GPT-4) with compact code-specialized models, WitheredLeaf substantially improves both precision and recall in EIB detection. Evaluated across 154 high-star Python/C open-source repositories, it identified 123 previously unknown EIBs, 45% of which induce functional anomalies, and contributed 69 patches, 27 of which have already been merged. To the authors' knowledge, this is the first systematic modeling and scalable detection of EIBs, establishing an extensible paradigm for semantic-level code-defect identification.

📝 Abstract
Originating from semantic bugs, Entity-Inconsistency Bugs (EIBs) involve misuse of syntactically valid yet incorrect program entities, such as variable identifiers and function names, which often have security implications. Unlike straightforward syntactic vulnerabilities, EIBs are subtle and can remain undetected for years. Traditional detection methods, such as static analysis and dynamic testing, often fall short due to the versatile and context-dependent nature of EIBs. However, with advancements in Large Language Models (LLMs) like GPT-4, we believe LLM-powered automatic EIB detection becomes increasingly feasible through these models' semantic understanding abilities. This research first undertakes a systematic measurement of LLMs' capabilities in detecting EIBs, revealing that GPT-4, while promising, shows limited recall and precision that hinder its practical application. The primary problem lies in the model's tendency to focus on irrelevant code snippets devoid of EIBs. To address this, we introduce a novel, cascaded EIB detection system named WitheredLeaf, which leverages smaller, code-specific language models to filter out most negative cases and mitigate the problem, thereby significantly enhancing the overall precision and recall. We evaluated WitheredLeaf on 154 Python and C GitHub repositories, each with over 1,000 stars, identifying 123 new flaws, 45% of which can be exploited to disrupt the program's normal operations. Out of 69 submitted fixes, 27 have been successfully merged.
Problem

Research questions and friction points this paper is trying to address.

Detecting subtle entity-inconsistency bugs in code using LLMs
Overcoming GPT-4's limitations in precision and scalability
Developing a cascaded system to filter code and surface exploitable flaws
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a cascaded system with smaller, code-specific models
Filters out non-EIB snippets to improve precision
Enhances scalability by reducing GPT-4 usage
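The cascade described above can be sketched as a two-stage filter: a cheap, code-specific scorer discards snippets that are unlikely to contain an entity-inconsistency bug, and only the suspicious remainder is escalated to an expensive LLM check. This is a minimal illustrative sketch, not the paper's actual implementation; all function names and the threshold value are hypothetical.

```python
# Hypothetical two-stage cascade in the spirit of WitheredLeaf.
# Stage 1 (cheap_score): a small code model rates how suspicious a
# snippet is; most negatives are dropped here, saving LLM calls.
# Stage 2 (llm_judge): a costly LLM (e.g., GPT-4) confirms or rejects
# the survivors. Both callables are placeholders for real models.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Snippet:
    path: str
    code: str


def cascade_detect(
    snippets: List[Snippet],
    cheap_score: Callable[[Snippet], float],  # fast, code-specific model
    llm_judge: Callable[[Snippet], bool],     # slow, expensive LLM check
    threshold: float = 0.8,                   # illustrative cutoff
) -> List[Snippet]:
    """Return snippets flagged as likely EIBs after pre-filtering."""
    flagged = []
    for s in snippets:
        # Stage 1: skip snippets the small model considers consistent.
        if cheap_score(s) < threshold:
            continue
        # Stage 2: ask the LLM only about the suspicious remainder.
        if llm_judge(s):
            flagged.append(s)
    return flagged
```

The design point is that the cheap scorer's job is recall-preserving rejection: it only needs to be confident about clear negatives, so the LLM's precision problem on irrelevant snippets never arises.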
Hongbo Chen
Indiana University Bloomington
Yifan Zhang
Indiana University Bloomington, Samsung Research America
Xing Han
Independent Researcher
Huanyao Rong
Indiana University Bloomington
Yuheng Zhang
University of Illinois Urbana-Champaign
Machine Learning · Reinforcement Learning · Online Learning · Bandits · Learning Theory
Tianhao Mao
Indiana University Bloomington
Hang Zhang
Indiana University Bloomington
XiaoFeng Wang
Chair, ACM SIGSAC
AI-Centered Security · Systems Security and Privacy · Healthcare Privacy · Incentive Engineering
Luyi Xing
Associate Professor of Computer Science, University of Illinois Urbana-Champaign/Indiana University
System Security · Data Privacy and Cybercrime
Xun Chen
Samsung Research America