🤖 AI Summary
Vulnerability detection in real-world Rust code faces challenges including low detection accuracy, scarcity of labeled data, and poor generalization of existing tools. Method: This paper proposes a novel weakly supervised vulnerability detection paradigm leveraging Large Language Model (LLM) hallucination: by explicitly assuming the presence of vulnerabilities, the LLM is prompted to generate “hallucinated” vulnerability reports, which serve as implicit supervision signals for discriminative training—eliminating the need for ground-truth labels. The approach integrates vulnerability-guided prompt engineering, Rust syntax-aware code representation, and contrastive learning. Contribution/Results: Evaluated on 81 real Rust vulnerabilities (447 functions, 18,691 lines), our method achieves an F1-score of 77.3%, surpassing the state-of-the-art by over 10%. Fine-tuning with hallucinated reports improves performance by 20% over conventional code fine-tuning and demonstrates strong cross-language transferability.
📝 Abstract
As an emerging programming language, Rust has rapidly gained popularity among developers for its strong emphasis on safety, enforced through a unique ownership system and safe concurrency practices. Despite these safeguards, Rust security still presents challenges: since 2018, 442 Rust-related vulnerabilities have been reported in real-world applications. The limited availability of labeled data leaves existing vulnerability detection tools performing poorly in real-world scenarios, often failing to adapt to new and complex vulnerabilities. This paper introduces HALURust, a novel framework that leverages the hallucinations of large language models (LLMs) to detect vulnerabilities in real-world Rust code. HALURust exploits LLMs' strength in natural language generation by transforming code into detailed vulnerability analysis reports. The key innovation lies in prompting the LLM to always assume the presence of a vulnerability: if the code sample is indeed vulnerable, the LLM produces an accurate analysis; if not, it generates a hallucinated report. By fine-tuning LLMs on these hallucinations, HALURust learns to distinguish vulnerable from non-vulnerable code samples. Evaluated on a dataset of 81 real-world vulnerabilities covering 447 functions and 18,691 lines of code across 54 applications, HALURust achieves an F1 score of 77.3%, outperforming existing methods by over 10%. Fine-tuning on hallucinated reports improves detection by 20% compared to traditional code-based fine-tuning. HALURust also adapts effectively to unseen vulnerabilities and to other programming languages, demonstrating strong generalization capabilities.
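The core prompting idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the prompt wording and the `query_llm` callback are hypothetical placeholders for whatever chat-completion client and prompt template HALURust actually uses.

```python
# Sketch of the vulnerability-guided prompting step: the prompt
# unconditionally asserts that a flaw is present, so the LLM returns
# either an accurate analysis (for vulnerable code) or a hallucinated
# report (for safe code). These report texts then serve as training
# data for a downstream discriminator.

def build_vulnerability_prompt(code: str) -> str:
    """Build a prompt that assumes the given Rust code is vulnerable.

    Note: the exact wording here is an illustrative assumption, not
    the template used in the paper.
    """
    return (
        "The following Rust function contains a security vulnerability.\n"
        "Describe the vulnerability, its root cause, and its impact.\n\n"
        f"```rust\n{code}\n```"
    )


def generate_report(code: str, query_llm) -> str:
    """Produce a (possibly hallucinated) vulnerability analysis report.

    `query_llm` is a placeholder callable (prompt: str) -> str standing
    in for any LLM API client.
    """
    return query_llm(build_vulnerability_prompt(code))


# Usage with a stand-in LLM callable:
fake_llm = lambda prompt: "REPORT: use-after-free in the borrowed slice..."
report = generate_report("fn main() { /* ... */ }", fake_llm)
```

Because the same prompt is issued for every sample regardless of ground truth, no labels are needed at report-generation time; the label enters only later, when reports are used to fine-tune the discriminating model.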