Harnessing LLMs for Document-Guided Fuzzing of OpenCV Library

📅 2025-07-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Bugs in the APIs of computer vision libraries such as OpenCV can undermine the reliability of downstream applications. To address this, the paper proposes an LLM-driven, documentation-guided fuzzing method. The approach uses large language models to parse official API documentation into standardized API information. From that information it extracts constraints on individual input parameters and dependencies between parameters, and it then generates test inputs that systematically exercise each target API. Applied to 330 OpenCV APIs, the method detected 17 previously unknown bugs. Of these, 10 have been confirmed and 5 have been fixed. The work shows how LLM-based understanding of documentation can guide fuzz testing of a production-grade computer vision library.
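The first step, turning free-text documentation into standardized API information, can be sketched as follows. This is a minimal illustration under stated assumptions, not VISTAFUZZ's implementation. The JSON schema, the prompt wording, the constraint strings, and the names `Param`, `ApiInfo`, `build_prompt`, and `parse_api_info` are all hypothetical. A canned reply stands in for a real LLM call.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for the standardized API information an LLM is asked to emit.
@dataclass
class Param:
    name: str
    type: str
    optional: bool = False
    constraints: list[str] = field(default_factory=list)

@dataclass
class ApiInfo:
    name: str
    params: list[Param]

# Hypothetical prompt; doubled braces survive str.format as literal braces.
PROMPT_TEMPLATE = (
    "Read the OpenCV API documentation below and reply only with JSON of the form "
    '{{"name": str, "params": [{{"name": str, "type": str, '
    '"optional": bool, "constraints": [str]}}]}}.\n\n{doc}'
)

def build_prompt(doc: str) -> str:
    """Embed one API's documentation text in the extraction prompt."""
    return PROMPT_TEMPLATE.format(doc=doc)

def parse_api_info(reply: str) -> ApiInfo:
    """Parse the model's JSON reply and reject replies missing required keys."""
    data = json.loads(reply)
    if "name" not in data or "params" not in data:
        raise ValueError("reply is missing 'name' or 'params'")
    params = [
        Param(p["name"], p["type"], p.get("optional", False), p.get("constraints", []))
        for p in data["params"]
    ]
    return ApiInfo(data["name"], params)

# Canned reply standing in for a real LLM call; no model is invoked here.
# The constraint strings are illustrative, not extracted by a real model.
reply = json.dumps({
    "name": "cv::resize",
    "params": [
        {"name": "src", "type": "Mat", "constraints": ["non-empty"]},
        {"name": "dsize", "type": "Size",
         "constraints": ["may be zero if fx and fy are non-zero"]},
        {"name": "fx", "type": "double", "optional": True, "constraints": [">= 0"]},
        {"name": "fy", "type": "double", "optional": True, "constraints": [">= 0"]},
        {"name": "interpolation", "type": "int", "optional": True},
    ],
})
info = parse_api_info(reply)
print(info.name, [p.name for p in info.params])
```

Validating the reply's structure before use matters because an LLM may return malformed or incomplete JSON; a real pipeline would likely retry or re-prompt on a `ValueError`.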

📝 Abstract

The combination of computer vision and artificial intelligence is fundamentally transforming a broad spectrum of industries by enabling machines to interpret and act upon visual data with high accuracy. As the largest and by far the most popular open-source computer vision library, OpenCV provides an extensive suite of programming functions supporting real-time computer vision. Bugs in the OpenCV library can affect downstream computer vision applications, so it is critical to ensure its reliability. This paper introduces VISTAFUZZ, a novel technique for harnessing large language models (LLMs) for document-guided fuzzing of the OpenCV library. VISTAFUZZ uses LLMs to parse API documentation and obtain standardized API information. Based on this standardized information, VISTAFUZZ extracts constraints on individual input parameters and dependencies between them. Using these constraints and dependencies, VISTAFUZZ then generates new input values to systematically test each target API. We evaluate the effectiveness of VISTAFUZZ in testing 330 APIs in the OpenCV library, and the results show that VISTAFUZZ detected 17 new bugs, of which 10 have been confirmed and 5 of those have been fixed.
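Constraints on individual parameters and dependencies between parameters can be illustrated with `cv::resize`, whose documentation states that either `dsize` or both `fx` and `fy` must be non-zero. The sketch below is a hedged illustration only. The `ResizeArgs` record, the constraint names, the `fx >= 0` and `fy >= 0` rules, and the convention that a `dsize` with a non-positive side counts as zero are assumptions. VISTAFUZZ's actual constraint representation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical argument record for cv::resize (image content omitted).
@dataclass
class ResizeArgs:
    src_shape: tuple[int, int]        # (rows, cols) of the input image
    dsize: Optional[tuple[int, int]]  # (width, height); None means "not given"
    fx: float = 0.0
    fy: float = 0.0

# Constraints on individual parameters: each inspects one argument.
PARAM_CONSTRAINTS: dict[str, Callable[[ResizeArgs], bool]] = {
    "src non-empty": lambda a: a.src_shape[0] > 0 and a.src_shape[1] > 0,
    "fx >= 0": lambda a: a.fx >= 0,
    "fy >= 0": lambda a: a.fy >= 0,
}

def dsize_or_scale(a: ResizeArgs) -> bool:
    """Dependency from the docs: either dsize or both fx and fy must be non-zero.
    Assumption: a dsize with a non-positive side counts as zero."""
    has_dsize = a.dsize is not None and a.dsize[0] > 0 and a.dsize[1] > 0
    return has_dsize or (a.fx != 0 and a.fy != 0)

# Dependencies relate two or more parameters.
DEPENDENCIES: dict[str, Callable[[ResizeArgs], bool]] = {
    "dsize or (fx and fy)": dsize_or_scale,
}

def violations(a: ResizeArgs) -> list[str]:
    """Names of all constraints and dependencies the arguments break."""
    checks = {**PARAM_CONSTRAINTS, **DEPENDENCIES}
    return [name for name, ok in checks.items() if not ok(a)]

print(violations(ResizeArgs((480, 640), None)))            # neither size nor scale given
print(violations(ResizeArgs((480, 640), None, 0.5, 0.5)))  # scale factors given
print(violations(ResizeArgs((0, 640), (320, 240))))        # empty source image
```

In a fuzzing setting, violating inputs need not be discarded: breaking exactly one constraint while keeping the others satisfied is a common way to probe an API's error handling.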

Problem

Research questions and friction points this paper is trying to address.

Detecting bugs in the OpenCV library using LLMs

Generating test inputs from API documentation constraints

Improving reliability of computer vision applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs parse API documentation for fuzzing

Extracts parameter constraints and dependencies

Generates inputs to systematically test APIs
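Input generation from extracted constraints can be sketched as enumerating typical and boundary values for each parameter, then sorting the combinations by whether they satisfy an inter-parameter dependency. This is a hedged illustration, not the paper's generator. The value pools, `satisfies_dependency`, and `generate_cases` are hypothetical. Only the `cv::resize` rule that either `dsize` or both `fx` and `fy` must be non-zero comes from the OpenCV documentation.

```python
import itertools

# Hypothetical value pools: typical values plus boundary and out-of-range values.
POOLS = {
    "src_shape": [(1, 1), (480, 640), (0, 0)],  # (rows, cols)
    "dsize": [None, (0, 0), (320, 240)],        # (width, height)
    "fx": [0.0, 0.5, -1.0],
    "fy": [0.0, 2.0],
}

def satisfies_dependency(case: dict) -> bool:
    """cv::resize docs: either dsize or both fx and fy must be non-zero.
    Assumption: a dsize with a non-positive side counts as zero."""
    d = case["dsize"]
    has_dsize = d is not None and d[0] > 0 and d[1] > 0
    return has_dsize or (case["fx"] != 0 and case["fy"] != 0)

def generate_cases() -> tuple[list[dict], list[dict]]:
    """Enumerate every combination of pool values and split the cases by
    whether the dependency holds (single-parameter constraints are not checked)."""
    names = list(POOLS)
    valid, invalid = [], []
    for values in itertools.product(*(POOLS[n] for n in names)):
        case = dict(zip(names, values))
        (valid if satisfies_dependency(case) else invalid).append(case)
    return valid, invalid

valid, invalid = generate_cases()
print(len(valid), "dependency-satisfying cases,", len(invalid), "violating cases")
```

Cases that satisfy the dependency are more likely to reach the resizing logic itself, while the rest exercise argument validation. In Python, each case could then be turned into a call to `cv2.resize` on a zero-filled NumPy array of the given shape. Exhaustive enumeration is used here only for brevity; a real fuzzer would typically sample or mutate values instead.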

🔎 Similar Papers

On the Challenges of Fuzzing Techniques via Large Language Models