🤖 AI Summary
Problem: OpenVAS and Tenable WAS vulnerability scan reports are unstructured and heterogeneous, hindering unified analysis and automation in vulnerability management.
Method: We propose a large language model (LLM)-based cross-tool structuring framework that uses GPT-4.1 and DeepSeek with domain-adapted prompt engineering and rule-guided post-processing to extract key fields (vulnerability description, CVSS score, and affected components) and output standardized JSON.
Contribution/Results: Evaluated on a real-world report containing 34 vulnerabilities, GPT-4.1 and DeepSeek achieve the highest similarity to the baseline, with ROUGE-L scores above 0.7. The results demonstrate that complex scanner reports can feasibly be transformed into usable structured datasets, supporting risk prioritization and, as future work, anonymization of sensitive data. The work is presented as an LLM-driven unified parsing approach for these two widely used scanners, aiming at a scalable, low-maintenance path to automated vulnerability governance.
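To make the structuring step concrete, here is a minimal Python sketch of what rule-guided post-processing into standardized JSON could look like. It is not the paper's implementation: the field names, the `VulnerabilityRecord` class, and the `normalize_record` helper are hypothetical, and the only details taken from the summary are the three extracted fields (description, CVSS score, affected components) and the JSON output.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class VulnerabilityRecord:
    # Hypothetical schema: the source names these fields but not their exact keys.
    description: str
    cvss_score: Optional[float]
    affected_components: List[str] = field(default_factory=list)
    source_tool: str = "unknown"


def normalize_record(raw: dict, source_tool: str) -> VulnerabilityRecord:
    """Apply simple rules to one LLM-extracted dict (assumed input shape)."""
    # Collapse runs of whitespace left over from report extraction.
    description = " ".join(str(raw.get("description", "")).split())

    # CVSS base scores range from 0.0 to 10.0; anything else is discarded.
    try:
        score = float(raw.get("cvss_score"))
    except (TypeError, ValueError):
        score = None
    if score is not None and not 0.0 <= score <= 10.0:
        score = None

    # Accept either a single string or a list of strings.
    components = raw.get("affected_components") or []
    if isinstance(components, str):
        components = [components]
    components = [str(c).strip() for c in components if str(c).strip()]

    return VulnerabilityRecord(description, score, components, source_tool)


def to_json(records: List[VulnerabilityRecord]) -> str:
    """Serialize normalized records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

A caller would pass each dictionary returned by the model through `normalize_record` before serializing, so that malformed scores or stray whitespace never reach the final dataset.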
📝 Abstract
This paper proposes an automated LLM-based method to extract and structure vulnerabilities from OpenVAS and Tenable WAS scanner reports, converting unstructured data into a standardized format for risk management. In an evaluation using a report with 34 vulnerabilities, GPT-4.1 and DeepSeek achieved the highest similarity to the baseline (ROUGE-L greater than 0.7). The method demonstrates feasibility in transforming complex reports into usable datasets, enabling effective prioritization and future anonymization of sensitive data.
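For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference using their longest common subsequence (LCS). The sketch below computes the LCS-based F-measure over lowercase whitespace tokens. The tokenization and the choice of `beta=1.0` are assumptions, since the abstract does not say which ROUGE-L variant or implementation was used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure; beta weights recall relative to precision."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

Under this definition, a score above 0.7 means the model's structured output shares a long in-order token sequence with the baseline text, even if the two are not word-for-word identical.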