Can Open-Source LLM Agents Replace Static Application Security Testing Tools? An Empirical Assessment

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study empirically evaluates open-source large language model (LLM) agents on static application security testing (SAST) tasks. The authors build one general-purpose LLM-based agent and run it with three different open-source models hosted through Ollama. They then benchmark each configuration against Bandit, a mature SAST tool, on standard vulnerability detection scenarios. The comparison uses precision, recall, false positive count, and a composite score derived from these metrics. The results indicate that current open-source LLM agents perform substantially worse overall than the specialized SAST tool and are not yet viable replacements for it in real-world SAST use. The work provides empirical evidence on the current limits of LLMs in cybersecurity applications.
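To make the metrics concrete, here is a minimal sketch of how precision, recall, and a composite score can be computed from a scanner's confusion counts. Precision and recall use their standard definitions. The paper's actual composite formula is not given in this summary, so `illustrative_composite` (F1 minus a hypothetical per-false-positive penalty, `fp_penalty`) is an assumption for illustration only. It is not the authors' metric.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Confusion counts for one scanner on one labeled benchmark."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(r: ScanResult) -> float:
    # Fraction of reported findings that are real vulnerabilities.
    reported = r.true_positives + r.false_positives
    return r.true_positives / reported if reported else 0.0


def recall(r: ScanResult) -> float:
    # Fraction of real vulnerabilities that the scanner reported.
    actual = r.true_positives + r.false_negatives
    return r.true_positives / actual if actual else 0.0


def illustrative_composite(r: ScanResult, fp_penalty: float = 0.01) -> float:
    """HYPOTHETICAL composite: F1 minus a linear penalty per false positive,
    clamped at 0. This is NOT the paper's formula, which is not stated here."""
    p, rec = precision(r), recall(r)
    f1 = 2 * p * rec / (p + rec) if (p + rec) else 0.0
    return max(0.0, f1 - fp_penalty * r.false_positives)


if __name__ == "__main__":
    # Made-up counts for a single scanner run, for illustration only.
    run = ScanResult(true_positives=8, false_positives=2, false_negatives=2)
    print(round(precision(run), 2), round(recall(run), 2),
          round(illustrative_composite(run), 2))
```

The explicit false positive penalty reflects why the paper tracks false positive count separately: in SAST triage, each spurious finding costs analyst time even when precision looks acceptable.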
📝 Abstract
This paper explores the value of agentic AI tools for cybersecurity purposes. We evaluate the efficacy of a general-purpose GenAI Large Language Model (LLM)-based agent when powered by three different Ollama-hosted, general-purpose open-source models. We assess each agent's performance against the baseline performance of an existing, vetted Static Application Security Testing (SAST) tool, Bandit. The assessment uses precision, recall, false positive count, and a composite score calculated from the interplay of these metrics. Our findings refute the notion that a modern open-source GenAI LLM-based agent is currently suitable for the specialized task of SAST scanning under realistic conditions.
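Comparing any scanner to a labeled benchmark requires turning its findings into true positives, false positives, and false negatives. The sketch below does this for a Bandit report produced with `bandit -r <dir> -f json -o report.json`. Each entry in the report's `results` list carries `filename` and `line_number` fields. The following details are assumptions, not the paper's method: matching by (file, line) pairs, the ground-truth set, and the sample file names. An LLM agent's output would need to be normalized into the same pair form before it could be scored the same way.

```python
import json


def findings_from_bandit_json(report_text: str) -> set[tuple[str, int]]:
    """Extract (filename, line_number) pairs from a Bandit JSON report."""
    report = json.loads(report_text)
    return {(item["filename"], item["line_number"])
            for item in report.get("results", [])}


def score_against_ground_truth(
    reported: set[tuple[str, int]], truth: set[tuple[str, int]]
) -> tuple[int, int, int]:
    """Return (true positives, false positives, false negatives).

    ASSUMPTION: a finding counts as correct only if it names the exact
    file and line of a labeled vulnerability.
    """
    tp = len(reported & truth)   # reported and actually vulnerable
    fp = len(reported - truth)   # reported but not labeled vulnerable
    fn = len(truth - reported)   # labeled vulnerable but missed
    return tp, fp, fn


if __name__ == "__main__":
    # Hypothetical report excerpt and labels, for illustration only.
    sample = json.dumps({"results": [
        {"filename": "app.py", "line_number": 3, "test_id": "B602"},
        {"filename": "app.py", "line_number": 10, "test_id": "B105"},
    ]})
    labels = {("app.py", 3), ("db.py", 7)}
    print(score_against_ground_truth(findings_from_bandit_json(sample), labels))
```

Exact line matching is strict. A looser criterion, such as a line window or a CWE match, would change the counts, so any real comparison should state its matching rule explicitly.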
Problem

Research questions and friction points this paper is trying to address.

LLM agents
Static Application Security Testing
SAST
open-source models
cybersecurity
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agents
Static Application Security Testing
open-source LLMs
empirical evaluation
cybersecurity
Derek Yohn
College of Engineering and Science, Florida Institute of Technology, Melbourne, Florida, USA
Luke Flancher
College of Engineering and Science, Florida Institute of Technology, Melbourne, Florida, USA
Mirajul Islam
PhD in Statistics, University of Florida
Bayesian Statistics, Joint modelling, Biostatistics, Public health
Khaled Slhoub
Florida Institute of Technology
Software Engineering, Agent-based Systems, Software Testing, Software Bots, LLMs