Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical issue that code generated by large language models (LLMs) often exhibits predictable security vulnerabilities due to repetitive templating, and proposes a systematic approach for analyzing vulnerability persistence in black-box settings. The authors introduce the Feature–Security Table (FSTab) framework, which predicts backend vulnerabilities from observable frontend features and model metadata, without requiring access to source code. FSTab enables black-box attacks and model-centric security evaluation, and it is the first to reveal consistent vulnerability reproduction across programs, semantics-preserving rewrites, and diverse application domains. Experimental results on state-of-the-art models, including GPT-5.2, Claude-4.5 Opus, and Gemini-3 Pro, demonstrate up to 94% attack success and 93% vulnerability coverage, even when the target domain was excluded from the training data.

📝 Abstract
LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We study vulnerability persistence in LLM-generated software and introduce the Feature–Security Table (FSTab), which has two components. First, FSTab enables a black-box attack that predicts likely backend vulnerabilities from observable frontend features and knowledge of the source LLM, without access to the backend source code. Second, FSTab provides a model-centric evaluation that quantifies how consistently a given model reproduces the same vulnerabilities across programs, semantics-preserving rephrasings, and application domains. We evaluate FSTab on state-of-the-art code LLMs, including GPT-5.2, Claude-4.5 Opus, and Gemini-3 Pro, across diverse application domains. Our results show strong cross-domain transfer: even when the target domain is excluded from training, FSTab achieves up to 94% attack success and 93% vulnerability coverage on Internal Tools (Claude-4.5 Opus). These findings expose an underexplored attack surface in LLM-generated software and highlight the security risks of code generation. Our code is available at: https://anonymous.4open.science/r/FSTab-024E.
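The abstract describes FSTab as a lookup from observable frontend features plus source-model identity to likely backend vulnerability classes. A minimal sketch of that idea is below; note that all feature names, model names, and vulnerability mappings are illustrative assumptions for exposition, not data or code from the paper.

```python
# Hypothetical sketch of the Feature-Security Table (FSTab) idea:
# map (observable frontend feature, generating model) to backend
# vulnerability classes predicted without seeing backend code.
# Every entry here is an invented placeholder, not paper data.
FSTAB = {
    ("login_form", "model-A"): ["SQL injection", "weak password hashing"],
    ("file_upload", "model-A"): ["unrestricted file type", "path traversal"],
    ("search_box", "model-B"): ["reflected XSS"],
}

def predict_vulnerabilities(features, model):
    """Return the ordered union of predicted backend vulnerabilities
    for the frontend features observed in a deployed application."""
    predicted = []
    for feature in features:
        for vuln in FSTAB.get((feature, model), []):
            if vuln not in predicted:
                predicted.append(vuln)
    return predicted

print(predict_vulnerabilities(["login_form", "file_upload"], "model-A"))
# -> ['SQL injection', 'weak password hashing',
#     'unrestricted file type', 'path traversal']
```

In the paper's setting the table would be learned from many generated programs per model; the sketch only conveys the black-box lookup step, where an attacker observes the frontend and the source LLM and predicts backend weaknesses.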
Problem

Research questions and friction points this paper is trying to address.

vulnerability persistence
LLM-generated software
recurring vulnerabilities
black-box attack
code generation security
Innovation

Methods, ideas, or system contributions that make the work stand out.

vulnerability persistence
Feature–Security Table
black-box attack
LLM-generated software
model-centric evaluation
👥 Authors
Tomer Kordonsky
Technion – Israel Institute of Technology
Maayan Yamin
Technion – Israel Institute of Technology
Noam Benzimra
Technion – Israel Institute of Technology
Amit Levi
University of Haifa (Theoretical Computer Science, Algorithms, Machine Learning)
Avi Mendelson
Electrical Engineering and Computer Science, Technion (Computer Systems)