CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights

📅 2025-11-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work introduces a stealthy backdoor attack targeting the inference phase of large language models (LLMs), injecting trojan behavior by perturbing key-value (KV) cache entries, without altering inputs or model weights. Methodologically, it identifies, for the first time, that flipping a single bit in specific vulnerable positions of the KV cache reliably triggers malicious behavior; it further proposes a data-free, gradient-free algorithm to dynamically locate and perturb critical activation vectors within the cache. Contributions include: (1) demonstrating high attack success rates across diverse LLMs (e.g., LLaMA-2, Vicuna) and downstream tasks (e.g., question answering, summarization); (2) achieving strong transferability across tasks and datasets; and (3) preserving original model performance with negligible accuracy degradation. Experiments confirm both efficacy and stealthiness, revealing a previously overlooked attack surface in LLM inference and providing empirical evidence for rethinking inference-time security.

๐Ÿ“ Abstract
Adversarial weight perturbation has emerged as a concerning threat to LLMs that either use training privileges or system-level access to inject adversarial corruption in model weights. With the emergence of innovative defensive solutions that place system- and algorithm-level checks and corrections in the input and weight spaces, these perturbations are increasingly susceptible to defenses. This work develops a novel perspective on Trojan attacks that generates an attacker-designed model output while leaving no attack traces on the inputs or weights. Such an attack space can be unlocked through corruption of the key-value (KV) cache. In this paper, we introduce CacheTrap, a novel Trojan attack that corrupts the value vectors stored in the KV cache. These vectors capture the dynamic activations for specific token positions and therefore constitute a natural surface for transient, inference-time trigger insertion. The transient nature of these KV values and their dependence on victim input imply additional constraints on our attack, such as a lack of knowledge of the victim's data or domain application, and, consequently, a lack of gradient information. The objective of the proposed CacheTrap is to develop a vulnerable KV bit-searching algorithm so that, once the attack employs the identified bit-flip as a trigger, the model generates targeted behavior, e.g., classifying inputs towards the target class. Moreover, CacheTrap is a data- and gradient-free attack which also has no impact on the model's utility. Our evaluation demonstrates that the proposed attack enables the first successful Trojan attack on LLMs with a single bit flip in the KV cache. In addition, the data-independent nature of the attack ensures that once the attacker identifies the vulnerable bit index, the location remains constant and can be transferred to a wide range of victim tasks/datasets/queries with no overhead.
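The attack's core primitive, a single bit flip in a cached fp16 value vector, is easy to visualize: because half-precision floats store a 5-bit exponent, one flipped exponent bit can change a cached activation by orders of magnitude. The NumPy sketch below is illustrative only; the paper's vulnerable-bit search algorithm is not reproduced here, and the toy cache shape is an assumption for the example.

```python
import numpy as np

def flip_bit(value: np.float16, bit: int) -> np.float16:
    """Flip one bit (bit 0 = LSB) in the 16-bit pattern of a float16 value."""
    raw = value.view(np.uint16)
    return np.uint16(raw ^ (1 << bit)).view(np.float16)

# Toy "KV cache" value tensor: 4 cached token positions, 8 dims each
# (real caches are [layers, heads, positions, head_dim]).
kv_values = np.full((4, 8), 0.5, dtype=np.float16)

# Flip the top exponent bit (bit 14) of a single cached activation:
# 0.5 (0x3800) becomes 32768.0 (0x7800), a huge magnitude change
# from corrupting just one bit of one value vector.
kv_values[2, 5] = flip_bit(kv_values[2, 5], 14)
print(kv_values[2, 5])  # 32768.0
```

Because the flip is applied to transient cache state at inference time, reverting it (XOR with the same mask) restores the original value exactly, which is consistent with the "no traces in inputs or weights" property the abstract describes.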
Problem

Research questions and friction points this paper is trying to address.

Introduces CacheTrap, a Trojan attack that corrupts the KV cache while leaving no traces in inputs or weights
Develops a data- and gradient-free method that flips a single bit in the KV cache to induce targeted outputs
Enables transient, inference-time attacks that transfer across tasks without affecting model utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Corrupts value vectors stored in the KV cache
Uses single-bit-flip triggers that require no knowledge of victim data or gradients
Leaves no attack traces in inputs or model weights
Authors

Mohaiminul Al Nahian (SUNY Binghamton)
Abeer Matar A. Almalky (SUNY Binghamton)
Gamana Aragonda (New Jersey Institute of Technology)
Ranyang Zhou (New Jersey Institute of Technology)
Sabbir Ahmed (Islamic University of Technology)
Dmitry Ponomarev (SUNY Binghamton)
Li Yang (UNC Charlotte)
Shaahin Angizi (New Jersey Institute of Technology)
Adnan Siraj Rakin (SUNY Binghamton)