🤖 AI Summary
This work identifies the key-value (KV) cache, a previously overlooked attack surface in large language model (LLM) inference, as a critical security vulnerability: even when prompts and model parameters are protected, adversaries can systematically perturb cached key vectors via malicious token injection (MTI), thereby distorting next-token prediction distributions and degrading downstream task performance. The authors formally establish KV cache integrity as a fundamental security dimension and propose a modular cache perturbation framework with tunable perturbation strength and layer- or timestep-specific targeting. Leveraging Frobenius norm constraints and the Lipschitz continuity of softmax, they develop a theoretical model characterizing how perturbations propagate through the attention mechanism. Experiments on GPT-2 and LLaMA-2 7B demonstrate that MTI significantly impairs retrieval-augmented generation and agent-based reasoning performance. This work introduces a novel threat paradigm for LLMs and establishes foundational benchmarks for cache-aware security defenses.
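The propagation argument described above can be sketched as follows (notation illustrative; the paper's exact constants and theorem statements may differ). For a query $q \in \mathbb{R}^d$, cached keys $K \in \mathbb{R}^{T \times d}$, and a corruption $\Delta K$, the attention logits $\ell = qK^\top/\sqrt{d}$ shift by at most

$$
\|\ell' - \ell\|_2 \;=\; \frac{1}{\sqrt{d}}\,\bigl\|q\,(\Delta K)^\top\bigr\|_2 \;\le\; \frac{\|q\|_2\,\|\Delta K\|_F}{\sqrt{d}},
$$

and since softmax is Lipschitz with constant at most $1$ in $\ell_2$, the attention weights satisfy

$$
\|\mathrm{softmax}(\ell') - \mathrm{softmax}(\ell)\|_2 \;\le\; \frac{\|q\|_2\,\|\Delta K\|_F}{\sqrt{d}},
$$

so the logit (and hence output) deviation is controlled by the Frobenius norm of the cache corruption.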
📝 Abstract
Even when prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and the Lipschitz dynamics of softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance across GPT-2 and LLaMA-2 7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible and theoretically grounded threat model for future robustness and security research.
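The three corruption types named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the `sigma` parameter, and the assumption that a per-layer key cache is a `(timesteps, d_head)` array are all illustrative.

```python
import numpy as np

def perturb_keys(k_cache, mode="gaussian", sigma=0.1, rng=None):
    """Apply one MTI-style perturbation to a cached key matrix.

    k_cache: (timesteps, d_head) array of cached key vectors for one layer.
    mode: "gaussian" (additive noise), "zero" (erase keys), or
          "rotate" (random orthogonal rotation of the key space).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = k_cache.copy()
    if mode == "gaussian":
        # Additive Gaussian noise of tunable magnitude sigma.
        k += sigma * rng.standard_normal(k.shape)
    elif mode == "zero":
        # Zero out the cached keys at the targeted timesteps.
        k[:] = 0.0
    elif mode == "rotate":
        # Random orthogonal rotation via QR decomposition; Q is
        # orthogonal, so row norms of the keys are preserved.
        q, _ = np.linalg.qr(rng.standard_normal((k.shape[1], k.shape[1])))
        k = k @ q
    else:
        raise ValueError(f"unknown mode: {mode}")
    return k
```

Restricting the call to a slice of the cache (selected layers or timesteps) and measuring `np.linalg.norm(perturbed - k_cache)` recovers the Frobenius-norm corruption budget that the theoretical analysis bounds.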