🤖 AI Summary
This work identifies the key-value (KV) cache, a previously overlooked attack surface in large language model (LLM) inference, as a critical security vulnerability: even when prompts and model parameters are protected, adversaries can systematically perturb cached key vectors via malicious token injection (MTI), thereby distorting next-token prediction distributions and degrading downstream task performance. The authors formally establish KV cache integrity as a fundamental security dimension and propose a modular cache perturbation framework with tunable perturbation strength and layer- or timestep-specific targeting. Leveraging Frobenius norm constraints and the Lipschitz continuity of softmax, they develop a theoretical model characterizing how perturbations propagate through the attention mechanism. Experiments on GPT-2 and LLaMA-2 7B demonstrate that MTI significantly impairs retrieval-augmented generation and agent-based reasoning performance. This work introduces a novel threat paradigm for LLMs and establishes foundational benchmarks for cache-aware security defenses.
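The propagation argument described above can be sketched as follows (notation illustrative; the paper's exact constants and theorem statements may differ). For a query $q \in \mathbb{R}^d$, cached keys $K \in \mathbb{R}^{T \times d}$, and a corruption $\Delta K$, the attention logits $\ell = qK^\top/\sqrt{d}$ shift by at most

$$
\|\ell' - \ell\|_2 \;=\; \frac{1}{\sqrt{d}}\,\bigl\|q\,(\Delta K)^\top\bigr\|_2 \;\le\; \frac{\|q\|_2\,\|\Delta K\|_F}{\sqrt{d}},
$$

and since softmax is Lipschitz with constant at most $1$ in $\ell_2$, the attention weights satisfy

$$
\|\mathrm{softmax}(\ell') - \mathrm{softmax}(\ell)\|_2 \;\le\; \frac{\|q\|_2\,\|\Delta K\|_F}{\sqrt{d}},
$$

so the logit (and hence output) deviation is controlled by the Frobenius norm of the cache corruption.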
📝 Abstract
Even when prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and the Lipschitz dynamics of softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance across GPT-2 and LLaMA-2 7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible and theoretically grounded threat model for future robustness and security research.
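The three corruption types named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the `sigma` parameter, and the assumption that a per-layer key cache is a `(timesteps, d_head)` array are all illustrative.

```python
import numpy as np

def perturb_keys(k_cache, mode="gaussian", sigma=0.1, rng=None):
    """Apply one MTI-style perturbation to a cached key matrix.

    k_cache: (timesteps, d_head) array of cached key vectors for one layer.
    mode: "gaussian" (additive noise), "zero" (erase keys), or
          "rotate" (random orthogonal rotation of the key space).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = k_cache.copy()
    if mode == "gaussian":
        # Additive Gaussian noise of tunable magnitude sigma.
        k += sigma * rng.standard_normal(k.shape)
    elif mode == "zero":
        # Zero out the cached keys at the targeted timesteps.
        k[:] = 0.0
    elif mode == "rotate":
        # Random orthogonal rotation via QR decomposition; Q is
        # orthogonal, so row norms of the keys are preserved.
        q, _ = np.linalg.qr(rng.standard_normal((k.shape[1], k.shape[1])))
        k = k @ q
    else:
        raise ValueError(f"unknown mode: {mode}")
    return k
```

Restricting the call to a slice of the cache (selected layers or timesteps) and measuring `np.linalg.norm(perturbed - k_cache)` recovers the Frobenius-norm corruption budget that the theoretical analysis bounds.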