CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

📅 2026-03-11
🤖 AI Summary
This work addresses a timing side-channel vulnerability introduced by Automatic Prefix Caching (APC) in multi-tenant large language model services, where attackers can infer sensitive requests from other users through cache hit/miss patterns. To mitigate this risk without sacrificing performance, the paper proposes a selective isolation mechanism that dynamically identifies high-risk shared prefixes via lightweight monitoring of cross-tenant prefix reuse and isolates only those deemed suspicious. This approach effectively defends against APC-based side-channel attacks while preserving cache sharing for benign cases. Experimental results demonstrate that, compared to a fully isolated baseline, the proposed method improves cache reuse by up to 70% and reduces inference latency by 30%, achieving a strong balance between security and system efficiency.
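The selective isolation mechanism described above can be illustrated with a small sketch. This is not the paper's implementation: the class name, the per-prefix counters, and the threshold are all hypothetical, chosen only to show the idea of sharing prefixes by default and cutting off sharing once cross-tenant reuse looks like probing.

```python
from collections import defaultdict

SUSPICION_THRESHOLD = 3  # illustrative cutoff, not a value from the paper

class SelectiveIsolationCache:
    """Toy model of selective prefix isolation: share by default,
    stop sharing a prefix once cross-tenant reuse looks suspicious."""

    def __init__(self):
        self.owner = {}                       # prefix -> tenant that first inserted it
        self.foreign_hits = defaultdict(int)  # prefix -> cross-tenant hit count
        self.isolated = set()                 # prefixes no longer shared

    def lookup(self, tenant, prefix):
        """Return True on a shared cache hit, False on a miss."""
        owner = self.owner.get(prefix)
        if owner is None:
            self.owner[prefix] = tenant  # first request populates the cache
            return False
        if owner == tenant:
            return True  # same-tenant reuse carries no cross-tenant signal
        if prefix in self.isolated:
            return False  # flagged prefix: foreign lookups behave like misses
        self.foreign_hits[prefix] += 1
        if self.foreign_hits[prefix] >= SUSPICION_THRESHOLD:
            self.isolated.add(prefix)  # heavy cross-tenant reuse: isolate it
            return False
        return True
```

Note how isolation is per-prefix rather than per-tenant: the owning tenant keeps its cache hits even after a prefix is flagged, which is what preserves performance for benign sharing.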

📝 Abstract
Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC works by reusing previously computed states for the beginning of a request (the prefix) when another request starts with the same text. While APC improves throughput, it introduces timing side channels: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request by observing hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing altogether, isolating users and sacrificing efficiency for regular users. This paper presents CacheSolidarity, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency. CacheSolidarity monitors cache reuse across users, flags suspicious sharing, and selectively isolates prefixes, restricting their reuse only when necessary. Evaluation shows that CacheSolidarity enables up to 70% higher cache reuse and 30% lower inference latency compared to existing defenses that isolate users. CacheSolidarity's lightweight design demonstrates how security in LLM serving does not have to come at the cost of unnecessarily reduced performance or unbearable overheads.
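The incremental reconstruction attack described in the abstract can be simulated in a few lines. Everything here is a hypothetical toy, not the paper's attack code: the cache, the latencies, and the victim prompt are all simulated, and real prefix caches operate on token blocks rather than single characters. The point is only to show how hit/miss timing lets an attacker extend a guessed prefix one symbol at a time.

```python
# Simulated prefix cache: holds every prefix another tenant has populated.
CACHED_PREFIXES = set()

def seed_victim(prompt):
    # The victim's request caches the computed state for each of its prefixes.
    for i in range(1, len(prompt) + 1):
        CACHED_PREFIXES.add(prompt[:i])

def serve(prompt):
    # Simulated latency in ms: cache hits return noticeably faster than misses.
    return 1.0 if prompt in CACHED_PREFIXES else 5.0

def reconstruct(alphabet, max_len=20):
    # Extend the guess with whichever symbol produces a fast (cached) response.
    guess = ""
    for _ in range(max_len):
        for ch in alphabet:
            if serve(guess + ch) < 2.0:  # fast response => cache hit
                guess += ch
                break
        else:
            return guess  # no symbol extends the prefix: nothing more to learn
    return guess

seed_victim("my pin is 4821")
print(reconstruct("abcdefghijklmnopqrstuvwxyz 0123456789"))  # prints "my pin is 4821"
```

With a single cached victim prompt, the loop recovers it exactly, at a cost of at most one probe per alphabet symbol per position; this per-symbol probing is what CacheSolidarity's cross-tenant reuse monitoring is designed to flag.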
Problem

Research questions and friction points this paper is trying to address.

prefix caching
side channels
multi-tenant LLM serving
cache timing attacks
Automatic Prefix Caching
Innovation

Methods, ideas, or system contributions that make the work stand out.

prefix caching
side-channel defense
multi-tenant LLM serving
cache isolation
timing side channels
Panagiotis Georgios Pennas
IMDEA Software Institute, Universidad Politécnica de Madrid
Konstantinos Papaioannou
IMDEA Software Institute, Universidad Politécnica de Madrid
Marco Guarnieri
Associate Research Professor, IMDEA Software Institute
Computer Security, Verification, Programming Languages
Thaleia Dimitra Doudali
IMDEA Software Institute