SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

📅 2026-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of preserving user prompt privacy in widely deployed public large language models without compromising utility or efficiency and without requiring model-specific modifications. The authors propose a model-agnostic privacy-preserving inference framework that moves privacy protection from the individual-prompt level to the batch level. By mixing original prompts with perturbed variants, clustering semantically equivalent instructions, and integrating batched inference with differential privacy, the method provides privacy guarantees without accessing or altering model parameters. Experimental results show that the approach improves utility by over 20% compared to existing differential privacy baselines, and that its shared-prompt mechanism reduces query cost by up to a factor of five compared to non-batched inference.
📝 Abstract
With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting user prompt privacy has become an increasingly critical issue. Existing privacy-preserving inference methods sacrifice either utility or efficiency, and often require model-specific modifications that limit their compatibility. In this paper, we propose SharedRequest, a model-agnostic framework for privacy-preserving LLM inference that reformulates privacy protection at the batch level rather than the individual-prompt level. The key idea is to obscure sensitive information by mixing original prompts with noisy variants, while grouping semantically equivalent instructions to amortize the inference cost over a large batch of queries with minimal impact on LLM response quality. This design is independent of the LLM architecture, requiring no access to model parameters or architectural modification. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility compared to prior differential privacy baselines, and its shared-prompt mechanism reduces query cost by up to $5\times$ compared to non-batched inference.
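The batch-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `normalize_instruction`, `build_shared_requests`, and `word_shuffle_decoy` are hypothetical names, string normalization stands in for semantic clustering of instructions, and word shuffling stands in for whatever noise mechanism the paper uses (it carries no differential privacy guarantee on its own). The sketch only shows how prompts sharing an instruction could be pooled with decoy inputs into one shared request.

```python
import random
from collections import defaultdict


def normalize_instruction(instruction: str) -> str:
    """Hypothetical stand-in for semantic clustering: lowercase the text,
    collapse whitespace, and drop trailing punctuation."""
    return " ".join(instruction.lower().split()).rstrip(".:!?")


def word_shuffle_decoy(text: str, rng: random.Random) -> str:
    """Toy perturbation (an assumption, not the paper's mechanism):
    return the same words in a random order."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def build_shared_requests(requests, decoys_per_input, make_decoy, rng):
    """Group (instruction, user_input) pairs by normalized instruction and
    mix each group's real inputs with decoys.

    Returns a list of (instruction, mixed_inputs); each entry corresponds
    to one request sent to the model instead of one request per prompt.
    """
    groups = defaultdict(list)
    for instruction, user_input in requests:
        groups[normalize_instruction(instruction)].append(user_input)

    shared = []
    for instruction, inputs in groups.items():
        mixed = list(inputs)
        for text in inputs:
            mixed.extend(make_decoy(text, rng) for _ in range(decoys_per_input))
        rng.shuffle(mixed)  # hide which positions hold real inputs
        shared.append((instruction, mixed))
    return shared
```

Under these assumptions, calling `build_shared_requests` on three prompts, two of which share the instruction "summarize" after normalization, yields two shared requests rather than three model queries. The client would keep track of which inputs were real and discard the responses to the decoys.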
Problem

Research questions and friction points this paper is trying to address.

privacy-preserving inference
large language models
prompt privacy
model-agnostic
differential privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy-preserving inference
model-agnostic
batch-level privacy
shared-prompt mechanism
large language models