🤖 AI Summary
This study addresses the leakage of sensitive information through prompts that LLM-powered applications send to cloud LLM APIs. Existing privacy tooling (network-level encryption and organization-level DLP) does not mitigate this risk because it never inspects prompt content. The work presents a systematic evaluation of eight privacy-preserving techniques within a unified framework: local inference, redaction with placeholder restoration, semantic rephrasing, trusted execution environments, split inference, fully homomorphic encryption, secure multi-party computation, and differential privacy. All eight, or a tractable research-stage subset for those not yet deployable, are implemented in an open-source shim compatible with MCP and any OpenAI-compatible API. The four practical options (local inference, redaction, rephrasing, and differential privacy) and their combinations are then compared empirically across four workload classes on a benchmark of 1,300 samples with 4,014 ground-truth annotations. Because no single technique dominates, the authors propose a decision rule that selects options from a threat-model budget and a workload characterization. The recommended combination (route locally when possible, then redact and rephrase the rest) produces zero exact PII leaks across 500 samples and a 0.6% combined leak rate on PII, but still 31.3% on proprietary code. The code, benchmark, and evaluation harness are publicly released.
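The leakage figures depend on how a leak is counted. The sketch below assumes that each benchmark sample pairs the text actually sent to the cloud with its ground-truth sensitive strings. Under that assumption, an exact leak is an annotated string that survives verbatim in the outbound text. The `Sample` type, its field names, and the per-annotation rate are hypothetical. The paper's own definitions, including its "combined" leak metric, are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical record layout, not the released benchmark's schema.
    outbound: str            # text actually sent to the cloud API after the privacy pipeline
    annotations: List[str]   # ground-truth sensitive strings labelled in the original input


def exact_leaks(sample: Sample) -> int:
    """Count annotated strings that appear verbatim (case-sensitive) in the outbound text."""
    return sum(1 for span in sample.annotations if span in sample.outbound)


def exact_leak_rate(samples: List[Sample]) -> float:
    """Fraction of all annotations leaked verbatim (an assumed, per-annotation definition)."""
    total = sum(len(s.annotations) for s in samples)
    leaked = sum(exact_leaks(s) for s in samples)
    return leaked / total if total else 0.0


def samples_with_exact_leak(samples: List[Sample]) -> int:
    """Number of samples with at least one verbatim leak ("zero exact leaks" means this is 0)."""
    return sum(1 for s in samples if exact_leaks(s) > 0)


samples = [
    Sample("Email <EMAIL_1> about the outage.", ["alice@example.com"]),  # redacted: no leak
    Sample("Call Bob at 555-0100.", ["Bob", "555-0100"]),                # both spans leak
]
print(exact_leak_rate(samples))          # 2 of 3 annotations leaked verbatim
print(samples_with_exact_leak(samples))  # 1 sample contains a leak
```

Counting verbatim matches only gives a lower bound. A rephrased or partially masked value would escape this check, which is presumably why a separate combined metric is reported alongside exact leaks.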
📝 Abstract
Coding agents and LLM-powered applications routinely send potentially sensitive content to cloud LLM APIs, where it may be logged, retained, used for training, or subpoenaed. Existing privacy tooling focuses on network-level encryption and organization-level data loss prevention (DLP), neither of which addresses the content of prompts themselves. We present a systematic empirical evaluation of eight techniques for privacy-preserving LLM requests: (A) local-only inference, (B) redaction with placeholder restoration, (C) semantic rephrasing, (D) Trusted Execution Environment (TEE)-hosted inference, (E) split inference, (F) fully homomorphic encryption, (G) secret sharing via multi-party computation, and (H) differential-privacy noise. We implement all eight (or a tractable research-stage subset where deployment is not yet feasible) in an open-source shim compatible with MCP and any OpenAI-compatible API. We evaluate the four practical options (A, B, C, H) and their combinations across four workload classes using a ground-truth-labelled leak benchmark of 1,300 samples with 4,014 annotations. Our headline finding is that no single technique dominates: the combination A+B+C (route locally when possible, redact and rephrase the rest) achieves a 0.6% combined leak rate on PII and 31.3% on proprietary code, with zero exact leaks on PII across 500 samples. We present a decision rule that selects the appropriate option(s) from a threat-model budget and workload characterisation. Code, benchmarks, and evaluation harness are released at https://github.com/jayluxferro/llm-redactor.
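Option (B), redaction with placeholder restoration, can be illustrated with a minimal sketch that assumes regex-based detection. Sensitive spans are replaced with typed placeholders before the request leaves the machine, and the placeholder-to-value map never leaves it. Placeholders in the model's reply are then swapped back locally. The two detector patterns, the `<LABEL_n>` placeholder format, and the `redact`/`restore` helpers are hypothetical and are not the llm-redactor API. Real detection would need far broader coverage than two regexes.

```python
import re
from typing import Dict, Tuple

# Hypothetical, illustrative detectors only; real PII/secret detection needs many more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected spans with typed placeholders such as <EMAIL_1>.

    Returns the redacted text plus a placeholder -> original mapping that stays local.
    """
    mapping: Dict[str, str] = {}  # placeholder -> original value
    seen: Dict[str, str] = {}     # original value -> placeholder, so repeats reuse one token
    for label, pattern in PATTERNS.items():
        count = 0

        def to_placeholder(match: "re.Match[str]") -> str:
            nonlocal count
            value = match.group(0)
            if value not in seen:
                count += 1
                seen[value] = f"<{label}_{count}>"
                mapping[seen[value]] = value
            return seen[value]

        text = pattern.sub(to_placeholder, text)
    return text, mapping


def restore(reply: str, mapping: Dict[str, str]) -> str:
    """Swap placeholders in the model's reply back to the original values, locally."""
    for placeholder, value in mapping.items():
        reply = reply.replace(placeholder, value)
    return reply


prompt = "Email alice@example.com about the outage on 10.0.0.12; cc alice@example.com."
redacted, mapping = redact(prompt)
print(redacted)  # Email <EMAIL_1> about the outage on <IPV4_1>; cc <EMAIL_1>.
reply = "Drafted a note to <EMAIL_1> regarding <IPV4_1>."  # stand-in for the cloud model's reply
print(restore(reply, mapping))  # Drafted a note to alice@example.com regarding 10.0.0.12.
```

Reusing one placeholder for a repeated value keeps the prompt internally consistent, so the model can still tell that two mentions refer to the same address. The closing angle bracket prevents `<EMAIL_1>` from matching inside `<EMAIL_10>` during restoration.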