🤖 AI Summary
To address the trade-off between user privacy protection and response quality in language model applications, this paper proposes a privacy-aware delegation framework. It orchestrates API-based and local open-source LLMs via a multi-stage LLM pipeline that integrates PII detection, dynamic de-identification, and prompt optimization, enabling privacy-controllable responses at the query level. Key contributions include: (1) the first chain-style delegation paradigm that jointly optimizes privacy preservation and response quality; (2) PUPA, the first privacy leakage evaluation benchmark grounded in real-world user–LLM interactions, featuring fine-grained PII annotations; and (3) state-of-the-art performance on PUPA, achieving an 85.5% high-quality response rate with only a 7.5% privacy leakage rate. All data and code are publicly released.
📝 Abstract
Users can divulge sensitive information to proprietary LLM providers, raising significant privacy concerns. While open-source models hosted locally on the user's machine alleviate some of these concerns, locally hostable models are often less capable than proprietary frontier models. Toward preserving user privacy while retaining the best response quality, we propose Privacy-Conscious Delegation, a novel task for chaining API-based and local models. We utilize recent public collections of user-LLM interactions to construct a natural benchmark called PUPA, which contains personally identifiable information (PII). To study potential approaches, we devise PAPILLON, a multi-stage LLM pipeline that uses prompt optimization to address a simpler version of our task. Our best pipeline maintains high response quality for 85.5% of user queries while restricting privacy leakage to only 7.5%. A sizable gap to the generation quality of proprietary LLMs remains, which we leave for future work. Our data and code will be available at https://github.com/siyan-sylvia-li/PAPILLON.
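The delegation idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the regex-based PII detection, the placeholder scheme, and the two model stubs (`untrusted_api_model`, `trusted_local_model`) are all simplifying assumptions standing in for PAPILLON's LLM-based, prompt-optimized stages.

```python
import re

# Toy regex-based PII detectors; PAPILLON instead uses LLM-driven
# detection and de-identification tuned via prompt optimization.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(query: str) -> tuple[str, dict]:
    """Replace detected PII spans with typed placeholders."""
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(query)):
            placeholder = f"[{label}_{i}]"
            mapping[placeholder] = match
            query = query.replace(match, placeholder)
    return query, mapping

def untrusted_api_model(prompt: str) -> str:
    # Stand-in for a proprietary API model. It produces the
    # high-quality draft but only ever sees the redacted prompt.
    return f"Draft answer for: {prompt}"

def trusted_local_model(draft: str, mapping: dict) -> str:
    # Stand-in for the local model, which runs on the user's machine
    # and can safely restore PII into the final response.
    for placeholder, value in mapping.items():
        draft = draft.replace(placeholder, value)
    return draft

def delegate(query: str) -> str:
    """Chain: redact -> query API model -> finalize locally."""
    redacted, mapping = redact(query)
    draft = untrusted_api_model(redacted)  # PII-free request
    return trusted_local_model(draft, mapping)
```

The key property this sketch demonstrates is that the API-side call receives only placeholders, while the user still gets a response containing their original details.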