🤖 AI Summary
Users who interact with LLMs under their real identities can unintentionally leak private information, and detecting such leakage locally on the user's device remains an understudied challenge.
Method: We propose the first on-device privacy detection framework tailored to conversational scenarios. The approach comprises (i) an automated pipeline for privacy phrase extraction and fine-grained type annotation that uses strong cloud-based models to generate high-quality multilingual labels; (ii) a three-level evaluation scheme covering leakage detection, phrase localization, and information categorization; and (iii) zero-shot and fine-tuned baselines built on lightweight local LLMs (e.g., Phi-3, Qwen2).
Contribution/Results: We release the first large-scale multilingual privacy phrase annotation dataset (249K queries, 154K fine-grained annotated phrases). Empirical evaluation reveals substantial gaps, particularly in phrase recall and fine-grained classification, between the performance of current lightweight models and practical deployment requirements. The work thus establishes a foundational benchmark and a technical roadmap for on-device privacy protection.
📝 Abstract
Users interacting with large language models (LLMs) under their real identifiers often unknowingly risk disclosing private information. There is therefore a practical need to automatically notify users whether their queries leak private information and, if so, which phrases leak what information. Existing privacy detection methods, however, were designed for different objectives and application scenarios; they typically tag personally identifiable information (PII) in anonymous content. In this work, to support the development and evaluation of privacy detection models for LLM interactions that can be deployed on local user devices, we construct a large-scale multilingual dataset with 249K user queries and 154K annotated privacy phrases. In particular, we build an automated annotation pipeline that uses strong cloud-based LLMs to extract privacy phrases from dialogue datasets and annotate the information they leak. We also design evaluation metrics at three levels: privacy leakage, extracted privacy phrases, and privacy information. We further establish baselines using lightweight LLMs with both tuning-free and tuning-based approaches, and report a comprehensive evaluation of their performance. The results reveal a gap between current performance and the requirements of real-world LLM applications, motivating future research, grounded in our dataset, into more effective local privacy detection methods.
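One step of an automated LLM annotation pipeline that benefits from an explicit check is validating the cloud model's output before it enters the dataset. The following sketch is hypothetical: the abstract does not specify the pipeline's prompts or output format, so it assumes the annotator LLM returns a JSON list of `{"phrase": ..., "type": ...}` objects, and the function name `validate_annotation` and the optional `allowed_types` taxonomy are illustrative. It keeps only well-formed entries whose phrase occurs verbatim in the query.

```python
import json
from typing import Optional


def validate_annotation(query: str, raw_output: str,
                        allowed_types: Optional[set[str]] = None) -> dict[str, str]:
    """Parse a (hypothetical) JSON annotation and keep only grounded phrases."""
    try:
        items = json.loads(raw_output)
    except json.JSONDecodeError:
        return {}  # malformed model output: discard the whole annotation
    if not isinstance(items, list):
        return {}
    kept: dict[str, str] = {}
    for item in items:
        if not isinstance(item, dict):
            continue
        phrase, info_type = item.get("phrase"), item.get("type")
        if not isinstance(phrase, str) or not isinstance(info_type, str):
            continue
        phrase = phrase.strip()
        # Drop hallucinated phrases that do not occur verbatim in the query.
        if not phrase or phrase not in query:
            continue
        # Optionally restrict labels to an assumed type taxonomy.
        if allowed_types is not None and info_type not in allowed_types:
            continue
        kept.setdefault(phrase, info_type)  # first label wins on duplicates
    return kept
```

Requiring a verbatim match is a conservative choice: it cannot catch a correct phrase with a wrong type label, but it guarantees every stored phrase can be localized in the query, which phrase-level evaluation depends on.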