🤖 AI Summary
Existing AI assistants struggle to comprehend users’ private data (e.g., conversation histories, application usage patterns), yet the sensitivity of such data has precluded the development of public benchmarks for evaluating personal information understanding. Method: We introduce the first synthetic benchmark dedicated to private-data understanding—featuring a scalable, LLM-driven synthesis framework that generates high-fidelity user profiles and personalized documents, and a privacy-compliant RAG evaluation suite focused on “private information understanding,” covering biographical facts, preferences, and social relationships at multiple granularities. Contribution/Results: Experiments reveal that state-of-the-art RAG models achieve below 42% accuracy on realistic private question answering, exposing critical failures in identifying and reasoning over essential personal information. Our benchmark fills a key evaluation gap under data-sensitivity constraints and provides a foundational diagnostic tool for building trustworthy, personalized AI systems.
📝 Abstract
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. A key scenario in this domain involves enabling AI models to access and interpret a user's private data (e.g., conversation history, user-AI interactions, app usage) to understand personal details such as biographical information, preferences, and social connections. However, due to the sensitive nature of such data, no publicly available datasets exist for assessing an AI model's ability to understand users through direct access to personal information. To address this gap, we introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities. Leveraging this synthetic data, we present PersonaBench, a benchmark designed to evaluate AI models' performance in understanding personal information derived from simulated private user data. We evaluate Retrieval-Augmented Generation (RAG) pipelines on questions directly related to a user's personal information, with the relevant private documents provided to the models as retrieval context. Our results reveal that current retrieval-augmented AI models struggle to answer private questions by extracting personal information from user documents, highlighting the need for improved methodologies to enhance personalization capabilities in AI.
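The RAG evaluation loop described above (retrieve relevant private documents for a question, have a reader answer from that context, and score against a gold answer) can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the token-overlap retriever and the echo-style reader are toy stand-ins for a real embedding retriever and an LLM, and all document contents, questions, and names (`retrieve`, `evaluate`, `answer_fn`) are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real pipeline would use embeddings instead
    return re.findall(r"[a-z']+", text.lower())


def overlap_score(query, doc):
    # Count shared tokens between question and document (toy relevance score)
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(question, documents, k=2):
    # Rank the user's private documents by relevance and keep the top-k
    ranked = sorted(documents, key=lambda d: overlap_score(question, d), reverse=True)
    return ranked[:k]


def evaluate(qa_pairs, documents, answer_fn, k=2):
    # answer_fn(question, context) plays the role of the LLM reader;
    # accuracy = fraction of questions whose gold answer appears in the prediction
    correct = 0
    for question, gold in qa_pairs:
        context = "\n".join(retrieve(question, documents, k))
        prediction = answer_fn(question, context)
        correct += int(gold.lower() in prediction.lower())
    return correct / len(qa_pairs)


if __name__ == "__main__":
    # Invented private documents simulating conversation history and app usage
    docs = [
        "Chat log: Alice mentioned her sister Dana lives in Lisbon.",
        "App usage: Alice opens a running-tracker app every morning.",
        "Note to self: Alice prefers tea over coffee.",
    ]
    qa = [
        ("Where does Alice's sister live?", "Lisbon"),
        ("Does Alice prefer tea or coffee?", "tea"),
    ]
    # Toy reader that simply echoes the retrieved context back
    acc = evaluate(qa, docs, lambda q, ctx: ctx)
    print(f"accuracy = {acc:.2f}")
```

In the benchmark proper, the reader is a full LLM prompted with the retrieved documents, and the low observed accuracy reflects failures in both retrieving the right private document and extracting the personal fact from it.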