🤖 AI Summary
Existing AI assistants struggle to comprehend users’ private data (e.g., conversation histories, application usage patterns), yet the sensitivity of such data has precluded the development of public benchmarks for evaluating personal information understanding. Method: We introduce the first synthetic benchmark dedicated to private-data understanding—featuring a scalable, LLM-driven synthesis framework that generates high-fidelity user profiles and personalized documents, and a privacy-compliant RAG evaluation suite focused on “private information understanding,” covering biographical facts, preferences, and social relationships at multiple granularities. Contribution/Results: Experiments reveal that state-of-the-art RAG models achieve below 42% accuracy on realistic private question answering, exposing critical failures in identifying and reasoning over essential personal information. Our benchmark fills a key evaluation gap under data-sensitivity constraints and provides a foundational diagnostic tool for building trustworthy, personalized AI systems.
📝 Abstract
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. A key scenario in this domain involves enabling AI models to access and interpret a user's private data (e.g., conversation history, user-AI interactions, app usage) to understand personal details such as biographical information, preferences, and social connections. However, due to the sensitive nature of such data, no publicly available datasets exist for assessing an AI model's ability to understand users through direct access to personal information. To address this gap, we introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities. Leveraging this synthetic data, we present PersonaBench, a benchmark designed to evaluate AI models' performance in understanding personal information derived from simulated private user data. We evaluate Retrieval-Augmented Generation (RAG) pipelines on questions directly related to a user's personal information, with the relevant private documents provided to the models as retrieval context. Our results reveal that current retrieval-augmented AI models struggle to answer private questions by extracting personal information from user documents, highlighting the need for improved methodologies to enhance personalization capabilities in AI.
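The RAG evaluation loop described above (retrieve relevant private documents for a question, have a reader answer from that context, and score against a gold answer) can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the token-overlap retriever and the echo-style reader are toy stand-ins for a real embedding retriever and an LLM, and all document contents, questions, and names (`retrieve`, `evaluate`, `answer_fn`) are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real pipeline would use embeddings instead
    return re.findall(r"[a-z']+", text.lower())


def overlap_score(query, doc):
    # Count shared tokens between question and document (toy relevance score)
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(question, documents, k=2):
    # Rank the user's private documents by relevance and keep the top-k
    ranked = sorted(documents, key=lambda d: overlap_score(question, d), reverse=True)
    return ranked[:k]


def evaluate(qa_pairs, documents, answer_fn, k=2):
    # answer_fn(question, context) plays the role of the LLM reader;
    # accuracy = fraction of questions whose gold answer appears in the prediction
    correct = 0
    for question, gold in qa_pairs:
        context = "\n".join(retrieve(question, documents, k))
        prediction = answer_fn(question, context)
        correct += int(gold.lower() in prediction.lower())
    return correct / len(qa_pairs)


if __name__ == "__main__":
    # Invented private documents simulating conversation history and app usage
    docs = [
        "Chat log: Alice mentioned her sister Dana lives in Lisbon.",
        "App usage: Alice opens a running-tracker app every morning.",
        "Note to self: Alice prefers tea over coffee.",
    ]
    qa = [
        ("Where does Alice's sister live?", "Lisbon"),
        ("Does Alice prefer tea or coffee?", "tea"),
    ]
    # Toy reader that simply echoes the retrieved context back
    acc = evaluate(qa, docs, lambda q, ctx: ctx)
    print(f"accuracy = {acc:.2f}")
```

In the benchmark proper, the reader is a full LLM prompted with the retrieved documents, and the low observed accuracy reflects failures in both retrieving the right private document and extracting the personal fact from it.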