The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a new privacy threat arising from vision-language model (VLM) agents, "image private attribute profiling": inferring sensitive attributes (e.g., age, health) and abstract traits (e.g., personality, social traits) from a small set of personal images. Two challenges stand in the way of measuring this threat: the lack of benchmark datasets with multi-image annotations for private attributes, and the limited ability of current multimodal large language models (MLLMs) to infer abstract attributes from image collections. To address them, the authors construct PAPI, a dataset of 2,510 images from 251 individuals with 3,012 annotated privacy attributes, and propose HolmesEye, a hybrid agentic framework in which VLMs extract intra-image and inter-image information and LLMs guide the inference and consolidate the results through forensic analysis. In experiments, HolmesEye improves average accuracy by 10.8% over state-of-the-art baselines and exceeds human-level performance by 15.0% on abstract attribute prediction.

📝 Abstract
Our research reveals a new privacy risk associated with the vision-language model (VLM) agentic framework: the ability to infer sensitive attributes (e.g., age and health information) and even abstract ones (e.g., personality and social traits) from a set of personal images, which we term "image private attribute profiling." This threat is particularly severe given that modern apps can easily access users' photo albums, and inference from image sets enables models to exploit inter-image relations for more sophisticated profiling. However, two main challenges hinder our understanding of how well VLMs can profile an individual from a few personal photos: (1) the lack of benchmark datasets with multi-image annotations for private attributes, and (2) the limited ability of current multimodal large language models (MLLMs) to infer abstract attributes from large image collections. In this work, we construct PAPI, the largest dataset for studying private attribute profiling in personal images, comprising 2,510 images from 251 individuals with 3,012 annotated privacy attributes. We also propose HolmesEye, a hybrid agentic framework that combines VLMs and LLMs to enhance privacy inference. HolmesEye uses VLMs to extract both intra-image and inter-image information and LLMs to guide the inference process as well as consolidate the results through forensic analysis, overcoming existing limitations in long-context visual reasoning. Experiments reveal that HolmesEye achieves a 10.8% improvement in average accuracy over state-of-the-art baselines and surpasses human-level performance by 15.0% in predicting abstract attributes. This work highlights the urgency of addressing privacy risks in image-based profiling and offers both a new dataset and an advanced framework to guide future research in this area.
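The dataset figures quoted in the abstract imply simple per-individual averages. The short calculation below only restates that arithmetic; the variable names are illustrative, and the abstract does not say whether every individual has exactly the same number of images or annotations, so these are averages rather than guaranteed per-person counts.

```python
# Figures taken from the abstract's description of PAPI.
NUM_IMAGES = 2510
NUM_INDIVIDUALS = 251
NUM_ANNOTATED_ATTRIBUTES = 3012

# Average number of images and annotated attributes per individual.
# These are means only; the per-individual distribution is not stated.
images_per_individual = NUM_IMAGES / NUM_INDIVIDUALS
attributes_per_individual = NUM_ANNOTATED_ATTRIBUTES / NUM_INDIVIDUALS

print(f"images per individual (mean): {images_per_individual}")
print(f"annotated attributes per individual (mean): {attributes_per_individual}")
```

On average, then, each individual contributes 10 images and 12 annotated attributes.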
Problem

Research questions and friction points this paper is trying to address.

Identifying the privacy risk of vision-language models profiling users from their personal images
Overcoming the lack of benchmark datasets with multi-image annotations for private attributes
Improving abstract attribute inference from image collections
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs PAPI dataset for private attribute profiling
Proposes HolmesEye hybrid VLM-LLM agentic framework
Improves average accuracy by 10.8% over state-of-the-art baselines and exceeds human-level performance by 15.0% on abstract attribute prediction
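The division of labor described in the abstract (VLMs extract intra-image and inter-image information; LLMs guide the inference and consolidate the results) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `VLM` and `LLM` callable types, the prompts, the step ordering, and the stub models are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces standing in for real model calls (not the paper's API).
VLM = Callable[[List[str], str], str]  # (image paths, prompt) -> text
LLM = Callable[[str], str]             # prompt -> text


def profile_attributes(
    images: List[str], attributes: List[str], vlm: VLM, llm: LLM
) -> Dict[str, str]:
    """Assumed flow: VLM extraction, LLM-guided follow-up, LLM consolidation."""
    # Intra-image information: one VLM call per image.
    intra = [vlm([img], "Describe privacy-relevant details in this image.")
             for img in images]
    # Inter-image information: one VLM call over the whole image set.
    inter = vlm(images, "Describe relations and recurring cues across these images.")
    evidence = "\n".join(intra + [inter])

    profile: Dict[str, str] = {}
    for attr in attributes:
        # The LLM guides inference by proposing what to check next (assumed step).
        question = llm(f"Evidence:\n{evidence}\nWhat should be checked to infer {attr}?")
        # The VLM answers the follow-up question against the image set.
        answer = vlm(images, question)
        # The LLM consolidates all evidence into one prediction per attribute.
        profile[attr] = llm(
            f"Evidence:\n{evidence}\nFollow-up: {answer}\nInfer {attr}."
        )
    return profile


# Stub models so the sketch runs without any real VLM or LLM.
def stub_vlm(images: List[str], prompt: str) -> str:
    return f"observations from {len(images)} image(s)"


def stub_llm(prompt: str) -> str:
    return "unknown"


result = profile_attributes(["a.jpg", "b.jpg"], ["age", "personality"],
                            stub_vlm, stub_llm)
print(result)
```

Passing the models in as plain callables keeps the sketch independent of any particular VLM or LLM provider; replacing the stubs with real model clients would be the only change needed to experiment with the flow.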
Authors

Feiran Liu, Nanyang Technological University
Yuzhe Zhang, Beijing University of Technology
Xinyi Huang, Beijing University of Technology
Yinan Peng, Hengxin Tech
Xinfeng Li, Nanyang Technological University
Lixu Wang, Northwestern University (Machine Learning, Data Privacy)
Yutong Shen, Beijing University of Technology
Ranjie Duan, Alibaba Group (AI, AI Safety, AI for Common Prosperity)
Simeng Qin, Nanyang Technological University
Xiaojun Jia, Nanyang Technological University (Explainable AI, Robust AI, Efficient AI)
Qingsong Wen, Squirrel Ai Learning
Wei Dong, Nanyang Technological University