HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps

📅 2025-06-23
🤖 AI Summary
This study addresses the fine-grained modeling of user trust and privacy concerns in mobile health applications. To advance interdisciplinary research at the intersection of health informatics and natural language processing (NLP), we construct and publicly release HARPT, a large-scale corpus of more than 480,000 mobile health app store reviews labeled into seven fine-grained categories that cover user perceptions of apps, service providers, and privacy risks. Methodologically, we combine rule-based filtering, iterative manual annotation with review, targeted data augmentation, and weak supervision with Transformer-based classifiers, which lets labeling scale to the full corpus while keeping annotation quality under control. We evaluate a broad range of models on a manually annotated subset of 7,000 reviews and establish strong baselines (up to 92.3% F1-score). HARPT is a fine-grained corpus dedicated to trust and privacy in mobile health, providing both a foundational dataset and a methodological template for research on trustworthy health AI.
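The summary reports an F1-score without saying how it is averaged across the seven categories. As an illustration only, here is a minimal Python sketch of macro-averaged F1, assuming single-label classification (one category per review). Both the averaging scheme and the single-label setup are assumptions, not details taken from the paper.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in y_true or y_pred.

    Assumes each review carries exactly one label (single-label setup).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = []
    for label in labels:
        prec_denom = tp[label] + fp[label]
        rec_denom = tp[label] + fn[label]
        precision = tp[label] / prec_denom if prec_denom else 0.0
        recall = tp[label] / rec_denom if rec_denom else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Macro averaging weights every category equally, so rare categories count as much as frequent ones; a micro or weighted average would instead be dominated by the largest classes.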

📝 Abstract
We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.
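The first stage of that strategy, rule-based filtering, can be pictured as a keyword screen that discards reviews unlikely to discuss trust or privacy before any manual labeling. The Python sketch below is hypothetical: the seed terms in `PRIVACY_TRUST_TERMS`, the `min_words` cutoff, and the function names are illustrative assumptions, not HARPT's actual rules.

```python
import re

# Hypothetical seed terms for trust/privacy relevance (not HARPT's real rules).
PRIVACY_TRUST_TERMS = [
    "privacy", "personal data", "my data", "tracking", "permission",
    "trust", "scam", "secure", "hipaa",
]

# One case-insensitive pattern; \b keeps "trust" from matching inside "distrustful".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in PRIVACY_TRUST_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_candidate(review: str, min_words: int = 3) -> bool:
    """True if the review has at least `min_words` words and mentions a seed term."""
    if len(review.split()) < min_words:
        return False
    return _PATTERN.search(review) is not None


def filter_reviews(reviews: list[str]) -> list[str]:
    """Keep only candidate reviews, preserving input order."""
    return [r for r in reviews if is_candidate(r)]
```

A keyword screen is cheap to run over hundreds of thousands of reviews, at the cost of recall: reviews that express a concern without any seed term are dropped, while false positives it admits can still be caught during manual labeling.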
Problem

Analyzing user privacy concerns in mobile health apps
Advancing research on trust in app providers and applications
Developing a labeled dataset for health informatics and NLP
Innovation

Rule-based filtering and iterative manual labeling
Weak supervision with transformer-based classifiers
Carefully curated subset for model evaluation
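Weak supervision with a classifier is commonly implemented as confidence-thresholded pseudo-labeling: a model trained on the manually labeled reviews assigns labels to unlabeled ones, and only predictions above a threshold are kept. The Python sketch below assumes that scheme. The `classify` callable stands in for a fine-tuned transformer, and `pseudo_label`, the 0.9 threshold, and the toy label names are hypothetical, not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical interface: maps a review to one probability per label.
# In practice this might wrap a fine-tuned transformer classifier.
Classifier = Callable[[str], Sequence[float]]


def pseudo_label(
    reviews: Sequence[str],
    classify: Classifier,
    labels: Sequence[str],
    threshold: float = 0.9,
) -> tuple[list[tuple[str, str]], list[str]]:
    """Keep confident predictions as weak labels; defer the rest to humans."""
    accepted: list[tuple[str, str]] = []
    deferred: list[str] = []
    for review in reviews:
        probs = classify(review)
        if len(probs) != len(labels):
            raise ValueError("classifier output size does not match the label set")
        # Index of the most probable label (first one wins on ties).
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            accepted.append((review, labels[best]))
        else:
            deferred.append(review)
    return accepted, deferred


# Toy stand-in classifier over three made-up labels, for illustration only.
def toy_classifier(review: str) -> list[float]:
    if "data" in review:
        return [0.95, 0.03, 0.02]
    return [0.40, 0.35, 0.25]


accepted, deferred = pseudo_label(
    ["They sell my data", "Nice interface"],
    toy_classifier,
    ["privacy_concern", "app_trust", "other"],
)
```

Deferred reviews are natural candidates for the next round of manual labeling, after which the classifier can be retrained on the enlarged labeled set; raising the threshold trades coverage for label precision.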
Timoteo Kelly
University of Missouri
Abdulkadir Korkmaz
University of Missouri
Samuel Mallet
University of Missouri
Connor Souders
University of Missouri
Sadra Aliakbarpour
Rockbridge High School
Praveen Rao
Associate Professor, Electrical Engineering & Computer Science
Data Management, Data Science, Health Informatics, Cybersecurity