HARPT: A Corpus for Analyzing Consumers' Trust and Privacy Concerns in Mobile Health Apps

📅 2025-06-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses fine-grained modeling of user trust and privacy concerns in mobile health applications. To advance research at the intersection of health informatics and natural language processing (NLP), we construct and publicly release HARPT, a large-scale annotated corpus of over 480,000 mobile health app reviews labeled with seven categories covering trust in applications, trust in providers, and privacy concerns. Methodologically, the annotation strategy combines rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision with transformer-based classifiers, balancing scalability with accuracy. We evaluate a range of classification models on a separately curated, manually annotated subset of 7,000 reviews, establishing strong baselines (up to 92.3% F1-score). HARPT provides a public dataset dedicated to trust and privacy in mobile health, together with an annotation methodology that future work can build on.

📝 Abstract

We present HARPT, a large-scale annotated corpus of mobile health app store reviews aimed at advancing research in user privacy and trust. The dataset comprises over 480,000 user reviews labeled into seven categories that capture critical aspects of trust in applications, trust in providers and privacy concerns. Creating HARPT required addressing multiple complexities, such as defining a nuanced label schema, isolating relevant content from large volumes of noisy data, and designing an annotation strategy that balanced scalability with accuracy. This strategy integrated rule-based filtering, iterative manual labeling with review, targeted data augmentation, and weak supervision using transformer-based classifiers to accelerate coverage. In parallel, a carefully curated subset of 7,000 reviews was manually annotated to support model development and evaluation. We benchmark a broad range of classification models, demonstrating that strong performance is achievable and providing a baseline for future research. HARPT is released as a public resource to support work in health informatics, cybersecurity, and natural language processing.

Problem

Research questions and friction points this paper is trying to address.

Analyzing user privacy concerns in mobile health apps

Advancing research on trust in app providers and applications

Developing a labeled dataset for health informatics and NLP

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-based filtering and iterative manual labeling

Weak supervision with transformer-based classifiers

Carefully curated subset for model evaluation
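
The first two ideas above can be illustrated with a minimal sketch. It is not the authors' pipeline: the keyword pattern, the label names, the 0.9 confidence threshold, and the `toy_classifier` stand-in (used here in place of a fine-tuned transformer) are all assumptions made for illustration. The sketch shows a rule-based filter that discards off-topic reviews, followed by a weak-labeling step that accepts a predicted label only when the classifier is confident and routes everything else to manual review.

```python
import re

# Hypothetical keyword rule; the actual filtering rules are not given in this summary.
PRIVACY_TRUST_PATTERN = re.compile(
    r"\b(privacy|personal data|tracking|trust|scam|secure|security)\b",
    re.IGNORECASE,
)


def rule_filter(reviews):
    """Keep only reviews that match at least one privacy/trust keyword."""
    return [r for r in reviews if PRIVACY_TRUST_PATTERN.search(r)]


def weak_label(reviews, classify, threshold=0.9):
    """Accept a predicted label only when confidence meets the threshold.

    `classify` maps a review to (label, confidence). Low-confidence
    reviews are returned separately so they can be labeled by hand.
    """
    accepted, needs_review = [], []
    for text in reviews:
        label, confidence = classify(text)
        if confidence >= threshold:
            accepted.append((text, label))
        else:
            needs_review.append(text)
    return accepted, needs_review


def toy_classifier(text):
    """Stand-in for a fine-tuned transformer classifier (assumption)."""
    lowered = text.lower()
    if "privacy" in lowered or "data" in lowered:
        return "privacy_concern", 0.95  # hypothetical label name
    return "trust_in_app", 0.60  # hypothetical label name


reviews = [
    "Great step counter, love it",
    "This app sells my personal data, huge privacy problem",
    "I do not trust this app with my records",
]
relevant = rule_filter(reviews)
accepted, needs_review = weak_label(relevant, toy_classifier)
```

In this toy run the first review is dropped by the filter, the second is auto-labeled, and the third falls below the threshold and is queued for manual annotation. Raising the threshold trades coverage for label quality, which is the kind of scalability versus accuracy balance the abstract describes.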
