🤖 AI Summary
Personalized AI systems, such as recommender systems, may harm user well-being by propagating sensitive or harmful content, yet current evaluations rely predominantly on engagement metrics and lack benchmark datasets with fine-grained sensitivity annotations.
Method: We propose the first fine-grained sensitivity classification framework integrating explicit user behavioral signals with community-generated content warnings. We construct two high-quality, publicly available datasets: one for movies (MovieLens ratings augmented with *Does the Dog Die?* sensitivity labels) and one for fan fiction (Archive of Our Own (AO3) community warning tags paired with user interaction logs).
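To make the movie-domain construction concrete, here is a minimal pandas sketch of how one might join MovieLens ratings to *Does the Dog Die?*-style topic votes via IMDb IDs. The `ratings.csv` and `links.csv` layouts follow the standard MovieLens release; the warnings file (`ddd_warnings.csv`) and its columns (`imdbId`, `topic`, `yes_votes`, `no_votes`) are hypothetical stand-ins, not the published schema.

```python
import pandas as pd

# Standard MovieLens layout: ratings.csv (userId, movieId, rating, timestamp)
# and links.csv (movieId, imdbId, tmdbId).
ratings = pd.read_csv("ml-latest/ratings.csv")
links = pd.read_csv("ml-latest/links.csv")

# Hypothetical export of community topic votes, one row per (movie, topic);
# the released dataset's actual schema may differ.
ddd = pd.read_csv("ddd_warnings.csv")  # imdbId, topic, yes_votes, no_votes

# Treat a warning topic as applying when "yes" votes outnumber "no" votes.
ddd["applies"] = ddd["yes_votes"] > ddd["no_votes"]

# Pivot to one row per movie with a boolean column per warning topic.
sensitivity = (
    ddd[ddd["applies"]]
    .pivot_table(index="imdbId", columns="topic", values="applies",
                 aggfunc="any", fill_value=False)
    .reset_index()
)

# Attach sensitivity labels to every user-movie rating through the IMDb link.
labeled = ratings.merge(links, on="movieId").merge(sensitivity, on="imdbId", how="left")
```

Majority voting is only one simple aggregation choice; vote-count minimums or stricter thresholds could equally be used to trade label precision against coverage.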
Contribution/Results: This work pioneers the integration of structured user feedback with decentralized, community-driven content safety practices, enabling quantifiable, longitudinal assessment of recommender systems’ impact on user well-being. The released dataset serves as critical infrastructure for sensitive-content governance, well-being–oriented optimization, and responsible AI research.
📝 Abstract
Personalized AI systems, from recommendation systems to chatbots, are a prevalent means of distributing content to users based on their learned preferences. However, there is growing concern about the adverse effects of these systems, including their potential to expose users to sensitive or harmful material, negatively impacting overall well-being. To address this concern quantitatively, it is necessary to create datasets with relevant sensitivity labels for content, enabling researchers to evaluate personalized systems beyond mere engagement metrics. To this end, we introduce two novel datasets that include a taxonomy of sensitivity labels alongside user-content ratings: one that integrates MovieLens rating data with content warnings from the *Does the Dog Die?* community ratings website, and another that combines fan-fiction interaction data with user-generated warnings from Archive of Our Own.
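As one illustration of evaluation beyond engagement, a dataset like this supports a per-user sensitive-exposure rate: the share of a user's recommendations that carry a given warning. The sketch below assumes illustrative column names (`userId`, `itemId`, plus one boolean column per warning topic) rather than the released schema.

```python
import pandas as pd

def sensitive_exposure_rate(recs: pd.DataFrame,
                            labels: pd.DataFrame,
                            topic: str) -> pd.Series:
    """Per-user share of recommended items flagged with `topic`.

    recs:   one row per recommended item (userId, itemId), e.g. a top-k list.
    labels: one row per item (itemId plus a boolean column per warning topic).
    Column names are illustrative assumptions, not the published schema.
    """
    merged = recs.merge(labels[["itemId", topic]], on="itemId", how="left")
    # Items with no community label count as not flagged.
    merged[topic] = merged[topic].fillna(False).astype(bool)
    return merged.groupby("userId")[topic].mean()

# Example: per-user exposure to a hypothetical "animal death" warning topic.
# exposure = sensitive_exposure_rate(topk_recs, item_labels, "animal death")
# print(exposure.describe())
```

Computing this rate per user over successive time windows would give the kind of longitudinal well-being assessment the summary describes.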