🤖 AI Summary
This study addresses key challenges in implementing the GDPR's right of access, specifically the low comprehensibility, limited reliability, and incompleteness of the Data Download Packages (DDPs) that social media platforms provide. It systematically assesses DDP compliance and usability across three platforms: Instagram, TikTok, and YouTube. Using a mixed-methods approach that combines a survey of 400 participants in four countries, automated bots, manual annotation, and user-donated DDPs, the work provides a systematic, comparative empirical analysis of DDPs across major platforms. The findings reveal substantial cross-platform differences in the data categories that DDPs include, inconsistent implementation of GDPR requirements, and gaps in the technical reliability of DDPs. Finally, the study uses large language models (LLMs) to restructure DDPs and shows that more concise, clear, and readable DDPs are feasible to produce. The approach strengthens users' ability to understand and control their personal data, and it offers both a scalable technical pathway and policy-relevant insights for digital social science research.
📝 Abstract
The comprehensibility and reliability of the data download packages (DDPs) provided under the right of access of the General Data Protection Regulation (GDPR) are vital for both individuals and researchers. DDPs enable users to understand and control their personal data, yet issues such as complexity and incomplete information often limit their utility. Moreover, despite the growing use of DDPs in research on emerging online phenomena, little attention has been paid to systematically assessing their reliability and comprehensibility. To address this gap, we perform a comparative analysis of the comprehensibility and reliability of DDPs provided by three major social media platforms: TikTok, Instagram, and YouTube. Through a study with 400 participants across four countries, we assess the comprehensibility of DDPs against several GDPR requirements, including conciseness, transparency, intelligibility, and the use of clear and plain language. In addition, using automated bots and user-donated DDPs, we evaluate the reliability of DDPs across the three platforms. Among other findings, we observe notable differences across the three platforms in the data categories included in DDPs, inconsistencies in adherence to GDPR requirements, and gaps in DDP reliability. Finally, using large language models, we demonstrate that more comprehensible DDPs can feasibly be provided with little effort.