🤖 AI Summary
This study presents the first systematic empirical investigation into the dissemination, harms, and governance of conspiracy-theory content on TikTok. Leveraging a three-year longitudinal dataset of 1.5 million videos shared in the U.S., we construct the first large-scale TikTok conspiracy-theory benchmark to quantify temporal prevalence trends and disentangle the relationship between platform incentive structures (e.g., the Creativity Program) and content risk. Methodologically, we evaluate open-weight LLMs (Llama, Phi) for detecting conspiracy theories in audio transcriptions, compare them against a fine-tuned RoBERTa, and deploy an ASR→text classification pipeline to estimate incident volume. Results show that creator monetization incentives significantly increase video duration but do not amplify conspiratorial framing; Llama and Phi reach up to 96% precision in flagging conspiratorial content, yet their end-to-end F1 remains only comparable to the fine-tuned RoBERTa; and the pipeline identifies up to 1,000 newly posted conspiracy videos per month as a lower bound on prevalence. Our work establishes a methodological framework and empirical foundation for mitigating harmful information on short-video platforms.
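The ASR→text pipeline described above can be pictured with a minimal sketch. The paper does not name its components, so Whisper for transcription and a fine-tuned RoBERTa checkpoint (`your-org/roberta-conspiracy`, a hypothetical name) are assumptions made here purely for illustration:

```python
from transformers import pipeline

# ASR stage: transcribe the video's audio track (Whisper is an assumed choice).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Classification stage: a RoBERTa model fine-tuned on labeled transcripts;
# "your-org/roberta-conspiracy" is a hypothetical checkpoint name.
clf = pipeline("text-classification", model="your-org/roberta-conspiracy")

def score_video(audio_path: str) -> dict:
    """Transcribe one video's audio and score the transcript."""
    transcript = asr(audio_path)["text"]
    # Truncate long transcripts to fit RoBERTa's 512-token context window.
    return clf(transcript, truncation=True)[0]

print(score_video("example_video.wav"))  # e.g. {'label': 'conspiracy', 'score': 0.93}
```

Running such a two-stage pipeline over a monthly stream of new videos is what yields the lower-bound prevalence estimate: every positive classification is a detected conspiracy video, while misses go uncounted.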
📄 Abstract
TikTok has skyrocketed in popularity over recent years, especially among younger audiences. However, there are public concerns about the potential of this platform to promote and amplify harmful content. This study presents the first systematic analysis of conspiracy theories on TikTok. By leveraging the official TikTok Research API, we collect a longitudinal dataset of 1.5M videos shared in the U.S. over three years. We estimate a lower bound on the prevalence of conspiratorial videos (up to 1,000 new videos per month) and evaluate the effects of TikTok's Creativity Program for monetization, observing an overall increase in video duration regardless of content. Lastly, we evaluate the capabilities of state-of-the-art open-weight Large Language Models to identify conspiracy theories from audio transcriptions of videos. While these models achieve high precision in detecting harmful content (up to 96%), their overall performance remains comparable to fine-tuned traditional models such as RoBERTa. Our findings suggest that Large Language Models can serve as an effective tool for supporting content moderation strategies aimed at reducing the spread of harmful content on TikTok.
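For the LLM evaluation, a zero-shot judgment over a transcript might look like the following sketch. The prompt wording and the specific checkpoint (`meta-llama/Llama-3.1-8B-Instruct`) are assumptions, not details taken from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed open-weight chat model; the paper evaluates Llama and Phi variants.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_conspiratorial(transcript: str) -> bool:
    """Ask the LLM for a binary YES/NO judgment on a video transcript."""
    messages = [
        {"role": "system",
         "content": "You label TikTok video transcripts. Answer only YES or NO."},
        {"role": "user",
         "content": f"Does this transcript promote a conspiracy theory?\n\n{transcript}"},
    ]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=3, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    answer = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    return answer.strip().upper().startswith("YES")
```

A constrained YES/NO prompt like this makes precision easy to measure against human labels, which is the comparison the abstract draws between the LLMs and the fine-tuned RoBERTa baseline.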