🤖 AI Summary
This study systematically evaluates whether the search endpoint of the YouTube Data API v3 is suitable for academic research, focusing on completeness, representativeness, temporal consistency, and bias. Method: Over six months, we ran an empirical audit of weekly searches with eleven queries, using automated HTTP requests, cross-temporal comparison of results, recall/precision analysis, and a validating case study. Contribution/Results: We identify three structural deficiencies: (1) severe temporal decay, with up to 80% of returned videos no longer findable 20–60 days after publication; (2) irreproducible results, with identical queries returning different video sets at different times; and (3) relevance-based ranking that retrieves substantial off-topic content. These issues appear to stem from the search function prioritizing "freshness" over comprehensive retrieval, which undermines historical content retrieval and research reproducibility. While we propose mitigation strategies, we conclude that the current search function is not adequate for rigorous scholarly data collection, especially with regard to Digital Services Act (DSA) requirements. The findings provide empirical evidence relevant to platform data governance and digital humanities methodology.
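The cross-temporal comparison mentioned above can be illustrated with a minimal sketch. It assumes each weekly run of a query is stored as a list of video IDs; the function names, the two metrics, and the sample IDs are illustrative assumptions, not the measures or data used in the paper.

```python
def jaccard(a, b):
    """Overlap between two result sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty result sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def retention(earlier, later):
    """Share of videos from the earlier snapshot still present in the later one."""
    earlier = set(earlier)
    if not earlier:
        return 1.0
    return len(earlier & set(later)) / len(earlier)


# Hypothetical snapshots of the same query, collected several weeks apart.
week_1 = ["vid_a", "vid_b", "vid_c", "vid_d"]
week_9 = ["vid_a", "vid_e"]

print(jaccard(week_1, week_9))    # 1 shared ID out of 5 distinct IDs
print(retention(week_1, week_9))  # 1 of the 4 original IDs is still returned
```

A low retention value for a fixed publication window would correspond to the temporal decay described here, while a low Jaccard value between two runs of the same query would indicate inconsistent results.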
📝 Abstract
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research. Through systematic weekly searches over six months using eleven queries, we identify major limitations regarding completeness, representativeness, consistency, and bias. Our findings reveal substantial differences between ranking parameters like relevance and date in terms of video recall and precision, with relevance often retrieving numerous off-topic videos. We also find severe temporal decay, as the number of findable videos for a specific period dramatically decreases after just 20-60 days from the publication date, potentially hampering many different research designs. Furthermore, search results lack consistency, with identical queries yielding different video sets over time, compromising replicability. A case study on the European Parliament elections highlights how these issues impact research outcomes. While the paper offers several mitigation strategies, it concludes that the API's search function, potentially prioritizing "freshness" over comprehensive retrieval, is not adequate for robust academic research, especially concerning Digital Services Act requirements.
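To make the comparison between ranking parameters concrete, the sketch below builds two requests to the `search.list` endpoint that differ only in the `order` parameter. The endpoint and parameter names (`part`, `type`, `q`, `order`, `publishedAfter`, `publishedBefore`, `maxResults`, `key`) come from the YouTube Data API v3; the helper name, the one-week window, the query string, and the placeholder key are assumptions for illustration, and this is not the authors' collection script. No request is sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, order, start, days=7, api_key="YOUR_API_KEY"):
    """Build a search.list URL for videos published in [start, start + days)."""
    end = start + timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "order": order,  # e.g. "relevance" or "date"
        "publishedAfter": start.strftime(fmt),
        "publishedBefore": end.strftime(fmt),
        "maxResults": 50,  # maximum page size accepted by the endpoint
        "key": api_key,  # placeholder, replace with a real key
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Hypothetical query and window, chosen only for illustration.
window_start = datetime(2024, 6, 3, tzinfo=timezone.utc)
for order in ("relevance", "date"):
    print(build_search_url("European Parliament election", order, window_start))
```

Repeating the same pair of requests each week and storing the returned video IDs would give the snapshots needed to compare recall, precision, and consistency across the two orderings.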