Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research

📅 2025-06-13
🤖 AI Summary
This study systematically evaluates the suitability of the YouTube Data API v3 search endpoint for academic research, focusing on completeness, representativeness, temporal consistency, and bias. Method: Over six months, we ran eleven queries on a weekly schedule and audited the results through automated HTTP requests, cross-temporal result comparison, recall/precision analysis, and a case study. Contribution/Results: We identify three structural deficiencies: (1) severe temporal decay, with up to 80% of the videos returned for a given period no longer findable 20–60 days after publication; (2) irreproducible results, with identical queries returning different video sets at different times; and (3) relevance-based ranking that introduces substantial off-topic content. These issues appear to stem from a search function that prioritizes "freshness" over comprehensive retrieval, which undermines historical content retrieval and research reproducibility. While we propose mitigation strategies, we conclude that the current search endpoint is not adequate for rigorous scholarly data collection, especially with regard to Digital Services Act (DSA) requirements. The audit provides empirical evidence for debates on platform data governance and digital research methodology.

📝 Abstract
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research. Through systematic weekly searches over six months using eleven queries, we identify major limitations regarding completeness, representativeness, consistency, and bias. Our findings reveal substantial differences between ranking parameters like relevance and date in terms of video recall and precision, with relevance often retrieving numerous off-topic videos. We also find severe temporal decay, as the number of findable videos for a specific period dramatically decreases after just 20-60 days from the publication date, potentially hampering many different research designs. Furthermore, search results lack consistency, with identical queries yielding different video sets over time, compromising replicability. A case study on the European Parliament elections highlights how these issues impact research outcomes. While the paper offers several mitigation strategies, it concludes that the API's search function, potentially prioritizing "freshness" over comprehensive retrieval, is not adequate for robust academic research, especially concerning Digital Services Act requirements.
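
The temporal decay and consistency findings both come down to comparing the sets of video IDs that the same query returns at different collection dates. The following is a minimal sketch of two such comparisons, not the authors' actual procedure: the function names `retention` and `jaccard`, the choice of these two measures, and the video IDs are all assumptions made for illustration.

```python
def retention(earlier: set[str], later: set[str]) -> float:
    """Share of video IDs from the earlier snapshot that the later one still returns."""
    if not earlier:
        return 0.0
    return len(earlier & later) / len(earlier)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two snapshots: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # two empty snapshots are treated as identical
    return len(a & b) / len(union)


# Hypothetical snapshots of one query restricted to the same publication window,
# collected several weeks apart. The IDs are invented for this example.
week_1 = {"vid_a", "vid_b", "vid_c", "vid_d", "vid_e"}
week_8 = {"vid_a", "vid_f"}

print(f"retention: {retention(week_1, week_8):.2f}")
print(f"jaccard:   {jaccard(week_1, week_8):.2f}")
```

A low retention value for a fixed publication window would correspond to the decay described above, while a low Jaccard value between two runs of an identical query would indicate the kind of inconsistency that compromises replicability.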
Problem

Research questions and friction points this paper is trying to address.

Audits YouTube's API search limitations for academic research
Reveals temporal decay and inconsistency in search results
Highlights bias and replicability issues in video retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic weekly searches over six months
Identifies differences in recall and precision between ranking parameters
Highlights severe temporal decay in video findability
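
To make the comparison of ranking parameters concrete, the sketch below builds two `search.list` request URLs that differ only in the `order` parameter and scores two result sets against a reference set. It is an illustration under stated assumptions, not the paper's pipeline: the helper names, the example query, the placeholder key `"YOUR_API_KEY"`, the reference set, and the result sets are hypothetical, and no request is actually sent. How on-topic videos are identified in the paper is not described on this page.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query: str, order: str, published_after: str,
                     published_before: str, api_key: str) -> str:
    """Build a search.list URL for videos published inside a fixed time window."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "order": order,                        # e.g. "relevance" or "date"
        "publishedAfter": published_after,     # RFC 3339 timestamp
        "publishedBefore": published_before,   # RFC 3339 timestamp
        "maxResults": 50,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of a retrieved ID set against a reference set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Same query and window, two ranking orders (hypothetical query and key).
urls = {
    order: build_search_url("european parliament elections", order,
                            "2024-06-01T00:00:00Z", "2024-06-08T00:00:00Z",
                            "YOUR_API_KEY")
    for order in ("relevance", "date")
}

# Invented data: a hand-labelled set of on-topic videos and two result sets.
reference = {"v1", "v2", "v3", "v4"}
by_relevance = {"v1", "v2", "x1", "x2", "x3"}
by_date = {"v1", "v2", "v3"}

for name, results in (("relevance", by_relevance), ("date", by_date)):
    p, r = precision_recall(results, reference)
    print(f"order={name}: precision={p:.2f}, recall={r:.2f}")
```

Fixing the publication window with `publishedAfter` and `publishedBefore` keeps repeated runs comparable, so differences between result sets can be attributed to the ranking order or the collection date rather than to a shifting time frame.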
Bernhard Rieder
University of Amsterdam
Adrián Padilla
Department of Communication and Advertising, Euncet Business School, Terrassa, Spain
Òscar Coromina
Department of Audiovisual Communication and Advertising, Universitat Autònoma de Barcelona, Barcelona, Spain