🤖 AI Summary
This study investigates whether URLs alone suffice to accurately identify political content (PC) in multilingual news from France, Germany, Spain, the UK, and the US—bypassing costly full-text analysis. We propose the first cross-lingual, cross-national URL-level PC classification framework, leveraging multilingual large language models (LLMs) via zero-shot and few-shot prompting, validated against a human-annotated gold standard and benchmarked against supervised baselines (SVM, fine-tuned BERT). Results demonstrate that URLs encode sufficient political semantic signals to achieve 85–92% F1-score consistency with full-text analysis across all five countries and languages, while drastically reducing computational overhead and data acquisition barriers. Our core contribution is the first systematic empirical validation of URL-level analysis for political communication research—establishing its efficacy, methodological boundaries, and viability as a lightweight, scalable paradigm for cross-national media monitoring.
📝 Abstract
The use of large language models (LLMs) is becoming common in political science, particularly in studies that analyse individuals' use of digital media. However, while previous research has demonstrated LLMs' ability at labelling tasks, the effectiveness of using LLMs to classify political content (PC) from URLs alone is not yet well explored. This article addresses that gap by evaluating whether LLMs can accurately distinguish PC from non-PC using either the article text or only the URL, for news from five countries (France, Germany, Spain, the UK, and the US) and in different languages. Using cutting-edge LLMs such as GPT, Llama, Mistral, Deepseek, Qwen and Gemma, we measure model performance to assess whether URL-level analysis can be a good approximation for full-text analysis of PC, even across different linguistic and national contexts. Model outputs are compared with human-labelled articles, as well as with traditional supervised machine learning techniques, to set a baseline of performance. Overall, our findings suggest that URLs convey most of the news content, which offers a useful perspective on the trade-off between accuracy and cost. We also account for contextual limitations and offer methodological recommendations for using LLMs in political science studies.
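To make the URL-level setup concrete, the following is a minimal Python sketch of the two steps the abstract describes: turning a URL into a zero-shot classification prompt, and scoring predicted labels against human labels with F1. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the prompt wording, and the label strings `PC` and `non-PC` are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
import re
from urllib.parse import unquote, urlparse


def url_path_words(url: str) -> list[str]:
    """Return lowercase word tokens from the URL path (digits and punctuation dropped)."""
    path = unquote(urlparse(url).path)  # decode %-escapes such as %C3%A4
    # [^\W\d_]+ matches runs of Unicode letters, so accented words survive.
    return [w.lower() for w in re.findall(r"[^\W\d_]+", path)]


def build_zero_shot_prompt(url: str) -> str:
    """Build a zero-shot prompt; the wording here is an assumption, not the paper's."""
    words = " ".join(url_path_words(url))
    return (
        "Decide whether the news article behind this URL is political content.\n"
        f"URL: {url}\n"
        f"Words in the path: {words}\n"
        "Answer with exactly one label: PC or non-PC."
    )


def f1_score(gold: list[str], pred: list[str], positive: str = "PC") -> float:
    """F1 for the positive class, comparing predicted labels with human labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, each prompt from `build_zero_shot_prompt` would be sent to a model of choice, the returned labels collected into a list, and `f1_score` run once on URL-based predictions and once on full-text predictions so that the two can be compared against the same human labels.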