Beyond the Link: Assessing LLMs' ability to Classify Political Content across Global Media

📅 2025-06-20
🤖 AI Summary
This study investigates whether URLs alone suffice to accurately identify political content (PC) in multilingual news from France, Germany, Spain, the UK, and the US—bypassing costly full-text analysis. We propose the first cross-lingual, cross-national URL-level PC classification framework, leveraging multilingual large language models (LLMs) via zero-shot and few-shot prompting, validated against a human-annotated gold standard and benchmarked against supervised baselines (SVM, fine-tuned BERT). Results demonstrate that URLs encode sufficient political semantic signals to achieve 85–92% F1-score consistency with full-text analysis across all five countries and languages, while drastically reducing computational overhead and data acquisition barriers. Our core contribution is the first systematic empirical validation of URL-level analysis for political communication research—establishing its efficacy, methodological boundaries, and viability as a lightweight, scalable paradigm for cross-national media monitoring.

📝 Abstract
The use of large language models (LLMs) is becoming common in political science, particularly in studies that analyse individuals' use of digital media. However, while previous research has demonstrated LLMs' ability at labelling tasks, the effectiveness of using LLMs to classify political content (PC) from URLs alone is not yet well explored. This article addresses that gap by evaluating whether LLMs can accurately distinguish PC from non-PC using both the article text and the URLs of news from five countries (France, Germany, Spain, the UK, and the US) in different languages. Using cutting-edge LLMs such as GPT, Llama, Mistral, Deepseek, Qwen and Gemma, we measure model performance to assess whether URL-level analysis can be a good approximation for full-text analysis of PC, even across different linguistic and national contexts. Model outputs are compared with human-labelled articles, as well as with traditional supervised machine learning techniques, to set a performance baseline. Overall, our findings suggest that URLs capture most of the information in the news content, offering a useful perspective on the trade-off between accuracy and cost. We also account for contextual limitations and offer methodological recommendations for using LLMs in political science studies.
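As a rough illustration of the evaluation idea described in the abstract, the sketch below turns a URL into a zero-shot classification prompt and scores model labels against human labels with F1. The prompt wording, the function names `url_to_prompt` and `f1_score`, and the example URL are all assumptions made for illustration; they are not taken from the paper, and the call to an actual LLM is deliberately left out.

```python
import re
from urllib.parse import urlparse


def url_to_prompt(url: str) -> str:
    """Build a zero-shot prompt from a URL (hypothetical wording, not the paper's prompt)."""
    parsed = urlparse(url)
    # Split the path on non-alphanumeric characters and underscores,
    # e.g. "/politics/election-results-2024" -> politics, election, results, 2024.
    tokens = [t for t in re.split(r"[\W_]+", parsed.path) if t]
    return (
        "Classify the news article behind this URL as POLITICAL or NON-POLITICAL.\n"
        f"Domain: {parsed.netloc}\n"
        f"Path tokens: {' '.join(tokens)}\n"
        "Answer with one word."
    )


def f1_score(gold: list[bool], predicted: list[bool]) -> float:
    """F1 for the positive (political) class; gold holds the human labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the same `f1_score` function could be applied once to URL-based predictions and once to full-text predictions, so the two input types are compared against the same human-labelled reference.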
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' accuracy in classifying political content from URLs and text
Compare URL-level and full-text analysis across five countries and languages
Assess performance of advanced LLMs versus human labels and traditional ML
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLMs for political content classification
Comparing URL and full-text analysis across countries
Benchmarking LLMs against human and traditional methods
Alberto Martinez-Serra
Barcelona Supercomputing Center
Bio-Nano Interactions, Nanosafety, Biophysics, Complex Systems
Alejandro De La Fuente
Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell 1-3, 08034 Barcelona, Spain. Department of Political Science, Universitat de Barcelona (UB), Avinguda Diagonal 684, 08034 Barcelona, Spain.
Nienke Viescher
Barcelona Supercomputing Center (BSC), Plaça Eusebi Güell 1-3, 08034 Barcelona, Spain.
Ana S. Cardenal
Universitat Oberta de Catalunya, Barcelona Supercomputing Center
Digital Media, News Audiences, Public Opinion, Political Behavior, Computational Methods