EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding

📅 2025-05-26
🤖 AI Summary
This study evaluates the capacity of large language models (LLMs) to achieve political consensus under multi-party systems and heterogeneous power structures. To this end, we introduce EuroCon, the first benchmark dataset for parliamentary deliberation, comprising 2,225 high-quality debate records from the European Parliament (2009–2022). We propose a four-dimensional modeling framework for parliamentary negotiation that captures topic, objective, party affiliation, and seat-weighted influence, and we design a consensus generation and evaluation protocol grounded in real-world voting rules (e.g., qualified majority thresholds, topic sensitivity). Experimental results reveal that state-of-the-art LLMs exhibit significant limitations on complex consensus tasks and display systematic bias toward dominant parties' positions. This work offers an empirically grounded approach to evaluating political AI, providing both a methodology and baseline evidence for assessing LLMs in institutional democratic contexts.

📝 Abstract
Achieving political consensus is crucial yet challenging for the effective functioning of social governance. Although frontier AI systems represented by large language models (LLMs) have developed rapidly in recent years, their capabilities in this area remain understudied. In this paper, we introduce EuroCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament over 13 years, ranging from 2009 to 2022, to evaluate the ability of LLMs to reach political consensus among divergent party positions across diverse parliament settings. Specifically, EuroCon incorporates four factors to build each simulated parliament setting: specific political issues, political goals, participating parties, and power structures based on seat distribution. We also develop an evaluation framework for EuroCon to simulate real voting outcomes in different parliament settings, assessing whether LLM-generated resolutions meet predefined political goals. Our experimental results demonstrate that even state-of-the-art models perform unsatisfactorily on complex tasks such as passing resolutions by a two-thirds majority and addressing security issues. The results also reveal some common strategies LLMs use to find consensus under different power structures, such as prioritizing the stance of the dominant party. These findings highlight EuroCon's promise as an effective platform for studying LLMs' ability to find political consensus.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to reach political consensus in diverse parliament settings
Assessing LLM-generated resolutions against predefined political goals
Identifying challenges in passing resolutions with complex requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

EuroCon benchmark from European Parliament records
Simulates parliament settings with four factors
Evaluates LLMs' political consensus finding strategies
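
To make the four factors and the seat-weighted vote concrete, here is a minimal Python sketch. It is not the EuroCon implementation: the `ParliamentSetting` class, the `passes` function, the party names, and the seat counts are all hypothetical, and the rule that a resolution passes when the supporting share of seats is at least the threshold is an assumption made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict


@dataclass
class ParliamentSetting:
    """Hypothetical container for one simulated parliament setting."""
    issue: str             # the political issue under deliberation
    threshold: Fraction    # political goal, as the share of seats needed to pass
    seats: Dict[str, int]  # participating parties mapped to seat counts (power structure)


def passes(setting: ParliamentSetting, votes: Dict[str, bool]) -> bool:
    """Return True if the seats voting in favour reach the threshold share.

    Parties missing from `votes` are treated as not supporting (an assumption).
    """
    total = sum(setting.seats.values())
    in_favour = sum(n for party, n in setting.seats.items() if votes.get(party, False))
    # Exact rational arithmetic avoids float rounding at the boundary.
    return Fraction(in_favour, total) >= setting.threshold


# Illustrative setting: three made-up parties, two-thirds majority goal.
setting = ParliamentSetting(
    issue="security",
    threshold=Fraction(2, 3),
    seats={"A": 40, "B": 35, "C": 25},
)

broad_support = passes(setting, {"A": True, "B": True, "C": False})   # 75 of 100 seats
narrow_support = passes(setting, {"A": True, "B": False, "C": True})  # 65 of 100 seats
```

With these made-up numbers, winning over the two largest parties clears the two-thirds bar, while a coalition holding 65 of 100 seats does not. This illustrates why the seat distribution shapes which party positions a resolution has to accommodate.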
Authors
Zhaowei Zhang, Peking University (AI Governance, AI Alignment, Game Theory, Human-AI Collaboration)
Minghua Yi, Wuhan University
Mengmeng Wang, State Key Laboratory of General Artificial Intelligence, BIGAI
Fengshuo Bai, Shanghai Jiao Tong University (Embodied AI, AI Alignment, Reinforcement Learning, Preference-based Learning)
Zilong Zheng, State Key Laboratory of General Artificial Intelligence, BIGAI
Yipeng Kang, BIGAI (Natural language processing)
Yaodong Yang, Institute for Artificial Intelligence, Peking University