🤖 AI Summary
This study addresses the challenge of reliably evaluating subjective, continuous attributes such as political stance, which sit uneasily with pairwise validation, the conventional gold standard for human evaluation. The authors propose a dual-scale validation framework that combines pointwise and pairwise human annotation to assess political stance predictions made by 22 language models. The predictions cover 23,228 arguments drawn from 30 debates on the UK television programme Question Time. The approach offers a scalable yet reliable validation methodology for subjective continuous knowledge and yields a human-validated, structured knowledge base of political arguments. Pointwise predictions show only moderate human-model agreement (Krippendorff’s α = 0.578), which reflects the intrinsic subjectivity of the task. Rankings derived from those predictions nonetheless recover the ordinal structure of political stances: in pairwise validation, the best-performing model reaches strong agreement with human rankings (α = 0.86).
📝 Abstract
Real-world knowledge representation often requires capturing subjective, continuous attributes, such as political positions, that conflict with pairwise validation, the widely accepted gold standard for human evaluation. We address this challenge through a dual-scale validation framework that combines pointwise and pairwise human annotation, and we apply it to political stance prediction in argumentative discourse. Using 22 language models, we construct a large-scale knowledge base of political position predictions for 23,228 arguments drawn from 30 debates that appeared on the UK political television programme Question Time. Pointwise evaluation shows moderate human-model agreement (Krippendorff's $\alpha=0.578$), reflecting the intrinsic subjectivity of the task. Pairwise validation, by contrast, reveals substantially stronger alignment between human- and model-derived rankings ($\alpha=0.86$ for the best model). This work contributes: (i) a practical validation methodology for subjective continuous knowledge that balances scalability with reliability; (ii) a validated, structured argumentation knowledge base enabling graph-based reasoning and retrieval-augmented generation in political domains; and (iii) evidence that ordinal structure can be extracted from pointwise language-model predictions on inherently subjective real-world discourse, advancing knowledge representation capabilities for domains where traditional symbolic or categorical approaches are insufficient.
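The abstract reports Krippendorff's α for both evaluation scales. It does not say which distance metric was used or how pairwise labels were obtained. The following is a minimal sketch of the two ingredients under stated assumptions. It assumes an interval (squared-difference) metric for pointwise stance scores. It also assumes that pairwise labels are derived by comparing two pointwise scores, with a higher score meaning a position further along the stance scale. The function names and that convention are hypothetical illustrations, not the authors' implementation.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def krippendorff_alpha_interval(
    ratings: Sequence[Sequence[Optional[float]]],
) -> float:
    """Krippendorff's alpha with the interval (squared-difference) metric.

    ratings[c][u] is coder c's score for unit u, or None if missing.
    Units with fewer than two scores are not pairable and are skipped.
    """
    n_units = len(ratings[0])
    units: List[List[float]] = []
    for u in range(n_units):
        values = [row[u] for row in ratings if row[u] is not None]
        if len(values) >= 2:
            units.append(values)

    pooled = [v for values in units for v in values]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two pairable values")

    # Observed disagreement: within-unit squared differences over ordered
    # pairs, each unit weighted by 1 / (m_u - 1), averaged over n values.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(values, 2)) / (len(values) - 1)
        for values in units
    ) / n

    # Expected disagreement: squared differences over all ordered pairs of
    # pooled values, regardless of which unit they came from.
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all values are identical")
    return 1.0 - d_o / d_e


def pairwise_from_pointwise(
    scores: Sequence[float], pairs: Sequence[Tuple[int, int]]
) -> List[Optional[int]]:
    """Turn pointwise stance scores into pairwise labels (hypothetical convention).

    For each pair (i, j): 1 if item i scores higher than item j,
    0 if it scores lower, None on a tie.
    """
    labels: List[Optional[int]] = []
    for i, j in pairs:
        if scores[i] == scores[j]:
            labels.append(None)
        else:
            labels.append(1 if scores[i] > scores[j] else 0)
    return labels


def pairwise_agreement(
    model: Sequence[Optional[int]], human: Sequence[Optional[int]]
) -> float:
    """Raw fraction of pairs where model and human labels match (ties skipped)."""
    matched = [(m, h) for m, h in zip(model, human) if m is not None and h is not None]
    if not matched:
        raise ValueError("no comparable pairs")
    return sum(m == h for m, h in matched) / len(matched)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    alpha = krippendorff_alpha_interval([[1, 2, 3, 4], [1, 2, 3, 5]])
    print(round(alpha, 3))  # prints 0.937
    labels = pairwise_from_pointwise([0.1, 0.5, -0.3], [(0, 1), (1, 2), (0, 2)])
    print(labels)  # prints [0, 1, 1]
    print(round(pairwise_agreement(labels, [0, 1, 0]), 3))  # prints 0.667
```

Missing ratings are dropped unit by unit, which is how Krippendorff's α accommodates incomplete annotation designs. The pairwise score here is a plain match rate rather than a chance-corrected coefficient. To obtain a chance-corrected figure comparable to the one the abstract reports, one could instead compute α over the binary pairwise labels with a nominal metric.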