SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the performance degradation of large language models (LLMs) on stance detection when inputs are semantically and pragmatically complex, a problem that clearer instructions, reasoning prompts, retrieval, and debate do not reliably fix. The authors propose the Stance Inference Complexity Index (SICI), a seven-dimensional diagnostic measure of the semantic-pragmatic burden imposed by a target-text pair. Using the SemEval-2016 and VAST datasets, they conduct cross-model error analyses and evaluate fifteen intervention methods. As SICI increases, LLM errors pass through three regimes: over-attribution (especially Against predictions) at low complexity, an unstable boundary at intermediate complexity, and a concentration on None at high complexity. The pattern persists across GPT-3.5, GPT-4o-mini, DeepSeek-V3, and GPT-4o, although stronger models move the boundaries. SICI predicts LLM accuracy better than surface proxies and shows substantial cross-scorer reliability (α = 0.771). The intervention study exposes a key limitation: prompting, retrieval, and debate often shift models along the attribution-abstention axis rather than removing the high-complexity bottleneck.

📝 Abstract

Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity Index), a seven-dimensional diagnostic measure of the semantic-pragmatic burden imposed by a target–text pair. Across SemEval-2016 and VAST, SICI predicts LLM accuracy better than surface proxies and shows substantial cross-scorer reliability ($\alpha = 0.771$). More importantly, LLM errors change regime as SICI increases: low-complexity examples invite over-attribution, especially Against predictions; intermediate examples form an unstable boundary; and high-complexity examples rapidly concentrate on None. This phase-transition-like structure persists across GPT-3.5, GPT-4o-mini, DeepSeek-V3, and GPT-4o, although stronger models move the boundaries. A 15-method intervention study further shows that prompting, retrieval, and debate often shift models along the attribution–abstention axis rather than removing the high-complexity bottleneck.
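The regime-shift claim can be made concrete with a small binning analysis. The sketch below is illustrative only and is not the authors' code: it assumes each example already has a scalar complexity score and a predicted stance label, and it reports the share of each predicted label per complexity bin. The function name `label_shares_by_bin`, the bin edges, and the toy scores are all hypothetical. The abstract does not specify how SICI is computed or where the regime boundaries fall.

```python
from collections import Counter


def label_shares_by_bin(scores, predictions, edges):
    """Return, for each complexity bin, the share of each predicted label.

    `edges` are ascending thresholds (hypothetical, not from the paper);
    they split the score range into len(edges) + 1 bins.
    """
    bins = [Counter() for _ in range(len(edges) + 1)]
    for score, label in zip(scores, predictions):
        # Bin index = number of thresholds the score meets or exceeds.
        idx = sum(score >= edge for edge in edges)
        bins[idx][label] += 1

    shares = []
    for counts in bins:
        total = sum(counts.values())
        # An empty bin yields an empty dict rather than a division by zero.
        shares.append({lab: n / total for lab, n in counts.items()} if total else {})
    return shares


# Toy data, invented for illustration: low scores lean Against,
# high scores collapse onto None.
scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95]
predictions = ["Against", "Against", "Favor", "Favor", "None", "None", "None", "None"]

shares = label_shares_by_bin(scores, predictions, edges=[0.4, 0.7])
for i, dist in enumerate(shares):
    print(f"bin {i}: {dist}")
```

On real data, a pattern like the one described in the abstract would appear as a high Against share in the lowest bin, mixed predictions in the middle bin, and a dominant None share in the highest bin.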

Problem

Research questions and friction points this paper is trying to address.

stance detection

semantic-pragmatic complexity

large language models

regime shifts

complexity bottleneck

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Pragmatic Complexity

Stance Detection

Large Language Models