The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs

📅 2026-05-30

🤖 AI Summary

Large language models often rely on spurious correlations, forming reasoning shortcuts that break down on out-of-distribution tasks and undermine robustness. This work integrates structural causal models with the information bottleneck principle, formalizing reasoning as a high-complexity causal process within a causal information-theoretic framework. Within this framework, the authors identify a reward-induced manifold collapse mechanism and expose the limitations of homogeneous data expansion. By introducing a semantic coverage metric η, they derive a generalization upper bound that depends on semantic coverage rather than sample size. They further show that process reward models act as topological filters: step-wise mutual information constraints make low-complexity shortcut manifolds inadmissible. This provides a theoretical foundation for improving robust reasoning in language models.
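
The claim that a coverage-based bound does not improve under homogeneous data expansion can be illustrated with a toy proxy. The summary does not define η, so the sketch below uses a hypothetical stand-in rather than the paper's measure. It assumes every training example carries a discrete "semantic type" label, and it defines coverage as the fraction of a fixed, assumed set of types that appear at least once. The names `coverage_proxy` and `ALL_TYPES` and the type labels are all illustrative.

```python
def coverage_proxy(sample_types, all_types):
    """Hypothetical coverage proxy (NOT the paper's eta): fraction of the
    semantic types in `all_types` that occur at least once in the sample."""
    universe = set(all_types)
    return len(set(sample_types) & universe) / len(universe)


# Hypothetical universe of semantic "types" a task family could require.
ALL_TYPES = ["arithmetic", "counting", "comparison", "multi_hop", "negation"]

# Small but diverse training set: 4 examples, 4 distinct types.
diverse = ["arithmetic", "counting", "comparison", "multi_hop"]

# Homogeneous training set: only two types, which we then repeat.
homogeneous_base = ["arithmetic", "counting"]

print(f"diverse: n={len(diverse)}, "
      f"coverage={coverage_proxy(diverse, ALL_TYPES):.2f}")

# Scaling the homogeneous set multiplies n but leaves coverage unchanged.
for scale in (1, 100, 10_000):
    data = homogeneous_base * scale
    print(f"homogeneous x{scale}: n={len(data)}, "
          f"coverage={coverage_proxy(data, ALL_TYPES):.2f}")
```

Under this proxy, the 10,000-fold homogeneous set stays at coverage 0.40 while the four-example diverse set reaches 0.80. Only adding an unseen type, such as `negation`, moves the value. A bound stated in terms of coverage rather than n would therefore not tighten from repetition alone.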

📝 Abstract

Large Language Models (LLMs) aligned via outcome-based Reinforcement Learning (RL) frequently exhibit a critical failure mode: they achieve high performance on in-distribution benchmarks while demonstrating brittle reasoning capabilities on out-of-distribution (OOD) tasks. We term this phenomenon Reward-Induced Manifold Collapse. We establish a theoretical framework bridging Structural Causal Models (SCM) and the Information Bottleneck (IB) principle to explain this paradox. We define reasoning as a high-complexity causal process and shortcut learning as the exploitation of low-complexity spurious correlations. Under the implicit inductive bias of Stochastic Gradient Descent (SGD), models optimized for outcome rewards are biased toward shortcut solutions whenever the training distribution allows for a "Markovian Screening" of the true causal mechanism. We derive a new generalization bound based on a Semantic Coverage Measure (η) rather than sample size, showing why data scaling on homogeneous distributions may fail to correct reasoning flaws. We also show that Process Reward Models (PRMs) function as Topological Filters, enforcing step-wise mutual information constraints that render the low-complexity shortcut manifold inadmissible. These results provide a mathematical grounding for the role of process supervision beyond simple credit assignment.
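
A toy scoring example makes the contrast between outcome and process supervision concrete. This is an illustrative assumption, not the paper's formalism, and it does not model the mutual-information constraint or the topological argument. A trajectory is a list of (step, is_valid) pairs plus a final answer, and the validity labels stand in for a hypothetical step-level verifier. The outcome reward checks only the final answer. The process reward takes the minimum per-step score, an aggregation choice assumed for this sketch. `Trajectory`, `outcome_reward`, `process_reward` and the arithmetic example are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """A reasoning trace: (step_text, step_is_valid) pairs plus a final answer.
    Step validity labels stand in for a hypothetical step-level verifier."""
    steps: list[tuple[str, bool]]
    final_answer: int


def outcome_reward(traj: Trajectory, target: int) -> float:
    """Outcome-based reward: 1.0 iff the final answer is correct; steps are ignored."""
    return 1.0 if traj.final_answer == target else 0.0


def process_reward(traj: Trajectory) -> float:
    """Toy process reward: the minimum per-step score (min-aggregation is an
    assumption of this sketch), so a single invalid step zeroes the reward."""
    return min((1.0 if valid else 0.0 for _, valid in traj.steps), default=0.0)


TARGET = 42  # hypothetical question: "What is 17 + 25?"

faithful = Trajectory(
    steps=[("17 + 25 = 17 + 20 + 5", True),
           ("17 + 20 = 37", True),
           ("37 + 5 = 42", True)],
    final_answer=42,
)
shortcut = Trajectory(
    steps=[("similar training prompts had answer 42", False),
           ("so answer 42", False)],
    final_answer=42,
)

for name, traj in [("faithful", faithful), ("shortcut", shortcut)]:
    print(f"{name}: outcome={outcome_reward(traj, TARGET)}, "
          f"process={process_reward(traj)}")
```

Both trajectories earn an outcome reward of 1.0, so an outcome-only signal cannot separate them. The step-level score assigns 1.0 to the faithful trace and 0.0 to the shortcut.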

Problem

reasoning shortcuts

outcome optimization

out-of-distribution generalization

causal reasoning

reward-induced manifold collapse
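
A small, fully hypothetical structural causal model illustrates the gap between a reasoning shortcut and causal reasoning under distribution shift. A causal feature C generates the label Y, which copies C except for two noise flips, and this mechanism is identical in both splits. A spurious feature S copies Y exactly in training but is drawn independently of Y out of distribution. In training, S determines Y, so once S is known C adds no information about Y (I(C;Y | S) = 0). This is offered as one possible reading of the abstract's "Markovian Screening", not the paper's definition. The dataset, the 90% agreement rate and the helper names are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (count / n) * math.log2((count / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), count in joint.items()
    )


def make_split(n, spurious_tracks_label):
    """Hypothetical SCM. C -> Y: Y copies C except for noise flips at i = 0, 1,
    and this mechanism is the same in both splits. S copies Y in training
    (a spurious correlate) but is independent of Y out of distribution."""
    cs = [i % 2 for i in range(n)]
    ys = [1 - c if i in (0, 1) else c for i, c in enumerate(cs)]
    if spurious_tracks_label:
        ss = list(ys)
    else:
        ss = [(i // 2) % 2 for i in range(n)]
    return cs, ss, ys


def accuracy(preds, ys):
    """Fraction of positions where the prediction equals the label."""
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


train = make_split(20, spurious_tracks_label=True)
ood = make_split(20, spurious_tracks_label=False)

for name, (cs, ss, ys) in [("train", train), ("ood", ood)]:
    print(f"{name}: I(C;Y)={mutual_information(cs, ys):.3f} bits, "
          f"I(S;Y)={mutual_information(ss, ys):.3f} bits, "
          f"acc(shortcut: predict S)={accuracy(ss, ys):.2f}, "
          f"acc(causal: predict C)={accuracy(cs, ys):.2f}")
```

In training, the shortcut rule is perfect (1.00 accuracy, I(S;Y) = 1 bit) and beats the causal rule (0.90, about 0.531 bits). Out of distribution, I(S;Y) drops to 0 and the shortcut falls to chance (0.50), while the causal rule stays at 0.90.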

Innovation

Reward-Induced Manifold Collapse

Structural Causal Models

Information Bottleneck
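
For reference, the Information Bottleneck principle listed above has the following standard general form. A representation Z of the input X is chosen to be as compressed as possible, measured by I(X;Z), while remaining informative about the target Y, measured by I(Z;Y). The parameter β sets the trade-off between the two terms. This summary does not say how the paper maps X, Z and Y onto LLM reasoning traces, so no mapping is assumed here.

```latex
% Standard Information Bottleneck objective (general form, not the paper's instantiation)
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;Z) \;-\; \beta\, I(Z;Y), \qquad \beta > 0,
\quad \text{with the Markov chain } Y \leftrightarrow X \leftrightarrow Z,
% so that, by the data processing inequality,
I(Z;Y) \;\le\; I(X;Y).
```

One informal intuition fits the abstract's framing. Suppose that, on the training distribution, a simple spurious feature already carries all the information about Y. Then a representation that also encodes the causal mechanism pays a higher compression cost I(X;Z) without raising I(Z;Y), so the objective favors the simpler representation. This is a heuristic reading, not the paper's theorem.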