🤖 AI Summary
Traditional self-explanatory models suffer from mode collapse: generators produce fixed, low-quality explanation fragments, degrading interpretability. This work is the first to model the generator–predictor interaction from a game-theoretic perspective, revealing that mode collapse arises from convergence to a suboptimal game equilibrium. To address this, we propose a cooperative game framework featuring a progressive policy intervention mechanism. By formulating the joint learning process as a cooperative game and optimizing via policy gradient methods, our approach enables end-to-end co-training of the generator and predictor. Theoretical analysis and iterative optimization ensure synergistic improvement in both explanation fidelity and predictive performance. Evaluated on nine real-world datasets and two synthetic benchmarks, our method achieves up to an 8.1% improvement in explanation quality, significantly outperforming existing state-of-the-art approaches.
📝 Abstract
Rationalization, a data-centric framework, aims to build self-explanatory models that explain prediction outcomes by generating a subset of human-intelligible pieces of the input data. It involves a cooperative game in which a generator selects the most human-intelligible parts of the input (i.e., rationales), followed by a predictor that makes predictions based on these generated rationales. Conventional rationalization methods typically impose constraints via regularization terms to calibrate or penalize undesired generation. However, these methods suffer from a problem called mode collapse, in which the predictor produces correct predictions yet the generator consistently outputs rationales with collapsed patterns. Moreover, existing studies are typically designed separately for specific collapsed patterns, lacking a unified treatment. In this paper, we systematically revisit cooperative rationalization from a novel game-theoretic perspective and identify the fundamental cause of this problem: the generator no longer tends to explore new strategies to uncover informative rationales, ultimately leading the system to converge to a suboptimal game equilibrium (correct predictions vs. collapsed rationales). To solve this problem, we propose a novel approach, Game-theoretic Policy Optimization oriented RATionalization (PORAT), which progressively introduces policy interventions into the cooperative game process to escape the suboptimal game equilibrium, thereby guiding the model toward a more optimal solution state. We theoretically analyse the cause of such a suboptimal equilibrium and prove the feasibility of the proposed method. Furthermore, we validate our method on nine widely used real-world datasets and two synthetic settings, where PORAT achieves up to 8.1% performance improvements over existing state-of-the-art methods.
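To make the generator–predictor cooperative game concrete, below is a minimal toy sketch, not the authors' PORAT implementation: a generator samples a binary feature mask (the "rationale"), a logistic-regression predictor sees only the masked input, and the non-differentiable generator is trained with a REINFORCE-style policy gradient plus a sparsity penalty. All data, hyperparameters, and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 6 features, but only features 0 and 1 determine the label.
n, d = 256, 6
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

w = np.zeros(d)            # predictor: logistic-regression weights
gen_logits = np.zeros(d)   # generator: per-feature selection logits
baseline = 0.0             # moving-average reward baseline for REINFORCE

for step in range(2000):
    p_select = sigmoid(gen_logits)
    mask = (rng.random(d) < p_select).astype(float)  # sample a "rationale"
    Xm = X * mask                                    # predictor sees only selected features
    p = sigmoid(Xm @ w)
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Predictor update: ordinary gradient descent on the masked input.
    w -= 0.5 * (Xm.T @ (p - y)) / n

    # Generator update: REINFORCE. Reward = prediction quality minus a
    # sparsity penalty; the baseline reduces gradient variance.
    reward = -loss - 0.02 * mask.sum()
    baseline = 0.9 * baseline + 0.1 * reward
    grad_logp = mask - p_select                      # grad of log Bernoulli(mask)
    gen_logits += 0.2 * (reward - baseline) * grad_logp

print(np.round(sigmoid(gen_logits), 2))  # per-feature selection probabilities
```

The failure mode the paper targets is visible in this setup: if the generator stops exploring (e.g., its selection probabilities saturate early), training converges to a fixed mask even when a better rationale exists, while the predictor still fits whatever features leak through.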