Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing selective mechanisms in State Space Models (SSMs) lack theoretical grounding and are susceptible to spurious correlations, compromising model optimality and robustness. To address this, we propose the **Predictive Sufficiency Principle**—the first information-theoretic framework that rigorously preserves future predictive capability while compressing historical information into a minimal sufficient statistic. This principle provides a first-principles foundation for selective mechanisms and serves as a general-purpose regularization method applicable across diverse architectures. Our approach jointly optimizes state representations and sufficient statistics via an information-theoretic objective, effectively suppressing non-causal noise and spurious patterns. Empirically, it achieves state-of-the-art performance on multiple benchmarks, with particularly notable gains in long-horizon forecasting and high-noise regimes—demonstrating superior generalization and robustness.

📝 Abstract
State Space Models (SSMs), particularly recent selective variants like Mamba, have emerged as a leading architecture for sequence modeling, challenging the dominance of Transformers. However, the success of these state-of-the-art models largely relies on heuristically designed selective mechanisms, which lack a rigorous first-principle derivation. This theoretical gap raises questions about their optimality and robustness against spurious correlations. To address this, we introduce the Principle of Predictive Sufficiency, a novel information-theoretic criterion stipulating that an ideal hidden state should be a minimal sufficient statistic of the past for predicting the future. Based on this principle, we propose the Minimal Predictive Sufficiency State Space Model (MPS-SSM), a new framework where the selective mechanism is guided by optimizing an objective function derived from our principle. This approach encourages the model to maximally compress historical information without losing predictive power, thereby learning to ignore non-causal noise and spurious patterns. Extensive experiments on a wide range of benchmark datasets demonstrate that MPS-SSM not only achieves state-of-the-art performance, significantly outperforming existing models in long-term forecasting and noisy scenarios, but also exhibits superior robustness. Furthermore, we show that the MPS principle can be extended as a general regularization framework to enhance other popular architectures, highlighting its broad potential.
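The "minimal sufficient statistic" criterion stated in the abstract can be written information-theoretically. The notation below is ours, not the paper's: sufficiency requires the hidden state $S_t$ to retain all of the past's predictive information about the future, and minimality selects the most compressed state among those that are sufficient:

```latex
% Sufficiency: the state preserves all predictive information in the past
I(S_t; X_{>t}) = I(X_{\le t}; X_{>t})

% Minimality: among sufficient states, compress the past as much as possible
\min_{S_t} \; I(X_{\le t}; S_t) \quad \text{subject to sufficiency}
```

In practice such a constrained problem is typically relaxed into a single trade-off objective, which is presumably what the paper's information-theoretic loss does.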
Problem

Research questions and friction points this paper is trying to address.

Lack of rigorous principles for selective mechanisms in SSMs
Optimality and robustness concerns against spurious correlations
Need for minimal sufficient hidden states in sequence modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the Principle of Predictive Sufficiency
Proposes the Minimal Predictive Sufficiency SSM (MPS-SSM)
Optimizes an information-theoretic objective for predictive compression
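The objective sketched above (compress history maximally without losing predictive power) is not spelled out on this page. A common way to instantiate such a criterion is an information-bottleneck-style loss: a prediction term enforces sufficiency, and a variational compression term enforces minimality. The sketch below assumes a Gaussian state posterior and a hypothetical `beta` trade-off weight; both are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def predictive_sufficiency_loss(y_pred, y_true, mu, log_var, beta=1e-3):
    """Information-bottleneck-style sketch (our assumption, not MPS-SSM's
    exact objective): prediction term keeps the state sufficient for the
    future; KL term compresses it toward minimality via a Gaussian
    variational bound on I(past; state)."""
    # Sufficiency: mean-squared prediction error on future targets
    pred = np.mean((y_pred - y_true) ** 2)
    # Minimality: KL( N(mu, diag(exp(log_var))) || N(0, I) ), batch-averaged
    kl = 0.5 * np.mean(
        np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    )
    return pred + beta * kl
```

Here the KL term plays the role of the "minimality" pressure and the prediction term the "sufficiency" constraint; the actual MPS-SSM objective may estimate the mutual-information terms differently.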
Authors
Yiyi Wang, Xi’an Jiaotong University
Jian'an Zhang, Peking University
Hongyi Duan, The Hong Kong University of Science and Technology (Guangzhou)
Haoyang Liu, Xiamen University
Qingyang Li