The Curious Case of In-Training Compression of State Space Models

📅 2025-10-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the trade-off between expressiveness and computational cost in State Space Models (SSMs) for long-sequence modeling, this paper proposes a dynamic state-space compression method applied during training. The core idea is to integrate Hankel singular value analysis, a tool from control theory, into SSM training: the energy of each hidden-state dimension is measured, and only high-impact dimensions are retained as optimization proceeds, yielding task-aware adaptive truncation. The method targets Linear Time-Invariant SSMs such as Linear Recurrent Units and is extendable to selective architectures. Unlike models trained directly at a fixed small state dimension, this approach preserves expressive capacity while significantly accelerating convergence. Empirically, compressed models outperform same-size baselines on long-sequence tasks, including language modeling and time-series forecasting, without sacrificing inference efficiency, achieving a joint improvement in computational efficiency and modeling capability.

📝 Abstract
State Space Models (SSMs), developed to tackle long-sequence modeling tasks efficiently, offer both parallelizable training and fast inference. At their core are recurrent dynamical systems that maintain a hidden state, with update costs scaling with the state dimension. A key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden. Control theory, and more specifically Hankel singular value analysis, provides a potent framework for measuring the energy of each state, as well as for balanced truncation of the original system down to a smaller representation with performance guarantees. Leveraging the eigenvalue stability properties of Hankel matrices, we apply this lens to SSMs during training, identifying and preserving only the dimensions of high influence. Our approach applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models. Experiments show that in-training reduction significantly accelerates optimization while preserving expressivity, with compressed models retaining task-critical structure that is lost by models trained directly at a smaller dimension. In other words, SSMs that begin large and shrink during training achieve computational efficiency while maintaining higher performance.
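The Hankel singular value analysis the abstract refers to can be sketched for a small discrete-time LTI system. This is an illustrative example, not the paper's implementation: the matrices below are random placeholders, and the Gramians are obtained from the standard discrete Lyapunov equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hankel_singular_values(A, B, C):
    """Hankel singular values of a stable discrete-time LTI system
    x_{t+1} = A x_t + B u_t,  y_t = C x_t."""
    # Controllability and observability Gramians from discrete Lyapunov equations:
    #   A Wc A^T - Wc + B B^T = 0   and   A^T Wo A - Wo + C^T C = 0
    Wc = solve_discrete_lyapunov(A, B @ B.T)
    Wo = solve_discrete_lyapunov(A.T, C.T @ C)
    # The HSVs are the square roots of the eigenvalues of Wc @ Wo
    eig = np.linalg.eigvals(Wc @ Wo)
    return np.sort(np.sqrt(np.abs(eig.real)))[::-1]

rng = np.random.default_rng(0)
n, m, p = 8, 2, 2                         # state, input, output dimensions
A = np.diag(rng.uniform(0.1, 0.95, n))    # stable diagonal recurrence, like an LRU
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
hsv = hankel_singular_values(A, B, C)
print(hsv)  # decreasing; small values flag low-energy state dimensions
```

Each Hankel singular value measures how much a state direction contributes to the input-output map, which is what makes it a principled criterion for deciding which dimensions to keep.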
Problem

Research questions and friction points this paper is trying to address.

Balancing expressivity and computational efficiency in State Space Models
Applying Hankel singular value analysis to compress SSMs during training
Accelerating optimization while preserving task-critical model structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-training compression using Hankel singular value analysis
Balanced truncation preserves high-influence state dimensions
Compression during training accelerates optimization while maintaining expressivity
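The balanced truncation step behind these bullets can be sketched with the standard square-root method. This is a textbook offline reduction under illustrative assumptions (random stable system, full-rank Gramians), not the paper's in-training variant:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, cholesky, svd

def balanced_truncation(A, B, C, k):
    """Square-root balanced truncation of a stable discrete-time LTI system,
    keeping the k state directions with the largest Hankel singular values."""
    # Gramians from the discrete Lyapunov equations
    Wc = solve_discrete_lyapunov(A, B @ B.T)
    Wo = solve_discrete_lyapunov(A.T, C.T @ C)
    Lc = cholesky(Wc, lower=True)          # Wc = Lc Lc^T
    Lo = cholesky(Wo, lower=True)          # Wo = Lo Lo^T
    U, s, Vt = svd(Lo.T @ Lc)              # s holds the Hankel singular values
    S = np.diag(s[:k] ** -0.5)
    T = Lc @ Vt[:k].T @ S                  # n x k projection into balanced coordinates
    Ti = S @ U[:, :k].T @ Lo.T             # k x n left inverse of T (Ti @ T = I_k)
    return Ti @ A @ T, Ti @ B, C @ T, s

rng = np.random.default_rng(1)
n, m, p, k = 8, 2, 2, 4
A = np.diag(rng.uniform(0.1, 0.95, n))     # stable diagonal recurrence
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Ar, Br, Cr, s = balanced_truncation(A, B, C, k)
print(Ar.shape)   # (4, 4); the discarded tail of s bounds the approximation error
```

The paper's contribution is to run this kind of reduction inside the training loop rather than once after training, so the retained dimensions can adapt to the task.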