🤖 AI Summary
To address the significant degradation in inference accuracy of State Space Models (SSMs) on analog Compute-in-Memory (CIM) hardware—caused by device non-idealities inducing weight perturbations in the state-space representation—this paper proposes a digital-analog hybrid projection decomposition method. The key insight is that the output projection layer is highly sensitive to weight perturbations; thus, the transposed right singular vector matrix (Vᵀ) obtained via Singular Value Decomposition (SVD) is offloaded to the digital domain for precise correction, while the product UΣ remains on the analog array. This strategy preserves computational robustness while maintaining full compatibility with analog CIM architectures. Evaluated on Mamba models under various hardware noise conditions, the method achieves substantial gains: perplexity is reduced by up to 99.57%, and PIQA commonsense reasoning accuracy improves by up to 96.67%. The approach provides an effective path to high-accuracy SSM deployment on analog CIM platforms.
📝 Abstract
State Space Models (SSMs) are efficient alternatives to traditional sequence models, excelling at processing long sequences with lower computational complexity. Their reliance on matrix multiplications makes them ideal for compute-in-memory (CIM) architectures, which improve energy efficiency by computing within memory arrays. However, device non-idealities in CIM introduce weight perturbations that can degrade inference accuracy. In this paper, we systematically analyze the robustness of SSMs under noisy conditions, identifying that the final block and output projection layers are more susceptible to perturbations than other components. Building on these insights, we propose HPD, a Hybrid Projection Decomposition strategy for the last output projection layer. We replace the original weight matrix with the product of U and Σ from its SVD to ensure compatibility with existing hardware architectures, while offloading Vᵀ to digital hardware for precise and robust correction. Comprehensive tests on Mamba models show that our method reduces perplexity by up to 99.57% under various noise conditions compared to baseline models, with accuracy gains of up to 96.67% on the PIQA benchmark for commonsense reasoning.
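The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's implementation: the layer size, the additive-Gaussian noise model for CIM weight perturbations, and the row-vector convention `y = x @ W` are all assumptions made for the example. The point is that `W = UΣVᵀ` lets the analog array hold only `UΣ`, while the orthogonal factor `Vᵀ` is applied exactly in the digital domain.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # illustrative projection size, not from the paper

# Original output-projection weight and its SVD: W = U @ diag(S) @ Vt
W = rng.standard_normal((d, d))
U, S, Vt = np.linalg.svd(W, full_matrices=False)
US = U * S  # equals U @ np.diag(S); this factor is programmed onto the analog CIM array

def perturb(M, sigma=0.02, rng=rng):
    """Crude stand-in for CIM device non-idealities: additive Gaussian weight noise."""
    return M + sigma * rng.standard_normal(M.shape)

x = rng.standard_normal(d)

# Without noise the hybrid path is exact: (x @ US) @ Vt == x @ W
y_ref = x @ W
y_hybrid = (x @ US) @ Vt
assert np.allclose(y_hybrid, y_ref)

# With noise, only the analog factor US is perturbed; Vt stays digital and exact
y_base = x @ perturb(W)          # baseline: whole W on noisy analog hardware
y_hpd = (x @ perturb(US)) @ Vt   # HPD-style split: noisy analog UΣ, exact digital Vᵀ
```

Note that `Vᵀ` is orthogonal, so the digital stage adds no amplification of its own; the robustness gains reported in the paper come from how the real device's non-idealities interact with the factored weights, which this toy noise model does not capture.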