Late Breaking Results: Quamba-SE: Soft-edge Quantizer for Activations in State Space Models

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the loss of outlier information that hard clipping causes in state space model (SSM) activation quantization. To mitigate this, the authors propose Quamba-SE, a soft-edge quantizer built on a three-segment adaptive scaling mechanism. It applies non-uniform quantization with high, standard, and low precision to small values, regular values, and outliers, respectively, so outlier information is preserved without hard truncation while precision is maintained for the remaining values. Experiments on the Mamba-130M model across six zero-shot benchmarks show that Quamba-SE improves average accuracy by up to +0.83%, with a maximum gain of +2.68% on a single benchmark.

📝 Abstract
We propose Quamba-SE, a soft-edge quantizer for State Space Model (SSM) activation quantization. Unlike existing methods, using standard INT8 operation, Quamba-SE employs three adaptive scales: high-precision for small values, standard scale for normal values, and low-precision for outliers. This preserves outlier information instead of hard clipping, while maintaining precision for other values. We evaluate on Mamba-130M across 6 zero-shot benchmarks. Results show that Quamba-SE consistently outperforms Quamba, achieving up to +2.68% on individual benchmarks and up to +0.83% improvement in the average accuracy of 6 datasets.
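The three-scale idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the scales or segment boundaries are chosen, or how the result is packed into INT8, so the function names, the thresholds `small_thr` and `clip`, and the `fine` and `coarse` ratios below are all hypothetical. The sketch only simulates quantize-then-dequantize and contrasts it with a hard-clipping baseline.

```python
import numpy as np

def hard_clip_quant(x, clip, n_levels=127):
    """Baseline: symmetric uniform quantization that clips everything beyond +/-clip."""
    scale = clip / n_levels
    q = np.clip(np.round(x / scale), -n_levels, n_levels)
    return q * scale

def soft_edge_quant(x, small_thr, clip, fine=0.5, coarse=8.0, n_levels=127):
    """Illustrative three-segment quantizer (assumed design, not the paper's exact scheme).

    |x| <= small_thr          -> finer step  (high precision for small values)
    small_thr < |x| <= clip   -> standard step
    |x| > clip                -> coarser step (outliers kept, at low precision)
    """
    std_scale = clip / n_levels
    mag = np.abs(x)
    scale = np.where(mag <= small_thr, std_scale * fine,
                     np.where(mag <= clip, std_scale, std_scale * coarse))
    # Round each value to the grid of its own segment instead of clipping it.
    return np.round(x / scale) * scale

# Toy activations: one small value, two regular values, one outlier.
x = np.array([0.01, 0.5, 1.0, 6.0])
hard = hard_clip_quant(x, clip=1.0)
soft = soft_edge_quant(x, small_thr=0.1, clip=1.0)
print("hard clip:", hard)
print("soft edge:", soft)
```

In this toy setting the baseline collapses the outlier 6.0 to the clipping bound, while the three-segment version keeps it within half a coarse step of its original value and leaves the regular values unchanged.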
Problem

Research questions and friction points this paper is trying to address.

activation quantization
State Space Models
outliers
quantization
precision preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

soft-edge quantization
State Space Models
activation quantization
adaptive scaling
outlier preservation
Yizhi Chen
Department of Electronics and Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden
Ahmed Hemani
KTH Royal Institute of Technology, Stockholm
VLSI design
Neural Networks
Massively Parallel Architectures
Design Automation