🤖 AI Summary
This work addresses the issue of outlier information loss in state space model (SSM) activation quantization caused by hard clipping. To mitigate this, the authors propose Quamba-SE, a soft-edge quantizer that introduces, for the first time in SSM activation quantization, a three-segment adaptive scaling mechanism. This approach applies non-uniform quantization with high, standard, and low precision to small values, regular values, and outliers, respectively, thereby preserving outlier information without resorting to hard truncation while maintaining overall accuracy. Experimental results on the Mamba-130M model demonstrate that Quamba-SE achieves an average accuracy improvement of +0.83% across six zero-shot benchmarks, with a maximum gain of +2.68% on a single task.
📝 Abstract
We propose Quamba-SE, a soft-edge quantizer for State Space Model (SSM) activation quantization. Unlike existing methods, Quamba-SE employs three adaptive scales while using standard INT8 operations: a high-precision scale for small values, a standard scale for normal values, and a low-precision scale for outliers. This preserves outlier information instead of hard clipping it, while maintaining precision for the remaining values. We evaluate on Mamba-130M across 6 zero-shot benchmarks. Results show that Quamba-SE consistently outperforms Quamba, achieving up to +2.68% on individual benchmarks and a +0.83% improvement in average accuracy across the 6 datasets.
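The three-scale idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the thresholds `t_small` and `t_out`, and the choice of per-segment scales (fine, standard, and coarse), are assumptions made for demonstration. The key property it shows is that an outlier beyond `t_out` is mapped with a coarser scale rather than clipped away, while small values get a finer scale than a single global INT8 scale would allow.

```python
import numpy as np

def three_segment_quantize(x, t_small=0.5, t_out=4.0, n_bits=8):
    """Illustrative three-segment quantizer (thresholds are hypothetical).

    |x| <= t_small           -> fine scale (high precision for small values)
    t_small < |x| <= t_out   -> standard scale
    |x| > t_out              -> coarse scale (outliers preserved, not clipped)
    """
    qmax = 2 ** (n_bits - 1) - 1  # 127 for INT8
    # One scale per segment; the coarse scale stretches to cover the largest
    # outlier so no value is hard-clipped at t_out.
    scales = np.array([
        t_small / qmax,                        # fine
        t_out / qmax,                          # standard
        max(np.abs(x).max(), t_out) / qmax,    # coarse, covers outliers
    ])
    # Assign each element to segment 0, 1, or 2 by magnitude.
    seg = np.digitize(np.abs(x), [t_small, t_out])
    s = scales[seg]
    q = np.clip(np.round(x / s), -qmax - 1, qmax).astype(np.int8)
    return q, s  # INT8 codes plus per-element scales for dequantization

def three_segment_dequantize(q, s):
    """Recover approximate float values from codes and per-element scales."""
    return q.astype(np.float32) * s
```

Compared with a single-scale hard-clipping quantizer, an activation of 6.0 with `t_out=4.0` would lose 2.0 to clipping; here it lands in the coarse segment and is reconstructed almost exactly, at the cost of coarser resolution within that segment.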