MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing post-training quantization methods for multimodal large language models often suffer from performance instability due to misaligned smoothing and the lack of cross-modal computational invariance. This work proposes a novel quantization framework that effectively aligns activation distributions across modalities while preserving their low-rank structural consistency. The approach integrates a modality-aware smoothing (MAS) mechanism with a cross-modal compensation strategy based on singular value decomposition (SVD) whitening. By jointly addressing activation misalignment and structural distortion, the method achieves the first unified and stable low-bit quantization for multimodal large models. It sets a new state-of-the-art in post-training quantization performance across both dual- and tri-modal settings, demonstrating robustness and scalability without requiring retraining or extensive calibration.
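The paper does not publish implementation details on this page, but the core of the Cross-Modal Compensation idea — transforming cross-modal activation differences into a low-rank form via SVD — can be sketched minimally. The function name and shapes below are our own illustrative assumptions, not the authors' API:

```python
import numpy as np

def lowrank_compensation(delta, rank):
    """Approximate a cross-modal activation-difference matrix with a low-rank factorization.

    delta: (tokens, channels) difference between activation statistics of two
           modalities (hypothetical input layout, chosen for illustration).
    rank:  number of singular components to keep.
    Returns factors (L, R) such that L @ R is the rank-`rank` SVD truncation of delta.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    L = U[:, :rank] * S[:rank]   # scale left singular vectors by singular values
    R = Vt[:rank]                # top right singular vectors
    return L, R
```

A truncated SVD is the optimal rank-`rank` approximation in the Frobenius norm, which is presumably why a low-rank form lets the compensation term stay cheap relative to the full-precision residual.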

📝 Abstract
Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
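To make the Modality-Aware Smoothing idea concrete: SmoothQuant computes one per-channel smoothing factor from all activations, whereas MAS learns a separate factor per modality. A minimal sketch, assuming SmoothQuant-style factors of the form s_j = max|X_j|^α / max|W_j|^(1−α) and a token-level modality mask (function names and layouts are ours, not the released code):

```python
import numpy as np

def smoothing_factor(x_absmax, w_absmax, alpha=0.5):
    # SmoothQuant-style per-channel factor: s_j = max|X_j|^a / max|W_j|^(1-a)
    s = (x_absmax ** alpha) / (w_absmax ** (1 - alpha))
    return np.clip(s, 1e-5, None)  # guard against zero channels

def modality_aware_smooth(X, W, modality_ids, alpha=0.5):
    """Smooth activations with a separate factor per modality.

    X: (tokens, channels) activations; W: (channels, out) weight;
    modality_ids: (tokens,) integer modality label per token (e.g. 0=text, 1=image).
    Returns smoothed activations and the per-modality factor dict.
    """
    X_s = np.empty_like(X)
    factors = {}
    for m in np.unique(modality_ids):
        mask = modality_ids == m
        s = smoothing_factor(np.abs(X[mask]).max(axis=0),   # per-channel max over this modality's tokens
                             np.abs(W).max(axis=1), alpha)  # per-input-channel max of the weight
        X_s[mask] = X[mask] / s
        factors[int(m)] = s
    return X_s, factors
```

Note that with modality-specific factors the weight can no longer absorb a single shared 1/s, which is presumably why the framework pairs MAS with the CMC term described in the abstract.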
Problem

Research questions and friction points this paper is trying to address.

Post-training quantization
Multimodal Large Language Models
Smoothing Misalignment
Cross-Modal Computational Invariance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Aware Smoothing
Cross-Modal Compensation
Post-Training Quantization
Multimodal Large Language Models
SVD Whitening
Lulu Hu
Alibaba Cloud Computing, Alibaba Group
Wenhu Xiao
Alibaba Cloud Computing, Alibaba Group
Xin Chen
Alibaba Cloud Computing, Alibaba Group
Xinhua Xu
Peking University
Computer Vision
Bowen Xu
Alibaba Cloud Computing, Alibaba Group
Kun Li
Alibaba Cloud Computing, Alibaba Group
Yongliang Tao
Alibaba Cloud Computing, Alibaba Group