Revisit Modality Imbalance at the Decision Layer

📅 2025-10-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies a systematic modality imbalance problem at the decision level in multimodal learning: even when representation learning is well-balanced (e.g., via large-scale pretraining and optimization), models exhibit significant bias toward certain modalities, such as audio, during fusion. This bias arises intrinsically from geometric disparities in feature-space structure and decision-weight distributions, rather than merely from optimization dynamics; uncalibrated modality-wise output aggregation further exacerbates weight skew and suppresses weak-modality contributions. To address this, the authors propose a decision-level adaptive weighting mechanism. Evaluated on CREMA-D and Kinetics-Sounds, the method demonstrably improves weak-modality participation and overall generalization. Experiments confirm that optimizing representations alone fails to mitigate this imbalance, whereas the proposed approach achieves substantial gains. The work argues for a shift in multimodal fusion design, emphasizing decision-level calibration over representation-level balance.

📝 Abstract
Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals that such imbalance not only occurs during representation learning but also manifests significantly at the decision layer. Experiments on audio-visual datasets (CREMA-D and Kinetics-Sounds) show that even after extensive pretraining and balanced optimization, models still exhibit systematic bias toward certain modalities, such as audio. Further analysis demonstrates that this bias originates from intrinsic disparities in feature-space and decision-weight distributions rather than from optimization dynamics alone. We argue that aggregating uncalibrated modality outputs at the fusion stage leads to biased decision-layer weighting, hindering weaker modalities from contributing effectively. To address this, we propose that future multimodal systems should focus on incorporating adaptive weight allocation mechanisms at the decision layer, enabling relatively balanced contributions according to the capabilities of each modality.
Problem

Research questions and friction points this paper is trying to address.

Modality imbalance persists at decision layer despite balanced optimization
Uncalibrated modality outputs cause biased decision-layer weighting
Intrinsic feature-space disparities hinder weaker modalities' effective contribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Address modality imbalance at decision layer
Use adaptive weight allocation mechanisms
Balance contributions based on modality capabilities
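The bullets above describe the general idea of decision-level adaptive weighting but not its exact formulation. A minimal sketch of one plausible instantiation, assuming per-modality weights derived from prediction entropy (the function name `adaptive_decision_fusion` and the entropy-based weighting are illustrative assumptions, not the paper's published mechanism):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_decision_fusion(logits_per_modality):
    """Fuse per-modality logits with adaptive decision-level weights.

    Each modality's weight is derived from the inverse entropy of its own
    prediction, so a confident weak modality is not drowned out by an
    uncalibrated dominant one. Illustrative sketch only.
    """
    weights = []
    for logits in logits_per_modality:
        p = softmax(logits)
        entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)  # (batch,)
        weights.append(np.exp(-entropy))                  # more confident -> larger weight
    w = np.stack(weights)                                 # (num_modalities, batch)
    w = w / w.sum(axis=0, keepdims=True)                  # normalize over modalities
    fused = sum(wi[:, None] * softmax(l)
                for wi, l in zip(w, logits_per_modality))
    return fused, w

# Example: a confident audio head vs. a near-uniform visual head.
audio_logits = np.array([[4.0, 0.0, 0.0]])
visual_logits = np.array([[0.1, 0.0, 0.0]])
fused, w = adaptive_decision_fusion([audio_logits, visual_logits])
```

Here the confident modality receives the larger fusion weight per sample, rather than a fixed weight set once at training time; this is the kind of capability-aware rebalancing the paper advocates at the decision layer.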
Xiaoyu Ma
Carnegie Mellon University
Transportation network modeling, machine learning, reinforcement learning, simulation, optimization
Hao Chen
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China