🤖 AI Summary
This paper identifies a systematic modality imbalance problem at the decision level in multimodal learning: even when representation learning is well-balanced (e.g., via large-scale pretraining and optimization), models exhibit significant bias toward weak modalities—such as audio—during fusion. This bias arises intrinsically from geometric disparities in feature-space structure and decision-weight distributions, rather than merely from optimization dynamics; uncalibrated modality-wise output aggregation further exacerbates weight skew and suppresses weak-modality contributions. To address this, we propose a decision-level adaptive weighting mechanism. Evaluated on CREMAD and Kinetic-Sounds, our method demonstrably improves weak-modality participation and overall generalization. Experiments confirm that optimizing representations alone fails to mitigate this imbalance, whereas our approach achieves substantial gains. The work establishes a new paradigm for multimodal fusion architecture design, emphasizing decision-level calibration over representation-level balance.
📝 Abstract
Multimodal learning integrates information from different modalities to enhance model performance, yet it often suffers from modality imbalance, where dominant modalities overshadow weaker ones during joint optimization. This paper reveals that such imbalance occurs not only during representation learning but also manifests significantly at the decision layer. Experiments on audio-visual datasets (CREMAD and Kinetic-Sounds) show that even after extensive pretraining and balanced optimization, models still exhibit systematic bias toward certain modalities, such as audio. Further analysis demonstrates that this bias originates from intrinsic disparities in feature-space and decision-weight distributions rather than from optimization dynamics alone. We argue that aggregating uncalibrated modality outputs at the fusion stage leads to biased decision-layer weighting, hindering weaker modalities from contributing effectively. To address this, we propose that future multimodal systems incorporate adaptive weight allocation mechanisms at the decision layer, enabling relatively balanced contributions according to the capabilities of each modality.
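The abstract does not specify the exact form of the proposed adaptive weight allocation. As one illustrative sketch (not the paper's actual method), a simple decision-level scheme can weight each modality's output by its prediction confidence, e.g. inverse entropy, so that a weak modality is down-weighted per sample rather than suppressed outright; the function name and weighting rule below are assumptions for illustration:

```python
import numpy as np

def adaptive_decision_fusion(logits_by_modality, temperature=1.0, eps=1e-12):
    """Hypothetical decision-level fusion sketch.

    Converts each modality's logits to probabilities, then weights each
    modality by the inverse of its prediction entropy (normalized across
    modalities), so more confident modalities contribute more while weak
    modalities still participate. This is an illustrative stand-in for
    the adaptive weighting the abstract advocates, not the paper's method.
    """
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z / temperature)
        return e / e.sum(axis=-1, keepdims=True)

    # Per-modality class probabilities, shape (batch, num_classes) each.
    probs = [softmax(np.asarray(l, dtype=float)) for l in logits_by_modality]
    # Prediction entropy per sample (lower entropy => more confident).
    entropies = [-(p * np.log(p + eps)).sum(axis=-1) for p in probs]
    # Inverse-entropy weights, normalized over modalities: shape (M, batch).
    raw = np.stack([1.0 / (h + eps) for h in entropies])
    weights = raw / raw.sum(axis=0, keepdims=True)
    # Weighted sum of modality probabilities -> fused prediction.
    fused = sum(w[..., None] * p for w, p in zip(weights, probs))
    return fused, weights

# Example: a confident audio head and an uncertain visual head.
audio_logits = [[4.0, 0.0, 0.0]]
visual_logits = [[0.1, 0.0, 0.0]]
fused, weights = adaptive_decision_fusion([audio_logits, visual_logits])
```

Here the confident (low-entropy) audio modality receives the larger weight for that sample, but the visual modality's probabilities still influence the fused output, which is the qualitative behavior the abstract argues a calibrated decision layer should provide.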