Modality Dominance-Aware Optimization for Embodied RGB-Infrared Perception

2026-01-02
arXiv.org
Citations: 0
Influential: 0
AI Summary
This work addresses optimization bias in RGB-infrared multimodal perception, where disparities in inter-modal information density and feature quality often lead models to over-rely on the dominant modality, thereby hindering effective fusion. To mitigate this issue, the study introduces the Modality Dominance Index (MDI), presented as the first metric to quantitatively assess modality dominance, and proposes the MDACL framework. MDACL dynamically balances cross-modal optimization through Hierarchical Cross-modal Guidance (HCG) and Adversarial Equilibrium Regularization (AER). Evaluated on three RGB-infrared benchmark datasets, the method achieves state-of-the-art performance, alleviating optimization bias and improving the robustness and effectiveness of multimodal fusion.

Abstract
RGB-Infrared (RGB-IR) multimodal perception is fundamental to embodied multimedia systems operating in complex physical environments. Although recent cross-modal fusion methods have advanced RGB-IR detection, the optimization dynamics caused by asymmetric modality characteristics remain underexplored. In practice, disparities in information density and feature quality introduce persistent optimization bias, leading training to overemphasize a dominant modality and hindering effective fusion. To quantify this phenomenon, we propose the Modality Dominance Index (MDI), which measures modality dominance by jointly modeling feature entropy and gradient contribution. Based on MDI, we develop a Modality Dominance-Aware Cross-modal Learning (MDACL) framework that regulates cross-modal optimization. MDACL incorporates Hierarchical Cross-modal Guidance (HCG) to enhance feature alignment and Adversarial Equilibrium Regularization (AER) to balance optimization dynamics during fusion. Extensive experiments on three RGB-IR benchmarks demonstrate that MDACL effectively mitigates optimization bias and achieves SOTA performance.
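The abstract says MDI combines feature entropy and gradient contribution but does not give the formula here. The following is a minimal illustrative sketch of how such a score could be assembled, not the paper's definition: the function names, the histogram-based entropy estimate, the normalized-difference form, and the mixing weight `alpha` are all assumptions made for this example.

```python
import math


def shannon_entropy(values, bins=16):
    """Histogram-based Shannon entropy (bits) of a flat list of activations."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant features: a single occupied bin, zero entropy
    counts = [0] * bins
    for v in values:
        # Map v into one of `bins` equal-width bins; clamp the max value.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def l2_norm(grads):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def dominance_index(feat_rgb, feat_ir, grad_rgb, grad_ir, alpha=0.5, eps=1e-12):
    """Hypothetical dominance score in [-1, 1] (assumed form, not the paper's MDI).

    Positive values mean the RGB branch dominates, negative values mean the
    infrared branch dominates, and values near zero mean the two are balanced.
    """
    h_rgb, h_ir = shannon_entropy(feat_rgb), shannon_entropy(feat_ir)
    g_rgb, g_ir = l2_norm(grad_rgb), l2_norm(grad_ir)
    # Normalized differences keep each term in [-1, 1] regardless of scale.
    entropy_term = (h_rgb - h_ir) / (h_rgb + h_ir + eps)
    gradient_term = (g_rgb - g_ir) / (g_rgb + g_ir + eps)
    return alpha * entropy_term + (1 - alpha) * gradient_term
```

In this sketch, swapping the two modalities flips the sign of the score, so a training loop could in principle monitor it per batch and react when its magnitude grows; how MDACL actually uses MDI is described in the paper, not here.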
Problem

Research questions and friction points this paper is trying to address.

Modality Dominance
RGB-Infrared Perception
Optimization Bias
Cross-modal Fusion
Embodied Multimedia Systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality Dominance Index
Cross-modal Optimization
RGB-Infrared Perception
Adversarial Equilibrium Regularization
Hierarchical Cross-modal Guidance
Authors
Xianhui Liu
College of Electronics and Information Engineering, Tongji University
Siqi Jiang
School of Computer Science and Technology, Tongji University
Yi Xie
University of Arizona
Multi-agent System
Yuqing Lin
The University of Newcastle
Discrete Math, Software Engineering, Machine Learning
Siao Liu
School of Future Science and Engineering, Soochow University; Key Laboratory of General Artificial Intelligence and Large Models in Provincial Universities, Soochow University