CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning

📅 2026-06-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two problems: modalities that are missing from real-world multimodal data, and the neglect of global complementary information in existing approaches. It proposes a contrastive learning-based dynamic multimodal fusion model. The model captures global dependencies through an attention mechanism that operates across both the feature and modality dimensions, improves representation robustness via entity-to-centroid contrastive learning, and uses an adaptive fusion strategy to handle incomplete inputs. Experiments on three datasets are reported to show that the method outperforms state-of-the-art approaches across diverse multimodal tasks, supporting its effectiveness and generalization capability.
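To make the idea of fusing incomplete inputs concrete, here is a minimal sketch of one common way to do it: a softmax over per-modality scores that is restricted to the modalities actually present. This is an illustration under stated assumptions, not the paper's adaptive fusion module. The function names `masked_softmax` and `fuse`, the scalar scores, and the boolean presence mask are all hypothetical.

```python
import math

def masked_softmax(scores, present):
    """Softmax over the entries whose mask is True; absent entries get weight 0."""
    if not any(present):
        raise ValueError("at least one modality must be present")
    # Subtract the largest present score for numerical stability.
    peak = max(s for s, p in zip(scores, present) if p)
    exps = [math.exp(s - peak) if p else 0.0 for s, p in zip(scores, present)]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(embeddings, scores, present):
    """Weighted sum of the present modality embeddings.

    embeddings: one vector per modality (an absent modality may be None).
    scores:     one importance score per modality (assumed to be given).
    present:    one boolean per modality, True if that modality was observed.
    """
    weights = masked_softmax(scores, present)
    dim = len(next(v for v, p in zip(embeddings, present) if p))
    fused = [0.0] * dim
    for vec, w, p in zip(embeddings, weights, present):
        if not p:
            continue  # a missing modality contributes nothing
        for i in range(dim):
            fused[i] += w * vec[i]
    return fused, weights

# Hypothetical example: three modalities, the second one is missing.
fused, weights = fuse(
    embeddings=[[1.0, 0.0], None, [0.0, 1.0]],
    scores=[0.0, 5.0, 0.0],
    present=[True, False, True],
)
```

In this sketch the high score of the missing modality is ignored, so the remaining weight is shared between the two observed modalities. How the scores themselves are computed is left open here; in the paper that role is described as belonging to the attention mechanism.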
๐Ÿ“ Abstract
Multimodal data fusion involves integrating and analyzing information from multiple modalities to uncover latent correlations and complementary patterns, thereby enhancing data processing and decision-making. Existing methods for structured multimodal inputs are typically designed around specific tasks and assume fully observed modalities, yet real-world applications often suffer from uncertain or missing modality inputs due to various factors. Some traditional models overemphasize local interactions within missing modalities and neglect the global complementary cues embedded in multimodal representations. To overcome these limitations, we propose a Dynamic Multimodal Data Fusion model based on Contrastive Learning (CL-DMDF). CL-DMDF introduces a novel attention mechanism that operates across both feature and modality dimensions to compute reliable attention scores that reflect importance at each level. CL-DMDF further incorporates an entity-centroid contrastive learning module that constructs centroid-based positive samples from entity features to enhance discriminative learning. Additionally, an adaptive fusion module is employed to improve the efficiency and accuracy of dynamic fusion strategies. Extensive experiments conducted on three datasets demonstrate the effectiveness of CL-DMDF across diverse multimodal fusion tasks.
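The entity-centroid contrastive idea can be sketched as an InfoNCE-style loss in which each entity's positive is a centroid rather than another sample. The abstract does not specify how centroids are built or which negatives are used, so the following rests on assumptions: each centroid is the mean of the entity features that share a hypothetical group label, the other groups' centroids act as negatives, and similarity is cosine similarity scaled by a temperature. All function names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def centroids(features, labels):
    """Mean feature vector per label (assumed centroid construction)."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def centroid_contrastive_loss(features, labels, temperature=0.1):
    """Average of -log softmax(sim(entity, own centroid)) over all centroids."""
    cents = {lab: normalize(c) for lab, c in centroids(features, labels).items()}
    total = 0.0
    for vec, lab in zip(features, labels):
        z = normalize(vec)
        logits = {
            k: sum(a * b for a, b in zip(z, c)) / temperature
            for k, c in cents.items()
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_denom - logits[lab]
    return total / len(features)

# Hypothetical toy data: two well-separated groups of 2-D entity features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
separated = centroid_contrastive_loss(features, [0, 0, 1, 1])
shuffled = centroid_contrastive_loss(features, [0, 1, 0, 1])
```

With these toy features, the grouping that matches the geometry gives a much smaller loss than the shuffled grouping, whose two centroids coincide and therefore cannot be told apart. Minimizing such a loss pulls each entity toward its own centroid and away from the others, which is one plausible reading of "enhance discriminative learning".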
Problem

Research questions and friction points this paper is trying to address.

multimodal data fusion
missing modalities
uncertain inputs
global complementary cues
structured multimodal inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Learning
Dynamic Multimodal Fusion
Attention Mechanism
Centroid-based Representation
Adaptive Fusion