Collaboration of Fusion and Independence: Hypercomplex-driven Robust Multi-Modal Knowledge Graph Completion

📅 2025-09-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing multi-modal knowledge graph completion (MMKGC) approaches face a fundamental trade-off: fusion-based methods lose modality-specific information due to fixed aggregation strategies, while ensemble-based methods struggle to model context-aware, fine-grained cross-modal interactions. To address this, we propose M-Hyper, a hypercomplex representation framework that maps three independent modality representations and one fused representation onto the four orthogonal bases of a biquaternion, preserving modality independence while modeling dynamic, context-sensitive cross-modal interactions within a single algebraic framework. We introduce the Fine-grained Entity Representation Factorization (FERF) module and the Robust Relation-aware Modality Fusion (R2MF) module, both leveraging the Hamilton product to realize differentiable, computationally efficient, and noise-resilient multi-modal interaction. Extensive experiments on multiple benchmark datasets demonstrate that M-Hyper significantly outperforms state-of-the-art methods in accuracy, while achieving lower computational overhead and superior robustness to input noise.
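The Hamilton product is the workhorse operation here: a single product mixes all four basis components pairwise. A minimal standalone sketch in NumPy (not the authors' implementation):

```python
import numpy as np

def hamilton_product(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as (..., 4) arrays.

    Each output component is a signed sum over pairs of input components,
    so one product already encodes every pair-wise interaction among the
    four bases {1, i, j, k}.
    """
    a0, a1, a2, a3 = (a[..., n] for n in range(4))
    b0, b1, b2, b3 = (b[..., n] for n in range(4))
    return np.stack([
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,  # real (1) part
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,  # i part
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,  # j part
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,  # k part
    ], axis=-1)
```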

📝 Abstract
Multi-modal knowledge graph completion (MMKGC) aims to discover missing facts in multi-modal knowledge graphs (MMKGs) by leveraging both structural relationships and diverse modality information of entities. Existing MMKGC methods follow two multi-modal paradigms: fusion-based and ensemble-based. Fusion-based methods employ fixed fusion strategies, which inevitably lose modality-specific information and lack the flexibility to adapt to varying modality relevance across contexts. In contrast, ensemble-based methods retain modality independence through dedicated sub-models but struggle to capture the nuanced, context-dependent semantic interplay between modalities. To overcome these dual limitations, we propose M-Hyper, a novel MMKGC method that achieves the coexistence and collaboration of fused and independent modality representations. Our method integrates the strengths of both paradigms, enabling effective cross-modal interactions while maintaining modality-specific information. Inspired by quaternion algebra, we utilize its four orthogonal bases to represent multiple independent modalities and employ the Hamilton product to efficiently model pair-wise interactions among them. Specifically, we introduce a Fine-grained Entity Representation Factorization (FERF) module and a Robust Relation-aware Modality Fusion (R2MF) module to obtain robust representations for three independent modalities and one fused modality. The resulting four modality representations are then mapped to the four orthogonal bases of a biquaternion (a hypercomplex extension of the quaternion) for comprehensive modality interaction. Extensive experiments demonstrate its state-of-the-art performance, robustness, and computational efficiency.
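To make the basis-mapping concrete, below is an illustrative QuatE-style scoring sketch that reuses `hamilton_product` from the sketch above. The helper names (`entity_quaternion`, `triple_score`) and the rotate-then-inner-product scoring function are assumptions for illustration; the paper's FERF and R2MF modules construct the four representations in a more elaborate, relation-aware way:

```python
import numpy as np

def entity_quaternion(s, v, t, f):
    """Place four d-dim modality embeddings -- structural (s), visual (v),
    textual (t), fused (f) -- on the four orthogonal bases: shape (d, 4)."""
    return np.stack([s, v, t, f], axis=-1)

def triple_score(head, rel, tail):
    """QuatE-style plausibility: rotate the head by the relation quaternion
    via the Hamilton product, then take an inner product with the tail."""
    rotated = hamilton_product(head, rel)  # all pair-wise cross-modal terms
    return float(np.sum(rotated * tail))

rng = np.random.default_rng(0)
d = 8  # toy embedding dimension
h, r, t = (entity_quaternion(*rng.normal(size=(4, d))) for _ in range(3))
print(triple_score(h, r, t))
```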
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of fixed fusion and independent ensemble methods
Enabling collaborative representation of fused and independent modalities
Modeling nuanced cross-modal interactions while preserving modality-specific information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypercomplex algebra enables multi-modal collaboration
Biquaternion mapping captures orthogonal modality interactions (see the sketch after this list)
FERF and R2MF modules maintain fused and independent representations
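As the abstract notes, a biquaternion extends the quaternion by allowing complex-valued coefficients (eight real dimensions in total). Purely as an illustrative check, the `hamilton_product` sketch above carries over unchanged to complex arrays:

```python
import numpy as np

# Coefficients of the bases 1, i, j, k are now complex numbers (a biquaternion).
q = np.array([1 + 2j, 0.5 - 1j, -1 + 0j, 3j])
p = np.array([1j, 2 + 0j, 1 - 1j, -0.5 + 0.5j])
print(hamilton_product(q, p))  # the same formulas broadcast over complex dtype
```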
Zhiqiang Liu
Zhejiang University, ZJU-Ant Group Joint Lab of Knowledge Graph
Yichi Zhang
Zhejiang University, ZJU-Ant Group Joint Lab of Knowledge Graph
Mengshu Sun
Beijing University of Technology
Deep Learning, Model Compression and Acceleration
Lei Liang
Ant Group
Knowledge Graph, AI
Wen Zhang
Zhejiang University, ZJU-Ant Group Joint Lab of Knowledge Graph