HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal knowledge graphs (MMKGs) suffer from inherent incompleteness, while existing discriminative completion methods are constrained by the closed-world assumption and limited reasoning capacity. To address these limitations, we propose the first generative multimodal knowledge graph completion (MMKGC) framework. Our method pairs a heterogeneous expert retriever with a lightweight instruction-tuned large language model (LLM), leveraging heterogeneous graph neural networks, cross-modal attention fusion, and retrieval-augmented generation (RAG) to deeply align heterogeneous multimodal data (e.g., images and text) and generate facts in an open-world setting. Evaluated on three standard benchmarks, our framework achieves state-of-the-art performance, significantly improving mean reciprocal rank (MRR) in link prediction, while demonstrating strong robustness and efficient convergence under few-shot settings.
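The paper's exact fusion module is not detailed on this page; as an illustration of the cross-modal attention fusion the summary mentions, here is a minimal numpy sketch in which text-side queries attend over image-region features and the attended view is mixed back into the text representation (all names, dimensions, and the residual averaging are hypothetical, not HERGC's actual design):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_emb, image_emb, d_k=64, seed=0):
    """Fuse image features into a text-side entity representation:
    queries come from text tokens, keys/values from image regions."""
    rng = np.random.default_rng(seed)
    d = text_emb.shape[-1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)  # query projection
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)  # key projection
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)    # value projection
    q = text_emb @ Wq                       # (n_text_tokens, d_k)
    k = image_emb @ Wk                      # (n_img_regions, d_k)
    v = image_emb @ Wv                      # (n_img_regions, d)
    attn = softmax(q @ k.T / np.sqrt(d_k))  # text attends over regions
    fused = 0.5 * (text_emb + attn @ v)     # residual-style mixing
    return fused.mean(axis=0)               # pool to one entity vector

rng = np.random.default_rng(1)
entity_vec = cross_modal_attention(
    text_emb=rng.standard_normal((5, 128)),   # 5 text tokens
    image_emb=rng.standard_normal((3, 128)),  # 3 image regions
)
print(entity_vec.shape)  # (128,)
```

In practice such fused entity vectors would feed the retriever stage described below; the real model would learn the projection matrices rather than sample them randomly.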

📝 Abstract
Multimodal knowledge graphs (MMKGs) enrich traditional knowledge graphs (KGs) by incorporating diverse modalities such as images and text. Multimodal knowledge graph completion (MMKGC) seeks to exploit these heterogeneous signals to infer missing facts, thereby mitigating the intrinsic incompleteness of MMKGs. Existing MMKGC methods typically leverage only the information contained in the MMKGs under the closed-world assumption and adopt discriminative training objectives, which limits their reasoning capacity during completion. Recent generative completion approaches powered by advanced large language models (LLMs) have shown strong reasoning abilities in unimodal knowledge graph completion, but their potential in MMKGC remains largely unexplored. To bridge this gap, we propose HERGC, a Heterogeneous Experts Representation and Generative Completion framework for MMKGs. HERGC first deploys a Heterogeneous Experts Representation Retriever that enriches and fuses multimodal information and retrieves a compact candidate set for each incomplete triple. It then uses a Generative LLM Predictor fine-tuned on minimal instruction data to accurately identify the correct answer from these candidates. Extensive experiments on three standard MMKG benchmarks demonstrate HERGC's effectiveness and robustness, achieving state-of-the-art performance.
Problem

Research questions and friction points this paper is trying to address.

Addresses incompleteness in multimodal knowledge graphs (MMKGs)
Enhances reasoning by integrating generative LLMs with multimodal data
Improves accuracy in inferring missing facts across diverse modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous Experts Representation Retriever fuses multimodal information
Generative LLM Predictor fine-tuned on minimal data
Combines retrieval and generative LLMs for MMKGC
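The two-stage retrieve-then-generate pipeline above can be sketched in a few lines: a retriever scores all entities against the incomplete triple and keeps a compact top-k candidate set, which is then formatted into an instruction prompt for the generative predictor. This is a toy illustration under stated assumptions (cosine-similarity retrieval, made-up entity names, and a `build_instruction` helper invented here), not HERGC's actual implementation:

```python
import numpy as np

def retrieve_candidates(query_emb, entity_embs, entity_names, k=3):
    """Stage 1 (retriever): rank all entities by cosine similarity to
    the incomplete triple's fused embedding; keep the top-k candidates."""
    scores = entity_embs @ query_emb / (
        np.linalg.norm(entity_embs, axis=1) * np.linalg.norm(query_emb))
    top = np.argsort(-scores)[:k]
    return [entity_names[i] for i in top]

def build_instruction(head, relation, candidates):
    """Stage 2 (generative predictor): pack the candidates into an
    instruction prompt; an instruction-tuned LLM would then be asked
    to output the correct tail entity."""
    opts = "\n".join(f"- {c}" for c in candidates)
    return (f"Complete the triple ({head}, {relation}, ?).\n"
            f"Choose the correct tail entity from:\n{opts}\nAnswer:")

# Toy data: the query embedding is constructed to sit near "Paris".
names = ["Paris", "Berlin", "Tokyo", "Lyon"]
rng = np.random.default_rng(0)
embs = rng.standard_normal((4, 16))
query = embs[0] + 0.05 * rng.standard_normal(16)

cands = retrieve_candidates(query, embs, names)
print(cands[0])  # "Paris" ranks first for this toy query
print(build_instruction("France", "capital", cands))
```

The design point this illustrates is that the LLM never has to rank the full entity vocabulary: the retriever narrows an open-ended completion task to a short multiple-choice instruction, which is what makes fine-tuning on minimal instruction data feasible.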