Multi-modal Knowledge Graph Generation with Semantics-enriched Prompts

📅 2025-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity of multimodal knowledge graphs (MMKGs) and the low quality of image–entity associations, this paper proposes an automatic framework for constructing MMKGs from unimodal knowledge graphs. Its core contribution is the Visualizable Structural Neighbor Selection (VSNS) method, which separates Visualizable Neighbor Selection (VNS), a filter that removes relations that are difficult to visualize, from Structural Neighbor Selection (SNS), which picks the neighbors that best capture an entity's structural characteristics. Together they let the knowledge-graph context drive image selection and generation. VSNS combines knowledge graph embedding, relation-level visualizability discrimination, neighbor importance scoring, and prompt engineering to guide diffusion-based image generation. Experiments on MKG-Y and DB15K indicate that VSNS improves the semantic relevance and structural consistency of the generated images. In both quantitative evaluations, including image–entity alignment metrics, and qualitative analyses, VSNS outperforms existing baselines, which supports its use for enriching unimodal knowledge graphs with contextually grounded visual content.

📝 Abstract

Multi-modal Knowledge Graphs (MMKGs) have been widely applied across various domains for knowledge representation. However, far fewer MMKGs exist than are needed, and their construction faces numerous challenges, particularly in selecting high-quality, contextually relevant images for knowledge graph enrichment. To address these challenges, we present a framework for constructing MMKGs from conventional KGs. Furthermore, to generate higher-quality images that are more relevant to the context of the given knowledge graph, we design a neighbor selection method called Visualizable Structural Neighbor Selection (VSNS). This method consists of two modules: Visualizable Neighbor Selection (VNS) and Structural Neighbor Selection (SNS). The VNS module filters out relations that are difficult to visualize, while the SNS module selects the neighbors that most effectively capture the structural characteristics of the entity. To evaluate the quality of the generated images, we perform qualitative and quantitative evaluations on two datasets, MKG-Y and DB15K. The experimental results indicate that selecting neighbors with VSNS yields higher-quality images that are more relevant to the knowledge graph.
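The two-stage idea in the abstract (filter out hard-to-visualize relations, then pick structurally informative neighbors, then build a prompt from them) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and not the paper's implementation: the `Triple` type, the hand-labeled `visualizable_relations` set, the degree-based ranking that stands in for the SNS scoring, and the prompt template in `build_prompt` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def select_neighbors(entity, triples, visualizable_relations, top_k=3):
    """Return up to top_k outgoing triples of `entity` after two filtering stages."""
    # Stage 1 (VNS-like, assumed): keep only triples whose relation is
    # marked as visualizable in a hand-labeled set.
    candidates = [
        t for t in triples
        if t.head == entity and t.relation in visualizable_relations
    ]

    # Stage 2 (SNS-like, placeholder): rank neighbors by their degree in the
    # graph. This is a stand-in heuristic, not the scoring used in the paper.
    degree = Counter()
    for t in triples:
        degree[t.head] += 1
        degree[t.tail] += 1

    # Sort by descending degree; break ties alphabetically for determinism.
    ranked = sorted(candidates, key=lambda t: (-degree[t.tail], t.tail))
    return ranked[:top_k]


def build_prompt(entity, selected):
    """Turn the selected triples into a text prompt (hypothetical template)."""
    context = ", ".join(f"{t.relation} {t.tail}" for t in selected)
    return f"A photo of {entity}" + (f" ({context})" if context else "")


triples = [
    Triple("Eiffel Tower", "located in", "Paris"),
    Triple("Eiffel Tower", "made of", "wrought iron"),
    Triple("Eiffel Tower", "has wikipedia id", "Q243"),
    Triple("Louvre", "located in", "Paris"),
]
visualizable = {"located in", "made of"}

selected = select_neighbors("Eiffel Tower", triples, visualizable, top_k=2)
prompt = build_prompt("Eiffel Tower", selected)
print(prompt)
```

In this toy graph the identifier-style relation is dropped in the first stage, and the remaining neighbors are ordered by how connected they are. The resulting prompt string could then be passed to any text-to-image model; that step is omitted here because the abstract does not specify one.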

Problem

Research questions and friction points this paper is trying to address.

Scarcity of high-quality Multi-modal Knowledge Graphs (MMKGs)

Difficulty of selecting high-quality, contextually relevant images when enriching a knowledge graph

Assessing the quality and relevance of generated images, examined on the MKG-Y and DB15K datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for constructing MMKGs from conventional KGs

Visualizable Structural Neighbor Selection (VSNS) method

Qualitative and quantitative evaluations on MKG-Y and DB15K

🔎 Similar Papers

UniRAG: Universal Retrieval Augmentation for Large Vision Language Models