🤖 AI Summary
Traditional multimodal relation extraction (MRE) relies on discrete classification paradigms that neglect structural constraints (such as entity types and relative positions) and struggle to capture fine-grained semantic relations. To address these limitations, we propose a novel *retrieval-based relation extraction* paradigm that reformulates relation identification as a semantic matching task driven by natural language descriptions. Our approach incorporates entity types and relative positional information as explicit structural constraints, leverages large language models to generate fine-grained, descriptive relation texts, and employs a multimodal encoder with contrastive learning to achieve cross-modal semantic alignment. This design significantly improves the model's ability to handle complex and ambiguous relations, while also enhancing robustness and interpretability. Extensive experiments demonstrate state-of-the-art performance on the MNRE and MORE benchmarks, consistently outperforming existing classification-based MRE methods across all evaluation metrics.
📝 Abstract
Relation extraction (RE) aims to identify semantic relations between entities in unstructured text. Although recent work extends traditional RE to multimodal scenarios, most approaches still adopt classification-based paradigms with fused multimodal features, representing relations as discrete labels. This paradigm has two significant limitations: (1) it overlooks structural constraints such as entity types and positional cues, and (2) it lacks the semantic expressiveness needed for fine-grained relation understanding. We propose **R**etrieval **O**ver **C**lassification (ROC), a novel framework that reformulates multimodal RE as a retrieval task driven by relation semantics. ROC integrates entity type and positional information through a multimodal encoder, expands relation labels into natural language descriptions using a large language model, and aligns entity-relation pairs via semantic similarity-based contrastive learning. Experiments show that our method achieves state-of-the-art performance on the benchmark datasets MNRE and MORE and exhibits stronger robustness and interpretability.
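The core retrieval-over-classification idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy embedding vectors, the function names, and the InfoNCE-style loss are assumptions standing in for the paper's trained multimodal encoder and LLM-generated relation descriptions. The principle shown is the one the abstract describes: relations are retrieved by semantic similarity between an entity-pair representation and relation-description embeddings, and training pulls matched pairs together via a contrastive objective.

```python
# Hypothetical sketch of retrieval-based relation extraction.
# In the real system, `query_vec` would come from a multimodal encoder over
# the entity pair (with type and positional constraints), and each value in
# `relation_index` would embed an LLM-generated relation description.
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_relation(query_vec, relation_index):
    """Retrieval step: return the relation whose description embedding
    is most similar to the entity-pair embedding (instead of a softmax
    over discrete class logits)."""
    return max(relation_index, key=lambda name: cosine(query_vec, relation_index[name]))

def contrastive_loss(query_vec, relation_index, positive, temperature=0.07):
    """InfoNCE-style contrastive loss (an assumed stand-in for the paper's
    objective): pull the query toward its gold relation description and push
    it away from the other descriptions in the index."""
    scores = {n: cosine(query_vec, v) / temperature for n, v in relation_index.items()}
    z = sum(math.exp(s) for s in scores.values())
    return -math.log(math.exp(scores[positive]) / z)

# Toy example with hand-made 3-d embeddings.
relation_index = {
    "member_of": [0.9, 0.1, 0.0],
    "spouse_of": [0.0, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # entity-pair embedding closest to "member_of"
predicted = retrieve_relation(query, relation_index)
loss = contrastive_loss(query, relation_index, positive="member_of")
```

Because prediction is a nearest-description lookup rather than a fixed classification head, the label set can be extended by simply embedding new relation descriptions, which is one source of the flexibility and interpretability the abstract claims.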