Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction

📅 2025-09-25
🤖 AI Summary
Traditional multimodal relation extraction (MRE) relies on discrete classification paradigms, which neglect structural constraints (such as entity types and relative positions) and struggle to capture fine-grained semantic relations. To address these limitations, we propose a novel *retrieval-based relation extraction* paradigm that reformulates relation identification as a semantic matching task driven by natural language relation descriptions. Our approach incorporates entity types and relative positional information as explicit structural constraints, leverages large language models to generate fine-grained, descriptive relation texts, and employs a multimodal encoder with contrastive learning to achieve cross-modal semantic alignment. This design strengthens the model's handling of complex and ambiguous relations while improving robustness and interpretability. Extensive experiments demonstrate state-of-the-art performance on the MNRE and MORE benchmarks, consistently outperforming existing classification-based MRE methods across all evaluation metrics.
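
Structural constraints such as entity types and relative positions are commonly injected by wrapping each entity in typed marker tokens before encoding. The sketch below illustrates that general idea under stated assumptions: the marker strings `<H:TYPE>`/`</H:TYPE>` and `<T:TYPE>`/`</T:TYPE>` and the helper names `add_entity_markers` and `relative_offset` are hypothetical, and the summary does not specify ROC's actual input format.

```python
from typing import List, Tuple

# A span is (start, end, entity_type) with `end` exclusive.
# Hypothetical format: ROC's real marker scheme is not given in this summary.
Span = Tuple[int, int, str]


def add_entity_markers(tokens: List[str], head: Span, tail: Span) -> List[str]:
    """Wrap the head and tail entity spans in typed marker tokens.

    Assumes the two spans are non-empty and do not overlap.
    """
    (hs, he, ht), (ts, te, tt) = head, tail
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == hs:
            out.append(f"<H:{ht}>")  # opening head marker carries the entity type
        if i == ts:
            out.append(f"<T:{tt}>")  # opening tail marker carries the entity type
        out.append(tok)
        if i == he - 1:
            out.append(f"</H:{ht}>")  # closing head marker after the last head token
        if i == te - 1:
            out.append(f"</T:{tt}>")  # closing tail marker after the last tail token
    return out


def relative_offset(head: Span, tail: Span) -> int:
    """Signed token distance from head start to tail start (negative if the tail comes first)."""
    return tail[0] - head[0]


tokens = ["Barack", "Obama", "visited", "Paris"]
head, tail = (0, 2, "PER"), (3, 4, "LOC")
marked = " ".join(add_entity_markers(tokens, head, tail))
print(marked)  # <H:PER> Barack Obama </H:PER> visited <T:LOC> Paris </T:LOC>
print(relative_offset(head, tail))  # 3
```

The markers make both the entity types and their order visible to the encoder as ordinary tokens; a real system would also register them as special tokens in its tokenizer.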

📝 Abstract
Relation extraction (RE) aims to identify semantic relations between entities in unstructured text. Although recent work extends traditional RE to multimodal scenarios, most approaches still adopt classification-based paradigms with fused multimodal features, representing relations as discrete labels. This paradigm has two significant limitations: (1) it overlooks structural constraints like entity types and positional cues, and (2) it lacks semantic expressiveness for fine-grained relation understanding. We propose Retrieval Over Classification (ROC), a novel framework that reformulates multimodal RE as a retrieval task driven by relation semantics. ROC integrates entity type and positional information through a multimodal encoder, expands relation labels into natural language descriptions using a large language model, and aligns entity-relation pairs via semantic similarity-based contrastive learning. Experiments show that our method achieves state-of-the-art performance on the benchmark datasets MNRE and MORE and exhibits stronger robustness and interpretability.
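
Framing relation extraction as retrieval means that, at inference time, the encoded entity pair is compared against an embedding of each relation's natural language description, and the most similar description is returned instead of a predicted class index. A minimal sketch of that ranking step, assuming cosine similarity as the matching score and using hand-written toy vectors in place of real encoder outputs; the function names and relation labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def retrieve_relation(
    query: Sequence[float], relation_bank: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank relation labels by similarity between the entity-pair query
    and each relation-description embedding (highest first)."""
    scored = [(label, cosine(query, emb)) for label, emb in relation_bank.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins: in practice each vector would encode an LLM-generated
# description such as "the head person lives in the tail location".
relation_bank = {
    "place_of_residence": [1.0, 0.0, 0.0],
    "peer": [0.0, 1.0, 0.0],
    "located_in": [0.0, 0.0, 1.0],
}
ranking = retrieve_relation([0.9, 0.1, 0.0], relation_bank)
print(ranking[0][0])  # place_of_residence
```

One consequence of this design is that each relation is represented by the meaning of its description rather than by an arbitrary class index; the exact scoring function ROC uses is not stated in this summary.
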
Problem

Overcoming the limitations of classification-based multimodal relation extraction approaches
Addressing the lack of structural constraints and semantic expressiveness in RE
Reformulating relation extraction as a semantics-driven retrieval task

Innovation

Reformulates relation extraction as a retrieval task
Expands relation labels into natural language descriptions
Aligns entity-relation pairs via contrastive learning
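
The contrastive alignment step can be illustrated with a standard InfoNCE objective: each entity-pair vector is pulled toward the description embedding of its gold relation and pushed away from the descriptions of all other relations. This is a sketch under assumptions, not ROC's published loss: the use of cosine similarity, the temperature value `0.07`, the choice of every other relation description as a negative, and the name `info_nce_loss` are all illustrative.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(
    queries: List[Sequence[float]],
    relation_embs: List[Sequence[float]],
    gold: List[int],
    temperature: float = 0.07,
) -> float:
    """Mean InfoNCE loss over a batch.

    queries: entity-pair representations, one per example.
    relation_embs: one description embedding per relation label.
    gold: index of each example's correct relation in relation_embs.
    """
    total = 0.0
    for q, g in zip(queries, gold):
        # Temperature-scaled similarities act as logits over all relations.
        logits = [cosine(q, r) / temperature for r in relation_embs]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability of the gold relation.
        total += log_sum_exp - logits[g]
    return total / len(queries)


relations = [[1.0, 0.0], [0.0, 1.0]]
pairs = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(pairs, relations, gold=[0, 1])
swapped = info_nce_loss(pairs, relations, gold=[1, 0])
print(aligned < swapped)  # True
```

A full training setup would compute this over encoder outputs with an autodiff framework and might also add the symmetric relation-to-pair direction; neither detail is confirmed by the summary.
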
Lei Hei
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Tingjing Liao
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Yingxin Pei
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Yiyang Qi
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Jiaqi Wang
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Ruiting Li
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Feiliang Ren
Northeastern University
machine translation, text mining