MAJORScore: A Novel Metric for Evaluating Multimodal Relevance via Joint Representation

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal relevance evaluation metrics are limited to bimodal settings and fail to capture higher-order (≥3) joint correlations, compromising the accuracy and fairness of multimodal similarity modeling. To address this, we propose MAJORScore, the first unified relevance evaluation framework for N-modal (N ≥ 3) data. It leverages a pretrained contrastive learning model to construct a shared multimodal joint representation space, enabling cross-modal consistency modeling. Its core innovation is mapping heterogeneous modalities into a common latent space to directly quantify high-order joint relevance. Experiments show that, compared with existing metrics, MAJORScore improves scores by 26.03%–64.29% under modality-consistent conditions and reduces them by 13.28%–20.54% under modality-inconsistent conditions, significantly enhancing reliability and discriminative power. As a scalable, standardized benchmark, MAJORScore supports robust evaluation of large-scale multimodal datasets and models.
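The core idea, embedding every modality into one shared latent space and then measuring their joint agreement directly, can be sketched as below. This is a hypothetical illustration, not the paper's published formula: the function name `majorscore_sketch` and the centroid-cosine aggregation are assumptions, and in practice the per-modality embeddings would come from a pretrained N-modal joint representation model.

```python
import numpy as np

def majorscore_sketch(modal_embeddings):
    # Hypothetical sketch (assumed aggregation, not the paper's exact metric):
    # given one embedding vector per modality, all drawn from a shared joint
    # representation space, score N-modal relevance as the mean cosine
    # similarity between each modality's embedding and the normalized
    # centroid of all N embeddings.
    E = np.stack([e / np.linalg.norm(e) for e in modal_embeddings])  # (N, d)
    centroid = E.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    # Higher values indicate stronger joint (high-order) consistency.
    return float((E @ centroid).mean())
```

Under this sketch, N perfectly aligned modalities score 1.0, while mutually inconsistent (e.g., near-orthogonal) embeddings score lower, mirroring the consistent-vs-inconsistent gap the paper reports.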

📝 Abstract
Multimodal relevance metrics are usually borrowed from the embedding ability of pretrained contrastive learning models for bimodal data (e.g., CLIP), which evaluate the correlation between cross-modal pairs. However, these commonly used metrics are only suitable for association analysis between two modalities, which greatly limits the evaluation of multimodal similarity. Herein, we propose MAJORScore, the first evaluation metric for the relevance of multiple modalities (N modalities, N >= 3) via multimodal joint representation. The ability of multimodal joint representation to integrate multiple modalities into the same latent space allows different modalities to be represented accurately at one scale, providing support for fair relevance scoring. Extensive experiments show that MAJORScore increases by 26.03%-64.29% for consistent modalities and decreases by 13.28%-20.54% for inconsistent ones compared to existing methods. MAJORScore thus serves as a more reliable metric for evaluating similarity on large-scale multimodal datasets and for multimodal model performance evaluation.
Problem

Research questions and friction points this paper is trying to address.

Evaluating multimodal relevance beyond the two-modality limitation
Proposing a joint-representation metric for N modalities (N >= 3)
Providing fair relevance scoring across multiple modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MAJORScore for multimodal relevance evaluation
Uses joint representation to integrate multiple modalities
Enables fair scoring across three or more modalities
Zhicheng Du
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qingyang Shi
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Jiasheng Lu
Huawei Technologies Co., Ltd., Shenzhen, China
Yingshan Liang
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Xinyu Zhang
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Yiran Wang
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Peiwu Qin
Tsinghua Shenzhen International Graduate School
Image Processing · TCM