🤖 AI Summary
Existing multimodal relevance metrics are limited to bimodal settings and fail to capture higher-order joint correlations among three or more modalities, which compromises the accuracy and fairness of multimodal similarity modeling. To address this, we propose MAJORScore, the first unified relevance evaluation framework for N-modal data (N ≥ 3). Its core idea is to use a pretrained contrastive learning model to map heterogeneous modalities into a shared joint representation space, where all modalities can be compared on a single scale and high-order joint relevance can be quantified directly. Experiments show that, relative to existing methods, MAJORScore increases scores by 26.03%–64.29% when modalities are consistent and decreases them by 13.28%–20.54% when they are inconsistent, improving both reliability and discriminative power. As a scalable, standardized metric, MAJORScore supports robust relevance evaluation of large-scale multimodal datasets and models.
📝 Abstract
Multimodal relevance metrics are usually borrowed from the embedding ability of pretrained contrastive learning models for bimodal data (e.g., CLIP), which evaluate the correlation between pairs of cross-modal data. However, these commonly used metrics only support association analysis between two modalities, which greatly limits the evaluation of multimodal similarity. Herein, we propose MAJORScore, the first evaluation metric for the relevance of multiple modalities (N modalities, N ≥ 3) based on multimodal joint representation. By integrating multiple modalities into the same latent space, multimodal joint representation can represent different modalities accurately on one scale, providing the basis for fair relevance scoring. Extensive experiments show that, compared to existing methods, MAJORScore increases by 26.03%–64.29% for consistent modalities and decreases by 13.28%–20.54% for inconsistent ones. MAJORScore serves as a more reliable metric for evaluating similarity on large-scale multimodal datasets and for multimodal model performance evaluation.
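The abstract does not spell out how joint-space embeddings are aggregated into a single N-modal score, so the sketch below is only a minimal illustration of the idea, not the published MAJORScore formula. It assumes one embedding per modality, all produced by the same joint-representation model (an ImageBind-style encoder would fit, but the backbone choice is an assumption here), and scores relevance as the mean cosine similarity of each modality's embedding to the normalized centroid of all N embeddings. The function name `joint_relevance` and the centroid aggregation are hypothetical.

```python
import numpy as np


def joint_relevance(embeddings: list[np.ndarray]) -> float:
    """Illustrative N-modal relevance score (NOT the published formula).

    Each entry of `embeddings` is one modality's vector from a shared
    joint-representation space. The score is the mean cosine similarity
    of every modality to the normalized centroid of all N embeddings:
    high when the modalities describe the same content, lower otherwise.
    """
    # L2-normalize each modality embedding and stack into an (N, d) matrix.
    E = np.stack([e / np.linalg.norm(e) for e in embeddings])
    centroid = E.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    # Cosine of each row with the joint centroid, averaged over modalities.
    return float((E @ centroid).mean())


# Toy demo with synthetic vectors standing in for text/image/audio embeddings.
rng = np.random.default_rng(0)
d = 512
anchor = rng.normal(size=d)
consistent = [anchor + 0.1 * rng.normal(size=d) for _ in range(3)]   # same content
inconsistent = [rng.normal(size=d) for _ in range(3)]                # unrelated content
print(f"consistent:   {joint_relevance(consistent):.3f}")   # close to 1.0
print(f"inconsistent: {joint_relevance(inconsistent):.3f}") # markedly lower
```

Scoring against a joint centroid treats all N modalities symmetrically and stays linear in N, which is the kind of property a shared latent space is meant to enable; whether MAJORScore aggregates this way or otherwise is not stated in the abstract.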