Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking

📅 2024-04-12

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing contrastive learning frameworks typically learn from binary relevance signals (positive/negative pairs) and so fail to capture fine-grained ranking information. This limits ranking quality and forces systems to add a separate re-ranker. This work proposes Generalized Contrastive Learning (GCL), a training framework that incorporates continuous ranking scores into the contrastive loss, encoding relevance and ranking in a unified embedding space and enabling single-stage multi-modal retrieval. Key contributions include: (1) a loss function weighted by ranking scores, which learns from graded rather than binary relevance; and (2) MarqoGS-10M, a multi-modal dataset curated using GPT-4 and Google Shopping that provides ranking scores for each of its 10 million query-document pairs. Compared with the fine-tuned CLIP baseline on MarqoGS-10M, GCL improves NDCG@10 by 29.3% in in-domain evaluations and by 6.0% to 10.0% in cold-start evaluations; in an offline evaluation on proprietary user interaction data, it shows an 11.2% in-domain gain.
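As a reading aid for the NDCG@10 figures, here is a minimal sketch of how the metric is commonly computed. It assumes linear gain (the raw relevance score) and a log2 position discount; the paper's exact gain formulation is not stated here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items of a ranked list."""
    # Rank 0 is the top result; its discount is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

The input is the list of ground-truth relevance scores in the order the system returned the documents. A ranking that already sorts documents by descending relevance scores 1.0, and any other order scores lower.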

📝 Abstract

Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular training frameworks typically learn from binary (positive/negative) relevance, making them ineffective at incorporating desired rankings. As a result, the poor ranking performance of these models forces systems to employ a re-ranker, which increases complexity, maintenance effort and inference time. To address this, we introduce Generalized Contrastive Learning (GCL), a training framework designed to learn from continuous ranking scores beyond binary relevance. GCL encodes both relevance and ranking information into a unified embedding space by applying ranking scores to the loss function. This enables a single-stage retrieval system. In addition, during our research, we identified a lack of public multi-modal datasets that benchmark both retrieval and ranking capabilities. To facilitate this and future research for ranked retrieval, we curated the large-scale MarqoGS-10M dataset using GPT-4 and Google Shopping, providing ranking scores for each of the 10 million query-document pairs. Our results show that GCL achieves a 29.3% increase in NDCG@10 for in-domain evaluations and 6.0% to 10.0% increases for cold-start evaluations compared to the fine-tuned CLIP baseline with MarqoGS-10M. Additionally, we evaluated GCL offline on proprietary user interaction data, where it shows an 11.2% gain for in-domain evaluations. The dataset and the method are available at: https://github.com/marqo-ai/GCL.
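One plausible way to "apply ranking scores to the loss function" is to weight each pair's contrastive term by a weight derived from its ranking score. The sketch below illustrates that idea only. It is an assumption, not the authors' formulation, which may differ in direction (query-to-document versus document-to-query), in how scores map to weights, and in normalization. The function name, the `weights` argument, and the default temperature are hypothetical.

```python
import math


def weighted_contrastive_loss(sim, weights, temperature=0.07):
    """Score-weighted InfoNCE over one batch (illustrative, not the paper's exact loss).

    sim[i][j]  : similarity between query i and document j; matched pairs sit on the diagonal.
    weights[i] : non-negative weight derived from the ranking score of pair i (assumed mapping).
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Numerically stable log-sum-exp over all documents in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy of the matched pair, scaled by its weight.
        total += weights[i] * (log_denominator - logits[i])
    return total / weight_sum
```

With uniform weights this reduces to ordinary binary-relevance InfoNCE. Non-uniform weights make highly scored pairs contribute more to the gradient than weakly scored ones.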

Problem

Research questions and friction points this paper is trying to address.

Contrastive training on binary relevance cannot encode desired rankings

Public multi-modal datasets that benchmark both retrieval and ranking are lacking

Re-ranking stages add system complexity, maintenance effort, and inference time

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Contrastive Learning for unified embedding

Single-stage retrieval with ranking loss function

Large-scale MarqoGS-10M dataset for multi-modal ranking
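To make "single-stage retrieval" concrete, the sketch below ranks documents directly by embedding similarity, with no re-ranker after the nearest-neighbor step. It assumes the embeddings already come from a trained encoder. The function names and the toy vectors are hypothetical, and a production system would typically use an approximate nearest-neighbor index rather than a full scan.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_vecs, top_k=10):
    """Return (doc_id, score) pairs sorted by similarity; this order is the final ranking."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional embeddings, for illustration only.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = retrieve([1.0, 0.1], docs, top_k=3)
```

Because the similarity order is used as the final ranking, ranking quality depends entirely on how well the embedding space encodes graded relevance.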
