Do We Really Need Specialization? Evaluating Generalist Text Embeddings for Zero-Shot Recommendation and Search

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the prevailing assumption that text embedding models require task- or domain-specific fine-tuning to perform well on downstream applications. The authors systematically evaluate Generalist Text Embedding Models (GTEs), pre-trained on large-scale corpora, on zero-shot sequential recommendation and product search. Without updating any model parameters, they additionally apply unsupervised Principal Component Analysis (PCA) to compress the embeddings, retaining the most informative directions while suppressing noise. Experiments show that zero-shot GTEs outperform traditional and fine-tuned task-specific models on both sequential recommendation and e-commerce search benchmarks, and that PCA compression also improves the performance of specialized models. The authors attribute these gains to GTEs distributing features more evenly across the embedding space, concluding that high-quality general-purpose embeddings already possess strong task adaptability.

📝 Abstract
Pre-trained language models (PLMs) are widely used to derive semantic representations from item metadata in recommendation and search. In sequential recommendation, PLMs enhance ID-based embeddings through textual metadata, while in product search, they align item characteristics with user intent. Recent studies suggest that task- and domain-specific fine-tuning is needed to improve representational power. This paper challenges this assumption, showing that Generalist Text Embedding Models (GTEs), pre-trained on large-scale corpora, can deliver strong zero-shot performance without specialized adaptation. Our experiments demonstrate that GTEs outperform traditional and fine-tuned models in both sequential recommendation and product search. We attribute this to their superior representational power, as they distribute features more evenly across the embedding space. Finally, we show that compressing embedding dimensions by focusing on the most informative directions (e.g., via PCA) effectively reduces noise and improves the performance of specialized models. To ensure reproducibility, we provide our repository at https://split.to/gte4ps.
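The compression step described in the abstract (projecting embeddings onto their most informative directions via PCA) can be sketched as below. This is a minimal NumPy illustration of the general technique, not the paper's exact pipeline; the embedding matrix and the target dimension `k` are placeholder assumptions.

```python
import numpy as np

def pca_compress(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Project embeddings onto their top-k principal components.

    Minimal sketch: center the matrix, take its SVD, and keep the
    k leading right-singular directions (the principal axes).
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Hypothetical example: 100 item embeddings of dimension 768 -> 64
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(100, 768))
compressed = pca_compress(item_embeddings, k=64)
print(compressed.shape)  # (100, 64)
```

In practice the same projection is available as `sklearn.decomposition.PCA(n_components=k).fit_transform(...)`; the choice of `k` trades off noise suppression against retained semantic information.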
Problem

Research questions and friction points this paper is trying to address.

Evaluating generalist text embeddings for zero-shot recommendation and search
Challenging the need for task-specific fine-tuning in embeddings
Improving performance via embedding compression and noise reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalist Text Embeddings deliver strong zero-shot performance without fine-tuning
GTEs outperform fine-tuned models in recommendation and search
PCA compression reduces noise in embeddings
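The paper's explanation for GTEs' generalization is that they spread features more evenly across the embedding space. One common way to quantify this is the uniformity measure of Wang & Isola (2020), the log of the mean pairwise Gaussian potential on the unit hypersphere (lower means more uniform). The sketch below uses synthetic data and is an assumption about how such a diagnostic could be computed, not the paper's evaluation code.

```python
import numpy as np

def uniformity(embeddings: np.ndarray, t: float = 2.0) -> float:
    """Wang & Isola uniformity: log mean exp(-t * ||x - y||^2) over pairs.

    Embeddings are L2-normalized first; lower scores indicate points
    spread more evenly over the unit hypersphere.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sq = np.sum(x ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    i, j = np.triu_indices(len(x), k=1)  # unique pairs only
    return float(np.log(np.mean(np.exp(-t * d2[i, j]))))

# Hypothetical comparison: isotropic vs. nearly collapsed embeddings
rng = np.random.default_rng(0)
spread = rng.normal(size=(200, 64))
collapsed = spread.copy()
collapsed[:, 1:] *= 0.05  # squeeze almost all variance into one axis
print(uniformity(spread) < uniformity(collapsed))  # True: spread is more uniform
```

A well-spread (isotropic) cloud scores much lower than a collapsed one, matching the paper's claim that evenly distributed features correlate with better transfer.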