🤖 AI Summary
To obtain high-quality, controllable sentence-level embeddings from large language models (LLMs) for non-generative tasks such as clustering, classification, and retrieval, this paper proposes a unified framework that integrates prompt engineering, contrastive fine-tuning, and semantics-aware aggregation. Specifically, it introduces task-oriented prompt templates, synthesizes positive pairs to drive contrastive learning, and aggregates token-level vectors into a sentence embedding via attention-based weighting, preserving salient semantics while suppressing noise. The method requires only lightweight fine-tuning of decoder-only LLMs, and an attention analysis confirms that fine-tuning sharpens the model's semantic focus. Evaluated on the English clustering track of the Massive Text Embedding Benchmark (MTEB), the approach achieves state-of-the-art performance, significantly outperforming mainstream embedding models. These results demonstrate the framework's effectiveness, robustness, and practical utility for generating sentence embeddings for non-generative downstream applications.
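The summary does not spell out the exact aggregation scheme, so the sketch below is a minimal NumPy illustration of attention-based token weighting, assuming a learned scalar score per token (here produced by a hypothetical query vector) that is softmax-normalized before the weighted sum:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(token_embeddings, query):
    """Aggregate token vectors into one sentence embedding.

    token_embeddings: (seq_len, dim) hidden states from the LLM.
    query: (dim,) scoring vector -- a hypothetical learned parameter,
           standing in for whatever scorer the paper actually uses.
    """
    scores = token_embeddings @ query      # one relevance score per token
    weights = softmax(scores)              # normalize into a distribution
    return weights @ token_embeddings      # weighted sum over tokens

# Toy usage: three token vectors; the third gets the largest weight.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sentence_vec = attention_pool(tokens, np.ones(2))
```

Compared with plain mean pooling, the softmax weighting lets semantically salient tokens dominate the sentence vector while near-zero-weight tokens contribute little, which is the noise-suppression effect described above.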
📝 Abstract
Large Language Models (LLMs) have become a cornerstone in Natural Language Processing (NLP), achieving impressive performance in text generation. Their token-level representations capture rich, human-aligned semantics, but pooling these vectors into a single text embedding discards crucial information. Yet many non-generative downstream tasks, such as clustering, classification, or retrieval, still depend on accurate and controllable sentence- or document-level embeddings. We explore several adaptation strategies for pre-trained, decoder-only LLMs: (i) various aggregation techniques for token embeddings, (ii) task-specific prompt engineering, and (iii) text-level augmentation via contrastive fine-tuning. Combining these components yields state-of-the-art performance on the English clustering track of the Massive Text Embedding Benchmark (MTEB). An analysis of the attention maps further shows that fine-tuning shifts focus from prompt tokens to semantically relevant words, indicating more effective compression of meaning into the final hidden state. Our experiments demonstrate that LLMs can be effectively adapted as text embedding models through prompt engineering combined with resource-efficient contrastive fine-tuning on synthetically generated positive pairs.
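Contrastive fine-tuning on positive pairs is typically driven by an in-batch InfoNCE objective, where each anchor's synthetic positive is the target and the other positives in the batch act as negatives. The abstract does not state the exact loss, so the following is a minimal NumPy sketch under that assumption (the 0.05 temperature is illustrative, not from the paper):

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, numerically stable."""
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: the i-th anchor should match the i-th positive,
    with every other positive in the batch serving as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature       # scaled cosine similarities
    idx = np.arange(len(a))
    return -log_softmax(logits)[idx, idx].mean()

# Toy usage: aligned pairs give near-zero loss; shuffled pairs, a large one.
emb = np.eye(4)
aligned = info_nce_loss(emb, emb)
shuffled = info_nce_loss(emb, emb[::-1])
```

Minimizing this loss pulls each anchor toward its synthetic positive and pushes it away from the rest of the batch, which is what trains the final hidden state to carry sentence-level rather than next-token semantics.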