🤖 AI Summary
To obtain high-quality, controllable sentence-level embeddings from large language models (LLMs) for non-generative tasks such as clustering, classification, and retrieval, this paper proposes a unified framework that integrates prompt engineering, contrastive fine-tuning, and semantics-aware aggregation. Specifically, it introduces task-oriented prompt templates, synthesizes positive pairs to drive contrastive learning, and aggregates token-level vectors into a sentence embedding via attention-based weighting, preserving salient semantics while suppressing noise. The method requires only lightweight fine-tuning of decoder-only LLMs, and an attention analysis confirms that fine-tuning sharpens the model's semantic focus. Evaluated on the English clustering track of the Massive Text Embedding Benchmark (MTEB), the approach achieves state-of-the-art performance, significantly outperforming mainstream embedding models. These results demonstrate the framework's effectiveness, robustness, and practical utility for generating sentence embeddings for non-generative downstream applications.
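The summary does not spell out the exact aggregation scheme, so the sketch below is a minimal NumPy illustration of attention-based token weighting, assuming a learned scalar score per token (here produced by a hypothetical query vector) that is softmax-normalized before the weighted sum:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(token_embeddings, query):
    """Aggregate token vectors into one sentence embedding.

    token_embeddings: (seq_len, dim) hidden states from the LLM.
    query: (dim,) scoring vector -- a hypothetical learned parameter,
           standing in for whatever scorer the paper actually uses.
    """
    scores = token_embeddings @ query      # one relevance score per token
    weights = softmax(scores)              # normalize into a distribution
    return weights @ token_embeddings      # weighted sum over tokens

# Toy usage: three token vectors; the third gets the largest weight.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sentence_vec = attention_pool(tokens, np.ones(2))
```

Compared with plain mean pooling, the softmax weighting lets semantically salient tokens dominate the sentence vector while near-zero-weight tokens contribute little, which is the noise-suppression effect described above.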
📝 Abstract
Large Language Models (LLMs) have become a cornerstone in Natural Language Processing (NLP), achieving impressive performance in text generation. Their token-level representations capture rich, human-aligned semantics, but pooling these vectors into a single text embedding discards crucial information. Yet many non-generative downstream tasks, such as clustering, classification, or retrieval, still depend on accurate and controllable sentence- or document-level embeddings. We explore several adaptation strategies for pre-trained, decoder-only LLMs: (i) various aggregation techniques for token embeddings, (ii) task-specific prompt engineering, and (iii) text-level augmentation via contrastive fine-tuning. Combining these components yields state-of-the-art performance on the English clustering track of the Massive Text Embedding Benchmark (MTEB). An analysis of the attention maps further shows that fine-tuning shifts focus from prompt tokens to semantically relevant words, indicating more effective compression of meaning into the final hidden state. Our experiments demonstrate that LLMs can be effectively adapted as text embedding models through prompt engineering combined with resource-efficient contrastive fine-tuning on synthetically generated positive pairs.
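Contrastive fine-tuning on positive pairs is typically driven by an in-batch InfoNCE objective, where each anchor's synthetic positive is the target and the other positives in the batch act as negatives. The abstract does not state the exact loss, so the following is a minimal NumPy sketch under that assumption (the 0.05 temperature is illustrative, not from the paper):

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, numerically stable."""
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: the i-th anchor should match the i-th positive,
    with every other positive in the batch serving as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature       # scaled cosine similarities
    idx = np.arange(len(a))
    return -log_softmax(logits)[idx, idx].mean()

# Toy usage: aligned pairs give near-zero loss; shuffled pairs, a large one.
emb = np.eye(4)
aligned = info_nce_loss(emb, emb)
shuffled = info_nce_loss(emb, emb[::-1])
```

Minimizing this loss pulls each anchor toward its synthetic positive and pushes it away from the rest of the batch, which is what trains the final hidden state to carry sentence-level rather than next-token semantics.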