🤖 AI Summary
To address the reliance of conditional text embeddings on extensive labeled data and model fine-tuning, this paper proposes PonTE, a fully unsupervised method that requires no training. Leveraging causal large language models, PonTE employs carefully designed conditional prompts to directly generate perspective-aware text embeddings, requiring neither parameter updates nor annotated data; it is the first approach to produce conditional embeddings without any fine-tuning. Empirically, PonTE matches supervised methods on conditional semantic similarity and text clustering tasks. Moreover, through embedding visualization and analysis of the tokens generated after the prompt, it demonstrates the interpretability of the conditional semantics it captures. This work removes the supervised paradigm's dependence on labeled resources, offering a framework for controllable text representation learning in low-resource settings.
📄 Abstract
Conditional text embedding is a proposed representation that captures the shift in perspective on texts when conditioned on a specific aspect. Previous methods have relied on extensive training data for fine-tuning models, leading to challenges in terms of labor and resource costs. We propose PonTE, a novel unsupervised conditional text embedding method that leverages a causal large language model and a conditional prompt. Through experiments on conditional semantic text similarity and text clustering, we demonstrate that PonTE can generate useful conditional text embeddings and achieve performance comparable to supervised methods without fine-tuning. We also show the interpretability of text embeddings with PonTE by analyzing word generation following prompts and embedding visualization.
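The abstract describes the core mechanism at a high level: feed a causal LLM a prompt that combines the input text with a condition, and read an embedding off the model's hidden states. The sketch below illustrates the two ingredients, a conditional prompt template and last-token pooling of the final hidden layer. The template wording, function names, and pooling choice are illustrative assumptions, not PonTE's exact design; the pooling step is shown on synthetic activations rather than a real model.

```python
import numpy as np

def conditional_prompt(text: str, condition: str) -> str:
    # Illustrative template (not the paper's exact wording): ask the model to
    # characterize the text in one word with respect to the given condition.
    return f'This sentence: "{text}", in terms of {condition}, means in one word:"'

def last_token_embedding(hidden_states: np.ndarray,
                         attention_mask: np.ndarray) -> np.ndarray:
    """Pool the hidden state of the last non-padding token per sequence.

    hidden_states:  (batch, seq_len, dim) final-layer activations of a causal LM
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for right padding
    """
    last_idx = attention_mask.sum(axis=1) - 1        # index of last real token
    batch_idx = np.arange(hidden_states.shape[0])
    return hidden_states[batch_idx, last_idx]        # (batch, dim)

# Synthetic example: 2 sequences of length 5, hidden size 4.
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 4))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
emb = last_token_embedding(h, mask)
assert emb.shape == (2, 4)
assert np.allclose(emb[0], h[0, 2])  # last real token of sequence 0
```

With a real causal LM, one would tokenize `conditional_prompt(text, condition)`, run a forward pass with hidden states enabled, and apply the same pooling; because the prompt asks for a one-word answer, the last token's state is pushed toward a condition-specific summary of the text, which also makes the next generated word a human-readable probe of the embedding.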