AI Summary
This study investigates whether visual embedding models implicitly encode continuous ordinal attributes (such as age, crowd count, head pose, aesthetic quality, and recency) along linear directions in their embedding spaces, so that images can be sorted by a simple projection.
Method: We introduce the concept of "rank axes": linear directions in embedding space along which projected embeddings preserve an attribute's order. A model is called rankable for an attribute when such an axis exists. These axes can often be recovered from a small number of samples, or even from just two extreme examples, without full-scale supervision, and are quantified via projection consistency and order-preservation metrics.
Contribution/Results: Evaluating seven popular visual encoders across nine benchmark datasets, we find that many widely adopted pre-trained embeddings are inherently rankable. This points to an underappreciated geometric property of embedding spaces, namely their alignment with continuous ordinal semantics. The finding suggests new use cases for image ranking in vector databases, retrieval, and semantically controllable generation, and motivates further study of the structure and learning of rankable embeddings in visual embedding models.
Abstract
We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings. Our code is available at https://github.com/aktsonthalia/rankable-vision-embeddings.
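The two-extreme-example idea can be illustrated with a minimal sketch. This is not the authors' code (see the linked repository for that). It assumes synthetic embeddings in which a made-up attribute varies along one hidden direction, takes the normalized difference between the embeddings of the lowest and highest examples as a candidate rank axis, and uses Spearman correlation as one plausible order-preservation measure. The function names `rank_axis_from_extremes` and `spearman` are hypothetical.

```python
import numpy as np


def rank_axis_from_extremes(low_emb, high_emb):
    """Unit vector pointing from the low-attribute example to the high one."""
    direction = np.asarray(high_emb, dtype=float) - np.asarray(low_emb, dtype=float)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("the two extreme embeddings must differ")
    return direction / norm


def spearman(a, b):
    """Spearman rank correlation as Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Synthetic stand-in for encoder outputs (an assumption, not real data):
# the attribute moves embeddings along one hidden direction, plus noise.
rng = np.random.default_rng(0)
dim, n = 64, 200
true_axis = rng.normal(size=dim)
true_axis /= np.linalg.norm(true_axis)
attribute = rng.uniform(0, 80, size=n)  # e.g. an age-like label
embeddings = 0.1 * attribute[:, None] * true_axis + rng.normal(scale=0.2, size=(n, dim))

# Recover a candidate rank axis from just the two extreme examples.
idx_low, idx_high = np.argmin(attribute), np.argmax(attribute)
axis = rank_axis_from_extremes(embeddings[idx_low], embeddings[idx_high])

# Project every embedding onto the axis and check how well order is preserved.
scores = embeddings @ axis
rho = spearman(scores, attribute)
print(f"Spearman correlation between projection and attribute: {rho:.3f}")
```

With real encoders, `embeddings` would be replaced by the model's image features. How well two examples suffice then depends on the encoder and attribute, which is the empirical question the paper studies.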