Vector Linking via Cross-Model Local Isometric Consistency

📅 2026-05-29

🤖 AI Summary
This work addresses the challenge of recovering cross-model object correspondences when given only two partially overlapping embedding datasets produced by distinct black-box encoders. It identifies and leverages, for the first time, the cross-model isometric consistency in local geometric structures induced by contrastive learning encoders, and proposes a training-free, iterative geometric embedding hashing method. Guided by a small set of seed anchor points, the approach progressively expands correspondences across views by integrating distance-based hashing with Beta-Bernoulli Bayesian posterior aggregation. Experiments demonstrate that the method achieves high-precision and robust vector alignment across diverse encoder pairs, overlap ratios, and anchor conditions, successfully enabling applications such as vector database fusion and cross-model clustering.
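The two ingredients named in the summary can be written compactly. The notation below (encoders f_A and f_B, scale factor c, tolerance ε, radius r, prior parameters α and β, s matches out of V views) is assumed here for illustration and is not taken from the paper. The posterior shown is the standard Beta-Bernoulli conjugate update; how the paper sets its prior and defines a "match" may differ.

```latex
% Local isometric consistency: short-range distances agree up to a scale factor c
\bigl|\, \|f_B(x) - f_B(y)\| - c\,\|f_A(x) - f_A(y)\| \,\bigr| \le \varepsilon
\quad \text{whenever } \|f_A(x) - f_A(y)\| \le r

% Beta-Bernoulli aggregation of match evidence for one candidate link
p \sim \mathrm{Beta}(\alpha, \beta), \quad
s \text{ matches in } V \text{ views}
\;\Longrightarrow\;
p \mid s \sim \mathrm{Beta}(\alpha + s,\; \beta + V - s), \quad
\mathbb{E}[p \mid s] = \frac{\alpha + s}{\alpha + \beta + V}
```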
📝 Abstract
We study Vector Linking: given two embedding clouds produced by different black-box encoders over partially overlapping datasets, recover cross-model object correspondences using only vectors. Empirically and theoretically, we show that independently trained contrastive encoders exhibit local geometric consistency: short-range distances are approximately preserved up to a scale factor, whereas long-range distances are not, owing to model-specific distortion. Building on this, we propose an iterative, reference-based geometric embedding hashing method that recovers vector links from a tiny seed set of paired anchors. It represents each vector by its distances to sampled paired anchors, proposes candidate links via hash-space matching, and aggregates evidence across views in a Beta-Bernoulli posterior to bootstrap high-confidence links as new anchors. Experiments across multiple benchmarks and embedding model pairs demonstrate accurate and robust linking under varying overlap, seed budgets, and out-of-domain anchors, with applications to vector database integration and cross-model clustering. Code is available at https://github.com/DBgroup-Edinburgh/VecLinking.
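The pipeline described above (anchor-distance representation, candidate matching per view, Beta-Bernoulli aggregation) can be sketched on toy data. This is a minimal illustration, not the authors' implementation. It makes several assumptions: cloud B is an exact rotated and scaled copy of cloud A (a global simplification of the local consistency the abstract describes), nearest-signature matching with a tolerance stands in for hash-space matching, a uniform Beta(1, 1) prior is used, and all names (`signature`, `estimate_scale`, `link`, `tol`, `accept`) are hypothetical.

```python
import math
from itertools import combinations


def signature(vec, anchor_vecs, scale=1.0):
    """Distances from vec to each anchor, divided by a scale factor."""
    return [math.dist(vec, a) / scale for a in anchor_vecs]


def estimate_scale(anchors_a, anchors_b):
    """Ratio of summed pairwise anchor distances in cloud B to cloud A."""
    pairs = list(combinations(range(len(anchors_a)), 2))
    total_a = sum(math.dist(anchors_a[i], anchors_a[j]) for i, j in pairs)
    total_b = sum(math.dist(anchors_b[i], anchors_b[j]) for i, j in pairs)
    return total_b / total_a


def link(cloud_a, cloud_b, seed_pairs, view_size=3, tol=1e-6, accept=0.8):
    """Return {a_id: b_id} for links whose posterior mean exceeds `accept`."""
    # Each view is one subset of the paired anchors (assumed view construction).
    views = list(combinations(seed_pairs, view_size))
    votes = {}  # (a_id, b_id) -> number of views that proposed this link
    for view in views:
        anchors_a = [cloud_a[a] for a, _ in view]
        anchors_b = [cloud_b[b] for _, b in view]
        scale = estimate_scale(anchors_a, anchors_b)
        sigs_b = {b: signature(v, anchors_b, scale) for b, v in cloud_b.items()}
        for a_id, vec in cloud_a.items():
            sig_a = signature(vec, anchors_a)
            # Nearest signature in B; a stand-in for hash-bucket matching.
            best = min(sigs_b, key=lambda b: math.dist(sig_a, sigs_b[b]))
            if math.dist(sig_a, sigs_b[best]) < tol:
                votes[(a_id, best)] = votes.get((a_id, best), 0) + 1
    links = {}
    for (a_id, b_id), s in votes.items():
        f = len(views) - s  # views that did not propose this link
        posterior_mean = (1 + s) / (2 + s + f)  # mean of Beta(1 + s, 1 + f)
        if posterior_mean > accept:
            links[a_id] = b_id
    return links


def to_b(p):
    """Toy 'second encoder': rotate by 90 degrees and scale by 2."""
    return (-2.0 * p[1], 2.0 * p[0])


cloud_a = {
    "s0": (0.0, 0.0), "s1": (4.0, 0.0), "s2": (0.0, 3.0), "s3": (4.0, 3.0),
    "p1": (1.0, 1.0), "p2": (3.0, 2.0), "p3": (2.0, 5.0),
    "only_a": (6.0, 1.0),  # has no counterpart in cloud B
}
cloud_b = {"b_" + k: to_b(v) for k, v in cloud_a.items() if k != "only_a"}
cloud_b["b_only"] = to_b((5.0, 5.0))  # has no counterpart in cloud A

seed_pairs = [("s0", "b_s0"), ("s1", "b_s1"), ("s2", "b_s2"), ("s3", "b_s3")]
truth = {k: "b_" + k for k in cloud_a if k != "only_a"}

links = link(cloud_a, cloud_b, seed_pairs)
print(links == truth)
```

In an iterative variant, the accepted links would be appended to `seed_pairs` and the procedure repeated, mirroring the bootstrapping of high-confidence links as new anchors. With real encoders, only short-range distances would be expected to agree, so anchors near each query vector would matter more than in this toy setting.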
Problem

Research questions and friction points this paper is trying to address.

Vector Linking
Cross-Model Correspondence
Embedding Alignment
Black-box Encoders
Geometric Consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vector Linking
Local Isometric Consistency
Geometric Embedding Hashing
Cross-Model Correspondence
Contrastive Encoders