Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector Databases

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vector database upgrades that replace the embedding model incur prohibitive computational overhead and service disruption, because every stored vector must be re-encoded and the approximate nearest neighbor (ANN) index rebuilt. This paper proposes Drift-Adapter, a lightweight, learnable space-alignment method that inserts a parameterized adapter layer—implementing orthogonal Procrustes alignment, a low-rank affine transformation, or a residual MLP—between the old and new embedding spaces. Trained on only a small set of paired old–new embeddings, it enables online calibration without rebuilding the index. Its core contribution is enabling *hot embedding model upgrades*: new-model queries are mapped into the legacy space at query time with near-zero downtime. Experiments on MTEB text corpora and a CLIP image-model upgrade (1M items) demonstrate that Drift-Adapter restores 95–99% of the recall achieved by full re-encoding, adds less than 10 μs of query latency, and reduces recomputation cost by over two orders of magnitude, with analysis supporting scalability to billion-item systems.
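The simplest of the three adapter parameterizations, orthogonal Procrustes alignment, has a closed-form solution via SVD. A minimal sketch of fitting it on a small sample of paired old/new embeddings, assuming the adapter maps new-model vectors into the legacy space (the function name and synthetic data are illustrative, not from the paper):

```python
import numpy as np

def fit_procrustes(new_emb: np.ndarray, old_emb: np.ndarray) -> np.ndarray:
    """Solve min_W ||new_emb @ W - old_emb||_F with W orthogonal.

    Closed form (orthogonal Procrustes): SVD of new_emb.T @ old_emb = U S V^T,
    then W = U @ V^T.
    """
    u, _, vt = np.linalg.svd(new_emb.T @ old_emb)
    return u @ vt

# Synthetic paired sample: pretend the new model is (roughly) a rotation
# of the old embedding space, plus small drift noise.
rng = np.random.default_rng(0)
d, n = 64, 1000
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))   # ground-truth rotation
old = rng.normal(size=(n, d))                        # legacy embeddings
new = old @ R_true.T + 0.01 * rng.normal(size=(n, d))

W = fit_procrustes(new, old)

# Query-time use: map a new-model query into the legacy space, then search
# the existing ANN index (built on the old embeddings) unchanged.
q_new = rng.normal(size=(d,))
q_legacy = q_new @ W
```

Because `W` is orthogonal, the map preserves distances and angles, which is why the legacy ANN index remains valid for the transformed queries.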

📝 Abstract
Upgrading embedding models in production vector databases typically requires re-encoding the entire corpus and rebuilding the Approximate Nearest Neighbor (ANN) index, leading to significant operational disruption and computational cost. This paper presents Drift-Adapter, a lightweight, learnable transformation layer designed to bridge embedding spaces between model versions. By mapping new queries into the legacy embedding space, Drift-Adapter enables the continued use of the existing ANN index, effectively deferring full re-computation. We systematically evaluate three adapter parameterizations: Orthogonal Procrustes, Low-Rank Affine, and a compact Residual MLP, trained on a small sample of paired old and new embeddings. Experiments on MTEB text corpora and a CLIP image model upgrade (1M items) show that Drift-Adapter recovers 95-99% of the retrieval recall (Recall@10, MRR) of a full re-embedding, adding less than 10 microseconds of query latency. Compared to operational strategies like full re-indexing or dual-index serving, Drift-Adapter reduces recompute costs by over 100 times and facilitates upgrades with near-zero operational interruption. We analyze robustness to varied model drift, training data size, scalability to billion-item systems, and the impact of design choices like diagonal scaling, demonstrating Drift-Adapter's viability as a pragmatic solution for agile model deployment.
Problem

Research questions and friction points this paper is trying to address.

Reducing operational disruption during embedding model upgrades
Avoiding complete corpus re-encoding and ANN index rebuilding
Minimizing computational costs while maintaining retrieval performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight transformation layer bridges embedding spaces
Maps new queries to use existing ANN index
Reduces recompute costs by over 100 times
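The points above can be sketched with the low-rank affine variant: fit an affine map from a small paired sample by least squares, truncate it to low rank, and apply it to queries so the existing ANN index keeps serving. The function, the rank, and the synthetic pairs below are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def fit_low_rank_affine(new_emb, old_emb, rank=32):
    """Fit old ≈ new @ W + b by least squares, then truncate W to `rank`."""
    mu_new, mu_old = new_emb.mean(axis=0), old_emb.mean(axis=0)
    Xc, Yc = new_emb - mu_new, old_emb - mu_old
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)      # full affine solution
    u, s, vt = np.linalg.svd(W, full_matrices=False)  # low-rank truncation
    W_lr = (u[:, :rank] * s[:rank]) @ vt[:rank]
    b = mu_old - mu_new @ W_lr
    return W_lr, b

# Synthetic paired sample where the true old<->new relationship is low rank.
rng = np.random.default_rng(1)
d, n = 64, 2000
W_true = rng.normal(size=(d, 16)) @ rng.normal(size=(16, d)) / 4.0
b_true = rng.normal(size=d)
new = rng.normal(size=(n, d))
old = new @ W_true + b_true + 0.01 * rng.normal(size=(n, d))

W_lr, b = fit_low_rank_affine(new, old, rank=32)

# Serving path: adapt each incoming new-model query, then query the
# legacy ANN index as before -- no corpus re-encoding, no index rebuild.
q_legacy = rng.normal(size=(d,)) @ W_lr + b
```

The low-rank factorization keeps the adapter's parameter count and per-query cost small, consistent with the sub-10 μs latency overhead reported in the summary.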