MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval

📅 2025-09-16
🤖 AI Summary
Problem: Dense passage retrieval (DPR) typically scores query-passage relevance with Euclidean or cosine distance, which implicitly assumes that embeddings lie on a linear manifold. In practice, especially under out-of-distribution (OOD) conditions, embeddings often lie on lower-dimensional non-linear manifolds, where straight-line distances fail to reflect semantic similarity.
Method: This work introduces manifold geometry into DPR for the first time, proposing a training-free, manifold-aware distance metric. It builds a k-nearest-neighbor graph over pretrained passage embeddings and approximates the intrinsic geodesic distance between a query and a passage by the length of the shortest path connecting them in this graph, so that relatedness can propagate through neighboring passages.
Contribution/Results: The method is model-agnostic, enables zero-shot transfer, and adds little query-time cost. Experiments report gains of up to 26% in Recall@10 over Euclidean and cosine distance on OOD retrieval, with comparable in-distribution performance and only a small increase in query latency.
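The graph-and-shortest-path idea above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the symmetrized k-NN graph, the Euclidean edge weights, attaching the query to its k nearest passages, the helper names, and the toy 2-D data are all assumptions. The toy "passages" lie along a U shape, so the tip of the right arm is close to the query in straight-line distance but far from it along the curve.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetrized k-NN graph: adj[i] maps neighbor index -> Euclidean edge weight."""
    adj = [{} for _ in points]
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            adj[i][j] = d
            adj[j][i] = d  # add the reverse edge so the graph is undirected
    return adj


def dijkstra(adj, source):
    """Single-source shortest-path lengths over non-negative edge weights."""
    dist = [math.inf] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already reached more cheaply
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def geodesic_ranking(passages, query, k):
    """Rank passages by shortest-path distance from the query (hypothetical helper)."""
    adj = knn_graph(passages, k)
    # Attach the query as an extra source node linked to its k nearest passages.
    q_edges = sorted((math.dist(query, p), j) for j, p in enumerate(passages))[:k]
    adj.append({j: d for d, j in q_edges})
    dist = dijkstra(adj, len(passages))
    return sorted(range(len(passages)), key=lambda j: dist[j])


def euclidean_ranking(passages, query):
    """Baseline: rank passages by straight-line distance to the query."""
    return sorted(range(len(passages)), key=lambda j: math.dist(query, passages[j]))


# Toy "passages" along a U: down the left arm, across the bottom, up the right arm.
PASSAGES = [(0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
QUERY = (0.2, 3)

print(euclidean_ranking(PASSAGES, QUERY)[:4])      # [0, 1, 2, 9]
print(geodesic_ranking(PASSAGES, QUERY, k=2)[:4])  # [0, 1, 2, 3]
```

Straight-line distance ranks passage 9 (the tip of the right arm) fourth, although it is the farthest passage along the curve; the graph distance ranks it last. With real embeddings the choice of k matters: too small a k can disconnect the graph, while too large a k adds shortcut edges that cut across the manifold.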

📝 Abstract
Dense Passage Retrieval (DPR) typically relies on Euclidean or cosine distance to measure query-passage relevance in embedding space, which is effective when embeddings lie on a linear manifold. However, our experiments across DPR benchmarks suggest that embeddings often lie on lower-dimensional, non-linear manifolds, especially in out-of-distribution (OOD) settings, where cosine and Euclidean distance fail to capture semantic similarity. To address this limitation, we propose a manifold-aware distance metric for DPR (MA-DPR) that models the intrinsic manifold structure of passages using a nearest-neighbor graph and measures query-passage distance based on their shortest path in this graph. We show that MA-DPR outperforms Euclidean and cosine distances by up to 26% on OOD passage retrieval with comparable in-distribution performance across various embedding models while incurring a minimal increase in query inference time. Empirical evidence suggests that manifold-aware distance allows DPR to leverage context from related neighboring passages, making it effective even in the absence of direct semantic overlap. MA-DPR can be applied to a wide range of dense embedding and retrieval tasks, offering potential benefits across many domains.
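The abstract reports only a minimal increase in query inference time. One way such a low cost could be achieved, which is our assumption and not something the source confirms, is to build the passage graph once offline and, per query, run Dijkstra only until the top-n passages are settled. The sketch below assumes the precomputed graph is an adjacency list of {neighbor: weight} dicts; `top_n_geodesic` and the toy line-shaped graph are hypothetical.

```python
import heapq
import math


def top_n_geodesic(graph, passages, query, k, n):
    """Return [(passage_index, graph_distance)] for the n graph-nearest passages.

    `graph` is a precomputed (offline) adjacency list: graph[i] = {j: weight}.
    Only the query's k entry edges and a partial Dijkstra run happen per query.
    """
    # Entry edges: the query's k nearest passages by Euclidean distance.
    # Brute force here; a production system would presumably reuse its vector index.
    heap = heapq.nsmallest(k, ((math.dist(query, p), j) for j, p in enumerate(passages)))
    heapq.heapify(heap)
    settled = {}
    while heap and len(settled) < n:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry: u was already settled at a smaller distance
        settled[u] = d
        for v, w in graph[u].items():
            if v not in settled:
                heapq.heappush(heap, (d + w, v))
    # Dict insertion order equals settling order, i.e. increasing graph distance.
    return list(settled.items())


# Toy precomputed graph: six passages on a line, each linked to its immediate neighbors.
PASSAGES = [(float(x), 0.0) for x in range(6)]
GRAPH = [{j: 1.0 for j in (i - 1, i + 1) if 0 <= j < len(PASSAGES)} for i in range(len(PASSAGES))]

print(top_n_geodesic(GRAPH, PASSAGES, query=(-0.5, 0.0), k=2, n=3))
# [(0, 0.5), (1, 1.5), (2, 2.5)]
```

Under this scheme the per-query work is one nearest-neighbor lookup for the entry edges plus a search that stops after n passages are settled, rather than a pass over the whole graph.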
Problem

Addresses non-linear embedding manifolds in dense retrieval
Improves out-of-distribution semantic similarity measurement
Models intrinsic passage relationships using graph-based distances
Innovation

Manifold-aware distance metric for DPR
Models the intrinsic manifold structure of passages with a nearest-neighbor graph
Measures query-passage distance as the shortest-path length in that graph
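Written out, the metric in the list above amounts to a graph geodesic. The notation is ours, and the symmetrized k-NN edge set and Euclidean edge weights are assumptions, since the summary does not specify them. Let e_i be the embedding of passage p_i, let N_k(i) be the k nearest passages of node i, and treat the query q as an extra node connected to N_k(q):

```latex
\[
E = \{\, (i, j) : j \in N_k(i) \ \text{or}\ i \in N_k(j) \,\}, \qquad
w_{ij} = \lVert \mathbf{e}_i - \mathbf{e}_j \rVert_2 ,
\]
\[
d_G(q, p) = \min_{\pi = (q = v_0, v_1, \dots, v_m = p)} \ \sum_{t=1}^{m} w_{v_{t-1} v_t},
\]
```

where the minimum runs over all paths from q to p whose consecutive nodes are joined by edges of the graph. Passages are then ranked by increasing d_G(q, p) instead of by direct Euclidean or cosine distance.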
Authors

Yifan Liu
University of Toronto, Canada

Qianfeng Wen
University of Toronto, Canada

Mark Zhao
University of Colorado Boulder
Computer Systems, Systems for ML, Cloud Computing

Jiazhou Liang
University of Toronto, Canada

Scott Sanner
University of Toronto
Artificial Intelligence, Machine Learning, Information Retrieval