🤖 AI Summary
This work addresses the problem of group-fair k-nearest neighbor (k-NN) queries in vector databases under joint constraints involving multiple sensitive attributes. It proposes the first computational framework for this setting that simultaneously achieves high efficiency, scalability, and recall. The approach accelerates candidate generation via an adapted locality-sensitive hashing (LSH) scheme, constructs a lightweight index over attribute combinations, and enforces multi-attribute fairness constraints during post-processing, using a network flow algorithm for two sensitive attributes and a theoretically grounded integer linear programming (ILP) formulation for three or more. Experimental results show that existing vector search methods cannot be directly adapted to enforce such multi-attribute fairness, whereas the proposed framework significantly improves query efficiency and scalability while maintaining high recall.
📝 Abstract
We initiate the study of multi-attribute group fairness in $k$-nearest neighbor ($k$-NN) search over vector databases. Whereas prior work on vector search focuses on efficiency or query filtering, fairness imposes count constraints that ensure proportional representation across groups defined by protected attributes. When fairness spans multiple attributes, these constraints must be satisfied simultaneously, making the problem computationally hard. To address this, we propose a computational framework that produces high-quality approximate nearest neighbors with good trade-offs among search time, memory/indexing cost, and recall. We adapt locality-sensitive hashing (LSH) to accelerate candidate generation and build a lightweight index over the Cartesian product of protected attribute values. Our framework retrieves candidates satisfying joint count constraints and then applies a post-processing stage to construct fair $k$-NN results across all attributes. For two attributes, we present an exact polynomial-time flow-based algorithm; for three or more, we formulate exact ILP-based solutions at higher computational cost. We provide theoretical guarantees, identify efficiency--fairness trade-offs, and empirically show that existing vector search methods cannot be directly adapted for fairness. Experimental evaluations demonstrate the generality and scalability of the proposed framework.
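To make the two-attribute case concrete, the following is a minimal sketch of how joint count constraints can be checked with a max-flow computation. It is not the paper's algorithm, whose construction the abstract does not give. It assumes exact per-value quotas on each attribute (the paper's constraints may instead be lower bounds or proportions), and the names `max_flow`, `fair_selection_feasible`, `quota_a`, and `quota_b` are hypothetical. The network has one node per value of the first attribute, one node per value of the second, and an edge between them whose capacity is the number of retrieved candidates in that cell of the Cartesian product.

```python
from collections import Counter, defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts residual capacity graph."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow: decrease forward capacities, increase reverse ones.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def fair_selection_feasible(candidates, quota_a, quota_b):
    """Check whether some subset of candidates meets both quota sets exactly.

    candidates: list of (a_value, b_value) pairs, one per retrieved point.
    quota_a, quota_b: dict mapping attribute value -> required count
    (assumed exact quotas; an illustrative simplification).
    """
    k = sum(quota_a.values())
    if k != sum(quota_b.values()):
        return False  # the two quota sets must describe the same k
    cap = defaultdict(lambda: defaultdict(int))

    def add_edge(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0  # make sure the reverse residual edge exists

    for a, q in quota_a.items():
        add_edge("source", ("A", a), q)
    for b, q in quota_b.items():
        add_edge(("B", b), "sink", q)
    # One edge per occupied cell of the attribute Cartesian product.
    for (a, b), n in Counter(candidates).items():
        if a in quota_a and b in quota_b:
            add_edge(("A", a), ("B", b), n)
    return max_flow(cap, "source", "sink") == k


candidates = [("F", "young"), ("F", "old"), ("M", "young")]
print(fair_selection_feasible(candidates, {"F": 1, "M": 1}, {"young": 1, "old": 1}))
```

Because all capacities are integers, a flow of value $k$ corresponds to an integral choice of how many candidates to take from each attribute combination. This sketch only decides feasibility. Also preferring the nearest candidates would need a cost on each selected point, for example through a min-cost flow variant, which is again an assumption rather than something the abstract states.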