🤖 AI Summary
Existing e-commerce visual search systems suffer from limited generalization and scalability due to category-coupled detection-classification pipelines and reliance on noisy labels. This work proposes a category-decoupled visual search architecture that employs category-agnostic region proposals and a unified embedding space for similarity retrieval. To eliminate dependence on manual annotations or noisy catalog data, we introduce a large language model-based zero-shot evaluation mechanism (LLM-as-a-Judge). The proposed approach significantly enhances system robustness and generalization. Upon large-scale deployment on a real-world home furnishings e-commerce platform, it yields substantial improvements in retrieval quality and user engagement, with offline evaluation metrics showing strong alignment with online performance.
📄 Abstract
Visual search is critical for e-commerce, especially in style-driven domains where user intent is subjective and open-ended. Existing industrial systems typically couple object detection with taxonomy-based classification and rely on catalog data for evaluation, which is prone to noise that limits robustness and scalability. We propose a taxonomy-decoupled architecture that uses classification-free region proposals and unified embeddings for similarity retrieval, enabling more flexible and generalizable visual search. To overcome the evaluation bottleneck, we propose an LLM-as-a-Judge framework that assesses nuanced visual similarity and category relevance for query-result pairs in a zero-shot manner, removing dependence on human annotations or noise-prone catalog data. Deployed at scale on a global home goods platform, our system improves retrieval quality and yields a measurable uplift in customer engagement, while our offline evaluation metrics strongly correlate with real-world outcomes.
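The LLM-as-a-Judge evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt template, the 0-2 scales for visual similarity and category relevance, the JSON answer format, and the `call_llm` stub are all assumptions made for the sketch (the real system judges image pairs, not text descriptions).

```python
import json

def build_judge_prompt(query_desc: str, result_desc: str) -> str:
    # Hypothetical zero-shot judging prompt. Text descriptions stand in
    # for the query image and the retrieved product image.
    return (
        "You are judging an e-commerce visual search result.\n"
        f"Query item: {query_desc}\n"
        f"Retrieved item: {result_desc}\n"
        "Rate visual similarity (0-2) and category relevance (0-2).\n"
        'Answer only as JSON: {"similarity": s, "category": c}'
    )

def parse_verdict(raw: str) -> float:
    # Collapse the two judged dimensions into a single 0-1 quality score.
    verdict = json.loads(raw)
    return (verdict["similarity"] + verdict["category"]) / 4.0

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call; returns a canned verdict
    # so the sketch runs offline.
    return '{"similarity": 2, "category": 1}'

prompt = build_judge_prompt("mid-century walnut armchair", "walnut side table")
score = parse_verdict(call_llm(prompt))
print(score)  # 0.75: visually close, but a different furniture category
```

Aggregating such per-pair scores over a sampled query log yields an offline retrieval-quality metric without human annotation, which is the role the judge plays in the system above.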