🤖 AI Summary
E-commerce search relevance modeling faces two key challenges: a semantic gap between queries and items, and a scarcity of domain-specific hard negative samples. To address these, we propose a framework of three cooperating modules: (1) a chain-of-thought large language model that automatically generates training data aligned with user intent and consistent with user behavior; (2) error-type-aware adversarial sample synthesis to improve model robustness; and (3) knowledge distillation that incorporates hierarchies of key item attributes to yield a lightweight, efficient relevance model. Combining Kahneman-Tversky Optimization with neural ranking techniques, the approach forms a cognitively aligned, resource-efficient, self-sustaining end-to-end learning system. Offline evaluations and online A/B tests show improved search relevance and reduced reliance on manual annotation, with the deployed model reported to achieve high accuracy, low latency, and robust ranking.
📝 Abstract
Relevance modeling in e-commerce search remains challenged by the semantic gaps of term-matching methods (e.g., BM25) and by the scarcity of the domain-specific hard samples that neural models rely on. We propose ADORE, a self-sustaining framework that combines three innovations: (1) a Rule-aware Relevance Discrimination module, in which a Chain-of-Thought LLM generates intent-aligned training data and is refined via Kahneman-Tversky Optimization (KTO) to align with user behavior; (2) an Error-type-aware Data Synthesis module that automatically generates adversarial examples to improve robustness; and (3) a Key-attribute-enhanced Knowledge Distillation module that injects domain-specific attribute hierarchies into a deployable student model. ADORE automates annotation, adversarial generation, and distillation, overcoming data scarcity while enhancing reasoning. Large-scale experiments and online A/B testing verify the effectiveness of ADORE. The framework establishes a new paradigm for resource-efficient, cognitively aligned relevance modeling in industrial applications.
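To make the distillation step in (3) more concrete, the following is a minimal sketch of a generic soft-label distillation loss, in which a student relevance classifier is trained to match a teacher's temperature-softened output distribution alongside the ground-truth label. This is the standard textbook formulation, not ADORE's actual objective: the abstract does not specify the loss, and the function names, the `temperature` and `alpha` parameters, and the two-class logits are all assumptions chosen for illustration. The attribute-hierarchy injection that the abstract describes is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (hypothetical helper, not from the paper).

    Combines KL(teacher || student) on temperature-softened distributions
    with cross-entropy against the hard label. `alpha` weights the soft term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft term: KL divergence from the teacher to the student distribution.
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )

    # Hard term: cross-entropy of the student (at temperature 1) vs. the label.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft term's gradient scale comparable across T.
    return alpha * (temperature ** 2) * kl + (1.0 - alpha) * ce


# Example: class 1 = "relevant", class 0 = "irrelevant" (assumed labeling).
loss = distillation_loss(
    student_logits=[0.2, 0.9],
    teacher_logits=[-1.0, 2.5],
    label=1,
)
```

When the student's logits equal the teacher's, the soft term vanishes and only the label cross-entropy remains, which is why `alpha` is usually tuned rather than set to 1.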