🤖 AI Summary
This work addresses the problem of mapping user queries to leaf categories in e-commerce search by applying a Chain-of-Thought (CoT) paradigm to hierarchical category classification. The method combines a simple tree search over the taxonomy with semantic scoring from large language models (LLMs). Beyond classification accuracy, correct categorization narrows the candidate product scope and supports multi-intent understanding, and the approach can reveal structural flaws in the category taxonomy. Evaluated on human-judged query-category pairs, relevance tests, and LLM-based reference methods, the CoT approach outperforms a benchmark built on embedding-based category predictions. The authors also propose related LLM-based approaches that scale better to millions of queries.
📝 Abstract
At its core, search in e-Commerce is powered by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical taxonomies is selecting a set of relevant leaf categories that are semantically aligned with a given user query. In this context, we address the fundamental problem of search query categorization in real-world e-Commerce taxonomies. Correctly categorizing a query not only provides a way to zoom into the correct inventory space, but also opens the door to multi-intent understanding for that query. A practical and accurate solution to this problem has many applications in e-Commerce, including constraining retrieved items and improving the relevance of search results. For this task, we explore a novel Chain-of-Thought (CoT) paradigm that combines simple tree search with LLM semantic scoring. Assessing its classification performance on human-judged query-category pairs, relevance tests, and LLM-based reference methods, we find that the CoT approach performs better than a benchmark that uses embedding-based query category predictions. We also show how the CoT approach can detect problems within a hierarchical taxonomy. Finally, we propose LLM-based approaches to query categorization in the same spirit that scale better to millions of queries.
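The abstract does not spell out how tree search and LLM scoring are combined, so the following is only a minimal sketch of one plausible reading: descend the taxonomy level by level, score each child path against the query, and keep the best few branches until leaves are reached. Everything here is an assumption made for illustration, not the authors' implementation: the `Category` class, the `categorize` function, the `beam_width` parameter, the toy taxonomy, and in particular `overlap_score`, a token-overlap stand-in that occupies the place where an LLM semantic score would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Category:
    """A node in a hypothetical category taxonomy."""
    name: str
    children: List["Category"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


# A scorer maps (query, category path) to a relevance score.
Scorer = Callable[[str, str], float]


def overlap_score(query: str, path: str) -> float:
    """Stand-in for an LLM semantic score (assumption, not an LLM call):
    the fraction of query tokens that appear in the category path."""
    q = set(query.lower().split())
    p = set(path.lower().replace(">", " ").split())
    return len(q & p) / len(q) if q else 0.0


def categorize(query: str, root: Category, score: Scorer,
               beam_width: int = 2) -> List[Tuple[str, float]]:
    """Level-by-level beam search: expand the frontier, score every child
    path, keep the top `beam_width`, and collect any leaves that survive."""
    frontier = [(root, root.name)]
    leaves: List[Tuple[str, float]] = []
    while frontier:
        candidates = []
        for node, path in frontier:
            for child in node.children:
                child_path = f"{path} > {child.name}"
                candidates.append((score(query, child_path), child, child_path))
        # Sort by score only, so Category objects are never compared.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = []
        for s, child, child_path in candidates[:beam_width]:
            if child.is_leaf:
                leaves.append((child_path, s))
            else:
                frontier.append((child, child_path))
    return sorted(leaves, key=lambda item: item[1], reverse=True)


# Toy taxonomy, invented for this example.
taxonomy = Category("All", [
    Category("Electronics", [Category("Phone Cases"), Category("Laptops")]),
    Category("Clothing", [Category("Running Shoes"), Category("Dress Shoes")]),
])

result = categorize("running shoes", taxonomy, overlap_score)
for leaf_path, leaf_score in result:
    print(leaf_path, leaf_score)
```

Keeping more than one branch per level is what allows several leaf categories to be returned for a single query, which matches the abstract's goal of selecting a set of relevant leaves. In a real system the `score` argument would be replaced by a call to an LLM, and the intermediate paths it scores would play the role of the reasoning chain.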