A Chain-of-Thought Approach to Semantic Query Categorization in e-Commerce Taxonomies

📅 2026-01-01
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the problem of mapping user queries to relevant leaf categories in e-Commerce search by applying a novel Chain-of-Thought (CoT) paradigm to query categorization over hierarchical taxonomies. The method combines simple tree search with semantic scoring from large language models (LLMs). Correct categorization narrows the candidate inventory and supports intent understanding, and the approach can also detect problems within the taxonomy itself. Evaluated on human-judged query-category pairs, relevance tests, and LLM-based reference methods, the CoT approach outperforms a benchmark that uses embedding-based query category predictions. The authors also propose related LLM-based approaches that scale better to millions of queries.

📝 Abstract
Search in e-Commerce is powered at its core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical taxonomies is selecting a set of relevant leaf categories that are semantically aligned with a given user query. In this context, we address the fundamental problem of search query categorization in real-world e-Commerce taxonomies. A correct categorization of a query not only provides a way to zoom into the correct inventory space, but also opens the door to multiple intent understanding capabilities for that query. A practical and accurate solution to this problem has many applications in e-Commerce, including constraining retrieved items and improving the relevance of search results. For this task, we explore a novel Chain-of-Thought (CoT) paradigm that combines simple tree search with LLM semantic scoring. Assessing its classification performance on human-judged query-category pairs, relevance tests, and LLM-based reference methods, we find that the CoT approach performs better than a benchmark that uses embedding-based query category predictions. We show how the CoT approach can detect problems within a hierarchical taxonomy. Finally, we propose LLM-based query categorization approaches in the same spirit that scale better to millions of queries.
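To make the "tree search plus semantic scoring" idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the search procedure, prompts, or thresholds. The `Category` class, the `categorize` function, the `beam_width` and `min_score` parameters, and the toy taxonomy are all hypothetical, and `overlap_score` is a simple token-overlap stand-in for the LLM scoring call so that the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Category:
    """One node of a hypothetical category taxonomy."""
    name: str
    children: List["Category"] = field(default_factory=list)


def overlap_score(query: str, path: List[str]) -> float:
    """Stand-in for an LLM relevance score in [0, 1].

    Assumption: a real system would ask an LLM how well the query
    matches the category path. Here we only measure token overlap.
    """
    query_tokens = set(query.lower().split())
    path_tokens = set(" ".join(path).lower().split())
    return len(query_tokens & path_tokens) / max(len(query_tokens), 1)


def categorize(
    query: str,
    root: Category,
    score: Callable[[str, List[str]], float] = overlap_score,
    beam_width: int = 2,
    min_score: float = 0.25,
) -> List[Tuple[str, float]]:
    """Descend the taxonomy level by level, keeping the best-scoring branches.

    Returns (leaf path, score) pairs sorted by descending score.
    """
    frontier: List[Tuple[Category, List[str]]] = [(root, [])]
    leaves: List[Tuple[str, float]] = []
    while frontier:
        candidates = []
        for node, path in frontier:
            for child in node.children:
                child_path = path + [child.name]
                s = score(query, child_path)
                if s >= min_score:  # prune branches judged irrelevant
                    candidates.append((s, child, child_path))
        # Keep only the top `beam_width` candidates at this level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = []
        for s, child, child_path in candidates[:beam_width]:
            if child.children:
                frontier.append((child, child_path))  # expand further
            else:
                leaves.append((" > ".join(child_path), s))  # reached a leaf
    return sorted(leaves, key=lambda leaf: leaf[1], reverse=True)


# Toy taxonomy, invented for illustration only.
toy_taxonomy = Category("All", [
    Category("Shoes", [Category("Running Shoes"), Category("Dress Shoes")]),
    Category("Phone Accessories", [Category("Phone Cases")]),
])

results = categorize("running shoes", toy_taxonomy)
```

With the toy taxonomy, the "Phone Accessories" branch is pruned at the first level, so only leaves under "Shoes" are ever scored. This pruning is one way a level-by-level search can narrow the inventory space. Swapping `overlap_score` for an actual LLM call would leave `categorize` unchanged.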
Problem

Research questions and friction points this paper is trying to address.

semantic query categorization
e-Commerce taxonomy
hierarchical classification
search intent understanding
leaf category prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
Semantic Query Categorization
e-Commerce Taxonomy
Large Language Model
Hierarchical Classification