BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of fine-grained product attribute schemas on e-commerce platforms in emerging markets, a gap that prevents faceted filtering and weakens query understanding and semantic retrieval. The authors propose BEATS, a human-in-the-loop framework that extends a multi-stage large language model generation pipeline to build a product attribute taxonomy from scratch. Two validation stages follow generation: proactive quality checks by model developers, and annotation by domain-expert local staff. Feedback from both stages is used to refine the prompts at each generation stage over successive rounds. Once the taxonomy is in place, LLMs tag individual products with structured attributes. Deployed at Rakuten Taiwan, the system covers 9 major categories and 2,694 sub-categories with 67,277 generated attributes and has tagged over 5.4 million products. Dense retrieval models trained on the attribute-enriched data show consistent improvements over baselines that use the original catalog information.

📝 Abstract

E-commerce platforms in emerging markets often operate with underdeveloped product catalogs that contain only category taxonomies but lack structured attribute schemas. This absence of fine-grained product attributes limits search capabilities -- preventing faceted filtering, degrading query understanding, and weakening semantic representations used by search systems. We present BEATS, a human-in-the-loop LLM framework for bootstrapping product attribute taxonomies entirely from scratch. Our approach extends a multi-stage LLM generation pipeline with two critical production stages: (1) proactive quality checking by model developers to filter erroneous outputs, and (2) human annotation by domain-expert local staff to validate generated attributes. The framework operates iteratively -- prompts at each generation stage are refined based on quality check observations and annotator feedback across successive rounds, progressively improving attribute quality. Once the attribute taxonomy is established, we employ LLMs to perform structured attribute tagging on individual product items, enriching their contextual representations. The enriched catalog directly benefits multiple components of the search system: enabling granular attribute-based filtering, providing structured features for ranking models, and improving semantic representations for dense retrieval. We validate the generated taxonomy by training dense retrieval models on attribute-enriched product data, demonstrating consistent improvements over baselines using original catalog information. Our system has been deployed at Rakuten Taiwan, enriching 9 major categories spanning 2,694 sub-categories with 67,277 generated attributes, and over 5.4 million products have been tagged with the generated attributes, with plans to enrich the entire product catalog.
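
The iterative loop described in the abstract (generate, quality-check, annotate, refine the prompt, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`bootstrap`, `quality_check`, `refine_prompt`, `enrich_text`, the `Avoid:` prompt line, and the fake generator and annotator) is a hypothetical stand-in, and the quality rule used (drop blank or duplicate names) is an assumption chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Round:
    """Outcome of one generate -> check -> annotate round (hypothetical record)."""
    prompt: str
    accepted: list[str]
    rejected: list[str]


def quality_check(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Assumed developer-side filter: drop blank and case-insensitive duplicate names."""
    seen: set[str] = set()
    kept: list[str] = []
    dropped: list[str] = []
    for name in candidates:
        key = name.strip().lower()
        if not key or key in seen:
            dropped.append(name)
        else:
            seen.add(key)
            kept.append(name.strip())
    return kept, dropped


def refine_prompt(prompt: str, rejected: list[str]) -> str:
    """Append the rejected names to the prompt so the next round can avoid them."""
    if not rejected:
        return prompt
    return prompt + "\nAvoid: " + ", ".join(sorted(set(rejected)))


def bootstrap(
    category: str,
    generate: Callable[[str, str], list[str]],   # stand-in for an LLM call
    annotate: Callable[[str, str], bool],        # stand-in for a human annotator
    base_prompt: str,
    max_rounds: int = 3,
) -> list[Round]:
    """Run rounds until nothing is rejected or max_rounds is reached."""
    prompt = base_prompt
    history: list[Round] = []
    for _ in range(max_rounds):
        kept, dropped = quality_check(generate(prompt, category))
        accepted = [a for a in kept if annotate(category, a)]
        rejected = [a for a in kept if a not in accepted]
        rejected += [d for d in dropped if d.strip()]
        history.append(Round(prompt, accepted, rejected))
        if not rejected:
            break
        prompt = refine_prompt(prompt, rejected)
    return history


def enrich_text(title: str, tags: dict[str, str]) -> str:
    """Append tagged attributes to a product title, e.g. as input to a retriever."""
    if not tags:
        return title
    return title + " | " + " ; ".join(f"{k}: {v}" for k, v in sorted(tags.items()))


def fake_generate(prompt: str, category: str) -> list[str]:
    """Toy generator: returns a fixed pool minus names listed on 'Avoid:' lines."""
    avoid: set[str] = set()
    for line in prompt.splitlines():
        if line.startswith("Avoid: "):
            avoid.update(x.strip() for x in line[len("Avoid: "):].split(","))
    pool = ["Material", "material", "Sleeve Length", "Shipping Fee"]
    return [a for a in pool if a not in avoid]


def fake_annotate(category: str, attribute: str) -> bool:
    """Toy annotator: rejects one name that is not a product attribute."""
    return attribute != "Shipping Fee"


if __name__ == "__main__":
    rounds = bootstrap("Shirts", fake_generate, fake_annotate,
                       "List attributes for the category.")
    print(rounds[-1].accepted)
    print(enrich_text("Linen shirt", {"Material": "Linen"}))
```

In this toy setup the first round rejects one duplicate and one annotator-flagged name, the refined prompt excludes both, and the second round ends the loop with nothing rejected. A real pipeline would replace the two fake callables with an LLM client and an annotation workflow, and would likely carry richer feedback than a list of names.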

Problem

Research questions and friction points this paper is trying to address.

e-commerce

attribute taxonomy

structured attributes

emerging markets

Innovation

Methods, ideas, or system contributions that make the work stand out.

human-in-the-loop

LLM bootstrapping

attribute taxonomy

iterative refinement