🤖 AI Summary
Identifying appropriate parent classes for novel concepts in small-scale existing taxonomies (<100 nodes) remains challenging due to sparse structural signals and lack of labeled training data.
Method: This paper proposes a label-free, large language model (LLM)-driven taxonomy expansion method. Its core innovation is *code-style prompting*: explicitly encoding hierarchical semantic structure via indentation, nesting, and functional abstraction—programming conventions that enable zero- or few-shot LLM comprehension and relational reasoning over taxonomy hierarchies. The approach integrates taxonomy-aware input representations with structured code prompts, eliminating reliance on large-scale annotated or self-supervised data construction.
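The idea of code-style prompting can be sketched as follows: the existing taxonomy is serialized into nested, indented class stubs so the LLM can read the hierarchy the way it reads program structure. This is a minimal illustrative sketch, not the paper's actual prompt template; the `taxonomy_to_code` helper, the class-stub encoding, and the example taxonomy are all assumptions made for illustration.

```python
def taxonomy_to_code(tree, root, indent=0):
    """Serialize a taxonomy (parent -> children dict) as indented class stubs.

    Illustrative only: the exact code-style encoding used by CodeTaxo
    may differ from this nested-class convention.
    """
    pad = "    " * indent
    lines = [f"{pad}class {root.replace(' ', '_')}:"]
    children = tree.get(root, [])
    if not children:
        # Leaf concepts become empty class bodies.
        lines.append(f"{pad}    pass")
    for child in children:
        lines.extend(taxonomy_to_code(tree, child, indent + 1))
    return lines

# A toy taxonomy: parent concept -> list of child concepts.
taxonomy = {
    "science": ["physics", "biology"],
    "physics": ["optics"],
}

prompt_body = "\n".join(taxonomy_to_code(taxonomy, "science"))

# A hypothetical zero-shot query asking the LLM to place a new concept.
prompt = (
    "Given the taxonomy below, name the best parent class for the new "
    "concept 'genetics'.\n\n" + prompt_body
)
print(prompt_body)
```

Rendering the hierarchy as code rather than flat text lets indentation carry the parent-child relation explicitly, which is the structural signal the method relies on in place of labeled training data.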
Results: Evaluated on five cross-domain real-world benchmarks, the method achieves an average 12.6% absolute improvement in parent-class prediction accuracy over state-of-the-art methods. Gains are especially pronounced in extremely small-scale settings (30–80 nodes), demonstrating robustness where conventional supervised or embedding-based approaches falter.
📝 Abstract
Taxonomies play a crucial role in various applications by providing a structural representation of knowledge. The task of taxonomy expansion involves integrating emerging concepts into existing taxonomies by identifying appropriate parent concepts for these new query concepts. Previous approaches typically relied on self-supervised methods that generate annotation data from existing taxonomies. However, these methods are less effective when the existing taxonomy is small (fewer than 100 entities). In this work, we introduce CodeTaxo, a novel approach that leverages large language models through code language prompts to capture the taxonomic structure. Extensive experiments on five real-world benchmarks from different domains demonstrate that CodeTaxo consistently achieves superior performance across all evaluation metrics, significantly outperforming previous state-of-the-art methods. The code and data are available at https://github.com/QingkaiZeng/CodeTaxo-Pub.