Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study

2026-05-28
Citations: 0
Influential: 0

AI Summary
This study addresses multi-label topic classification in the absence of labeled data by proposing a zero-shot framework that integrates keyword augmentation, self-consistency decoding, and document-level knowledge graphs constructed from subject-predicate-object triples. It systematically evaluates knowledge graph enhancement across fifteen large language models and eight cross-domain datasets. Among the base methods, keyword-enhanced classification (AK) performs best, and six of the fifteen evaluated models surpass the sentence-encoder baseline. Knowledge graph augmentation helps smaller models but hurts larger ones, which the authors attribute to larger models already encoding sufficient relational knowledge from pretraining. Self-consistency decoding does not improve performance in any experiment while increasing computational cost about fivefold.
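As a rough illustration of what a document-level knowledge graph built from subject-predicate-object triples might look like, the sketch below groups already-extracted triples into an adjacency map and renders them as plain-text context that could be appended to a classification prompt. The function names, the normalization rule, and the arrow-style serialization are all assumptions made for this example; the summary does not describe the paper's extraction pipeline or prompt format at this level of detail.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (subject, predicate, object) triples by subject.

    Hypothetical helper: lowercases and strips each element so that
    trivially duplicated triples collapse into one edge.
    """
    graph = defaultdict(list)
    seen = set()
    for subject, predicate, obj in triples:
        key = (subject.strip().lower(), predicate.strip().lower(), obj.strip().lower())
        if key in seen:
            continue  # skip duplicate edges
        seen.add(key)
        graph[key[0]].append((key[1], key[2]))
    return dict(graph)


def graph_to_prompt(graph):
    """Serialize the graph as one 'subject -[predicate]-> object' line per edge."""
    lines = []
    for subject in sorted(graph):
        for predicate, obj in graph[subject]:
            lines.append(f"{subject} -[{predicate}]-> {obj}")
    return "\n".join(lines)


# Made-up triples for illustration only; not taken from the paper's datasets.
triples = [
    ("Central Bank", "raised", "interest rates"),
    ("central bank", "raised", "interest rates"),
    ("interest rates", "affect", "mortgage costs"),
]
graph = build_graph(triples)
context = graph_to_prompt(graph)
print(context)
```

In a graph-augmented variant, text like `context` would be supplied to the model alongside the article, so the model sees the relations explicitly rather than having to infer them.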
Abstract
Multi-label topic classification without labeled training data is a challenging task, especially when documents contain complex relational information. We present a zero-shot multi-label topic classification framework and systematically investigate how per-article knowledge graph augmentation affects its performance. The base framework classifies topics in documents without labeled training data and has four variants: article-only classification, keyword-enhanced classification, and self-consistency decoding variants of both. We then augment each base variant with a per-article knowledge graph, which is extracted from the input document as subject-predicate-object triples through a pipeline similar to KGGen. We test all eight methods (four base and four graph-augmented) on fifteen LLMs and eight multi-label datasets across different domains. For the base framework, keyword-enhanced classification (AK) is the best-performing method, and six out of fifteen LLMs surpass the sentence-encoder baseline. Graph augmentation has a positive impact on small models and a negative impact on large models, which suggests that larger models already contain enough relational information from pretraining. Furthermore, the self-consistency decoding variant does not improve performance in any experiment while increasing computation costs about fivefold.
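Self-consistency decoding for a multi-label task can be sketched as sampling several label sets for the same document and keeping the labels that enough samples agree on. The example below is a minimal sketch under stated assumptions: the majority-vote threshold, the function name `self_consistent_labels`, and the use of five samples (chosen only because the abstract reports roughly fivefold cost) are illustrative choices, not the aggregation rule used in the paper.

```python
from collections import Counter


def self_consistent_labels(sampled_label_sets, min_share=0.5):
    """Keep labels predicted in more than `min_share` of the samples.

    `sampled_label_sets` holds one label collection per sampled model
    response. The threshold is an assumption for this sketch.
    """
    n = len(sampled_label_sets)
    if n == 0:
        return []
    # Count each label at most once per sample.
    votes = Counter(label for labels in sampled_label_sets for label in set(labels))
    return sorted(label for label, count in votes.items() if count / n > min_share)


# Five hypothetical sampled predictions for one document.
samples = [
    {"economy", "politics"},
    {"economy"},
    {"economy", "health"},
    {"politics", "economy"},
    {"economy", "politics"},
]
result = self_consistent_labels(samples)
print(result)
```

Each extra sample is another full model call, which is why the cost scales with the number of samples even when, as reported here, the vote does not improve accuracy.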
Problem

Research questions and friction points this paper is trying to address.

zero-shot
multi-label topic classification
knowledge graph
document classification
relational information
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot topic classification
knowledge graph augmentation
multi-label classification
large language models
self-consistency decoding
Similar Papers
2024-04-02
North American Chapter of the Association for Computational Linguistics
Citations: 2