AI Summary
This study addresses multi-label topic classification without labeled data by proposing a zero-shot framework that integrates keyword augmentation, self-consistency decoding, and document-level knowledge graphs built from subject-predicate-object triples. It presents the first systematic evaluation of knowledge graph enhancement across diverse large language models and cross-domain datasets. Experimental results show that the proposed adaptive keyword (AK) augmentation performs best, and six of the fifteen evaluated LLMs surpass the sentence-encoder baseline. Knowledge graphs improve results for smaller models but degrade them for larger models, likely because pretraining already gives larger models sufficient relational knowledge. Self-consistency decoding does not improve accuracy and increases computational cost roughly fivefold.
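How the per-article graph reaches the model is not specified here. A minimal sketch, assuming the triples are serialized as plain text and appended to the zero-shot prompt, might look like the following. The `Triple` type, the `build_prompt` helper, and the prompt wording are all hypothetical illustrations, not the authors' prompt, and the KGGen-like triple extraction step itself is out of scope.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one subject-predicate-object triple."""
    subject: str
    predicate: str
    obj: str


def build_prompt(article: str, topics: list[str], triples: list[Triple]) -> str:
    """Assemble a hypothetical zero-shot multi-label prompt.

    When `triples` is non-empty, the per-article knowledge graph is appended
    as one 'subject | predicate | object' line per triple (graph-augmented
    variant); otherwise only the article and topic list are included.
    """
    lines = [
        "Assign every applicable topic from the list below to the article.",
        "Topics: " + ", ".join(topics),
        "",
        "Article:",
        article,
    ]
    if triples:  # graph-augmented variant only
        lines += ["", "Knowledge graph (subject | predicate | object):"]
        lines += [f"{t.subject} | {t.predicate} | {t.obj}" for t in triples]
    lines += ["", "Answer with a comma-separated list of topics."]
    return "\n".join(lines)
```

For example, `build_prompt("Rates rose.", ["economy", "health"], [Triple("central bank", "raised", "rates")])` yields a prompt containing the line `central bank | raised | rates`, while passing an empty triple list produces the corresponding base (article-only) prompt. Keeping a single builder for both cases makes each base variant and its graph-augmented counterpart differ only in the added graph section.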
Abstract
Multi-label topic classification without labeled training data is a challenging task, especially when documents contain complex relational information. We present a zero-shot multi-label topic classification framework and systematically investigate how per-article knowledge graph augmentation affects its performance. The base framework classifies topics in documents without labeled training data and has four variants: article-only classification, keyword-enhanced classification, and self-consistency decoding variants of both. We then augment each base variant with a per-article knowledge graph, extracted from the input document as subject-predicate-object triples through a KGGen-like pipeline. We evaluate all eight methods (four base and four graph-augmented) on fifteen LLMs and eight multi-label datasets from different domains. Among the base variants, keyword-enhanced classification (AK) performs best, and six of the fifteen LLMs surpass the sentence-encoder baseline. Graph augmentation helps small models but hurts large ones, suggesting that larger models already acquire enough relational information during pretraining. Finally, the self-consistency decoding variants do not improve performance in any experiment while increasing computational cost roughly fivefold.
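The abstract does not state how sampled outputs are combined in the self-consistency variants. One common choice, assumed here rather than taken from the paper, is per-label voting: sample several label sets for the same article and keep each label predicted in more than a given fraction of the samples. The function name and the 0.5 default threshold are hypothetical. Because each sample is a full LLM call, drawing five samples would cost about five times a single decode, consistent with the reported overhead, though the actual sample count is not given.

```python
from collections import Counter


def self_consistency_vote(samples: list[set[str]], threshold: float = 0.5) -> set[str]:
    """Aggregate sampled multi-label predictions by per-label voting.

    `samples` holds one predicted label set per sampled decode. A label is
    kept if the fraction of samples containing it is strictly greater than
    `threshold`. Returns an empty set when there are no samples.
    """
    if not samples:
        return set()
    counts = Counter(label for labels in samples for label in labels)
    n = len(samples)
    return {label for label, c in counts.items() if c / n > threshold}
```

With five samples `[{"economy", "politics"}, {"economy"}, {"economy", "health"}, {"politics", "economy"}, {"economy"}]`, `economy` appears in all five, `politics` in two, and `health` in one, so only `{"economy"}` survives the default threshold.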