Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models

📅 2025-12-18
🤖 AI Summary
Problem: Existing open-set classification methods for graph data treat all out-of-distribution (OOD) samples as a single “rejection” class, which provides no semantic granularity and little interpretability. Method: We propose a two-stage framework for semantic-level OOD classification on graphs: (1) coarse-grained OOD detection and interpretable semantic label generation via LLM prompting; (2) a GNN fine classifier trained for in-distribution (ID) node classification and OOD detection on the samples flagged in stage one, followed by refinement of the OOD labels through further LLM prompts and post-processing. The method operates without ground-truth OOD labels. Results: It improves OOD detection by 10% over state-of-the-art methods on graph and text domains and reaches up to 70% accuracy on OOD classification on graph datasets. The core contribution is dropping the conventional single-OOD-class assumption in favor of a unified approach that addresses both ID recognition and OOD semantic understanding on graph-structured data.

📝 Abstract
Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, even though real-world applications, especially high-stakes settings such as fraud detection and medical diagnosis, demand deeper insight into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To address this question, we propose a Coarse-to-Fine open-set Classification (CFC) framework that leverages large language models (LLMs) for graph datasets. CFC consists of three key components: a coarse classifier that uses LLM prompts for OOD detection and outlier label generation; a GNN-based fine classifier trained with the OOD samples identified by the coarse classifier for enhanced OOD detection and ID classification; and refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods that rely on synthetic or auxiliary OOD samples, CFC employs semantic OOD instances that are genuinely out-of-distribution based on their inherent meaning, improving interpretability and practical utility. Experimental results show that CFC improves OOD detection by ten percent over state-of-the-art methods on graph and text domains and achieves up to seventy percent accuracy in OOD classification on graph datasets.
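The coarse classifier described in the abstract can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration and not the paper's implementation: the prompt wording, the `OOD: <label>` reply convention, the function names, and `toy_llm`, a keyword-based stand-in for a real LLM call. Treating unparseable replies as OOD with no label is also an arbitrary choice of this sketch.

```python
from typing import Callable, List, Optional, Tuple


def build_prompt(node_text: str, id_classes: List[str]) -> str:
    """Hypothetical prompt: ask for a known class name or 'OOD: <label>'."""
    options = ", ".join(id_classes)
    return (
        f"Known classes: {options}.\n"
        f"Text: {node_text}\n"
        "If the text fits a known class, answer with that class name only. "
        "Otherwise answer 'OOD: <short label>'."
    )


def parse_reply(reply: str, id_classes: List[str]) -> Tuple[bool, Optional[str]]:
    """Return (is_ood, label).

    Replies that match neither convention are treated as OOD with no label
    (an assumption of this sketch).
    """
    text = reply.strip()
    if text.lower().startswith("ood:"):
        label = text[4:].strip().lower()
        return True, (label or None)
    for cls in id_classes:
        if text.lower() == cls.lower():
            return False, cls
    return True, None


def coarse_classify(
    texts: List[str],
    id_classes: List[str],
    llm: Callable[[str], str],
) -> List[Tuple[bool, Optional[str]]]:
    """Run the (hypothetical) coarse stage over the text of each node."""
    return [parse_reply(llm(build_prompt(t, id_classes)), id_classes) for t in texts]


def toy_llm(prompt: str) -> str:
    """Keyword-based stand-in for an LLM; it only inspects the 'Text:' line."""
    text_line = prompt.split("\n")[1].lower()
    if "neural" in text_line:
        return "Machine Learning"
    if "sql" in text_line:
        return "Databases"
    return "OOD: Quantum Computing"


results = coarse_classify(
    ["Training deep neural networks", "Qubits and entanglement"],
    ["Machine Learning", "Databases"],
    toy_llm,
)
print(results)
```

In a real setting `toy_llm` would be replaced by a call to an actual model, and the parsing step would need to cope with far messier replies than this sketch handles.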
Problem

Research questions and friction points this paper is trying to address.

Classify in-distribution and detect out-of-distribution graph nodes
Extend OOD detection to classification without true label information
Leverage LLMs for interpretable OOD classification in graph datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-fine framework using LLMs for OOD detection
GNN fine classifier trained with identified OOD samples
LLM prompts generate semantic OOD labels for classification
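One way to picture how coarse-stage outputs could feed the fine classifier and the final OOD labels is sketched below in Python. This is an illustration under stated assumptions, not the paper's procedure: the input is assumed to be a list of `(is_ood, label)` pairs, the fine classifier is assumed to use one extra target index for OOD nodes (a common K+1 setup), and the post-processing rule (keep OOD labels seen at least `min_count` times, map the rest to a fallback) is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

CoarseOutput = Tuple[bool, Optional[str]]  # (is_ood, label), assumed format


def build_fine_targets(coarse: List[CoarseOutput], id_classes: List[str]) -> List[int]:
    """Map coarse outputs to integer targets for a K+1-way classifier.

    ID nodes get their class index (0..K-1); every OOD node gets index K.
    """
    index = {c: i for i, c in enumerate(id_classes)}
    ood_index = len(id_classes)
    return [ood_index if is_ood else index[label] for is_ood, label in coarse]


def consolidate_ood_labels(
    coarse: List[CoarseOutput],
    min_count: int = 2,
    fallback: str = "unknown",
) -> List[Optional[str]]:
    """Hypothetical post-processing of free-text OOD labels.

    OOD labels seen at least min_count times are kept; rarer or missing
    labels are mapped to the fallback. ID nodes get None.
    """
    counts = Counter(label for is_ood, label in coarse if is_ood and label)
    cleaned: List[Optional[str]] = []
    for is_ood, label in coarse:
        if not is_ood:
            cleaned.append(None)
        elif label and counts[label] >= min_count:
            cleaned.append(label)
        else:
            cleaned.append(fallback)
    return cleaned


# Toy coarse-stage outputs (made up for illustration).
coarse = [
    (False, "Machine Learning"),
    (True, "quantum computing"),
    (True, "quantum computing"),
    (True, "robotics"),
    (False, "Databases"),
    (True, None),
]
targets = build_fine_targets(coarse, ["Machine Learning", "Databases"])
ood_labels = consolidate_ood_labels(coarse)
print(targets)
print(ood_labels)
```

The integer targets would then serve as training labels for whatever GNN is used as the fine classifier, while the consolidated strings keep the semantic names of the OOD groups.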