🤖 AI Summary
To improve node classification on heterophilic graphs, where neighboring nodes often carry different labels, this paper proposes a novel two-stage framework that integrates large language models (LLMs) into heterophilic graph modeling. In the first stage, an LLM is fine-tuned to classify each edge as homophilic or heterophilic from the textual content of its endpoint nodes. In the second stage, these edge types, together with node features and graph structure, guide adaptive edge reweighting so that the graph neural network (GNN) propagates messages differently along each type of edge. To reduce inference cost, the LLM is distilled into smaller, more efficient models, retaining over 95% of the original performance. The framework thus couples the semantic understanding of LLMs with structural graph modeling, enabling edge-level, semantics-aware aggregation. On multiple standard heterophilic graph benchmarks, the method improves node classification accuracy, supporting the effectiveness, interpretability, and deployment practicality of LLM-augmented heterophilic graph learning.
📝 Abstract
Large language models (LLMs) have presented significant opportunities to enhance various machine learning applications, including graph neural networks (GNNs). By leveraging the vast open-world knowledge within LLMs, we can more effectively interpret and utilize textual data to better characterize heterophilic graphs, where neighboring nodes often have different labels. However, existing approaches for heterophilic graphs overlook the rich textual data associated with nodes, which could unlock deeper insights into their heterophilic contexts. In this work, we explore the potential of LLMs for modeling heterophilic graphs and propose a novel two-stage framework: LLM-enhanced edge discriminator and LLM-guided edge reweighting. In the first stage, we fine-tune the LLM to better identify homophilic and heterophilic edges based on the textual content of their nodes. In the second stage, we adaptively manage message propagation in GNNs for different edge types based on node features, structures, and heterophilic or homophilic characteristics. To cope with the computational demands when deploying LLMs in practical scenarios, we further explore model distillation techniques to fine-tune smaller, more efficient models that maintain competitive performance. Extensive experiments validate the effectiveness of our framework, demonstrating the feasibility of using LLMs to enhance node classification on heterophilic graphs.
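The abstract does not spell out the reweighting rule, so the following is only a minimal sketch of what edge-type-aware message propagation could look like, not the authors' implementation. It assumes the edge discriminator outputs a probability that each edge is homophilic, and it maps that probability to a signed weight in [-1, 1], which is one simple choice among many. The names `reweighted_aggregate` and `p_homophilic` are hypothetical, and the probabilities are hard-coded rather than produced by an LLM.

```python
from collections import defaultdict


def reweighted_aggregate(features, edges, p_homophilic):
    """One round of edge-reweighted neighbor aggregation (illustrative only).

    features:     dict mapping node -> list of floats
    edges:        list of undirected (u, v) pairs
    p_homophilic: dict mapping (u, v) -> probability in [0, 1] that the edge
                  is homophilic (stand-in for an edge discriminator's output)
    """
    # Assumption: signed weight w = 2p - 1. Likely-homophilic edges pull
    # neighbors together (w > 0), likely-heterophilic edges push them
    # apart (w < 0), and uncertain edges (p = 0.5) are ignored (w = 0).
    neighbors = defaultdict(list)
    for u, v in edges:
        w = 2.0 * p_homophilic[(u, v)] - 1.0
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    out = {}
    for node, h in features.items():
        agg = list(h)  # self-contribution with weight 1
        norm = 1.0 + sum(abs(w) for _, w in neighbors[node])
        for nbr, w in neighbors[node]:
            for i, x in enumerate(features[nbr]):
                agg[i] += w * x
        out[node] = [a / norm for a in agg]
    return out


# Toy graph: nodes 0 and 1 share a class, node 2 belongs to another.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1), (0, 2)]
p_homophilic = {(0, 1): 1.0, (0, 2): 0.0}  # hard-coded discriminator output

out = reweighted_aggregate(features, edges, p_homophilic)
print(out)
```

In this toy case the heterophilic edge (0, 2) contributes with a negative sign, so node 0's representation moves away from node 2's features instead of being averaged with them. A learned variant would typically replace the fixed `2p - 1` mapping with parameters that also depend on node features and structure, as the abstract indicates.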