🤖 AI Summary
To address the disconnection between textual and structural information, and the difficulty of multimodal fusion, in Heterogeneous Text-Rich Networks (HTRNs), this paper proposes a pure pre-trained language model (PLM)-driven unified representation framework that eliminates reliance on graph neural networks. Methodologically, it introduces (1) a hierarchical prompting module that jointly encodes node- and edge-level heterogeneous structures within a unified textual space, and (2) two HTRN-specific pre-training objectives—structure-aware masked language modeling and cross-type edge prediction—to explicitly capture text–structure interactions and type heterogeneity. Evaluated on two real-world HTRN benchmarks, the framework significantly outperforms existing state-of-the-art methods, with improvements of up to 6.08% in node classification accuracy and up to 10.84% in link prediction F1-score.
📝 Abstract
Representation learning on heterogeneous text-rich networks (HTRNs), which consist of multiple types of nodes and edges with each node associated with textual information, is essential for various real-world applications. Given the success of pretrained language models (PLMs) in processing text data, recent efforts have focused on integrating PLMs into HTRN representation learning. These methods typically handle textual and structural information separately, using both PLMs and heterogeneous graph neural networks (HGNNs). However, this separation fails to capture the critical interactions between these two types of information within HTRNs. Additionally, it necessitates an extra alignment step, which is challenging due to the fundamental differences between the distinct embedding spaces generated by PLMs and HGNNs. To address these issues, we propose HierPromptLM, a novel pure PLM-based framework that seamlessly models both text data and graph structures without the need for separate processing. First, we develop a Hierarchical Prompt module that employs prompt learning to integrate text data and heterogeneous graph structures at both the node and edge levels, within a unified textual space. Building upon this foundation, we further introduce two innovative HTRN-tailored pretraining tasks to fine-tune PLMs for representation learning by emphasizing the inherent heterogeneity and interactions between textual and structural information within HTRNs. Extensive experiments on two real-world HTRN datasets demonstrate that HierPromptLM outperforms state-of-the-art methods, achieving significant improvements of up to 6.08% for node classification and 10.84% for link prediction.
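To make the idea of "integrating heterogeneous graph structure into a unified textual space" more concrete, the following is a minimal, hypothetical sketch of how node-level and edge-level prompts might serialize typed nodes, their typed neighbors, and a cross-type edge (with a masked relation slot for edge prediction) into plain text that a PLM can consume. The template wording, special tokens like `[NEIGHBORS]`, `[REL]`, and `[MASK]`, and the data fields are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: hierarchical prompting for an HTRN.
# Node-level prompts fuse a node's own text with typed-neighbor context;
# edge-level prompts join two node prompts around a relation slot,
# which a cross-type edge-prediction objective could mask and recover.
# All template tokens and field names below are illustrative assumptions.

def node_prompt(node):
    """Node-level prompt: node type, node text, and typed neighbors."""
    neighbors = "; ".join(
        f"({n['type']}) {n['text']}" for n in node["neighbors"]
    )
    return f"[{node['type']}] {node['text']} [NEIGHBORS] {neighbors}"

def edge_prompt(src, dst, relation):
    """Edge-level prompt: two node prompts joined by a relation slot.

    Replacing `relation` with a mask token would yield a training
    instance for a cross-type edge-prediction objective.
    """
    return f"{node_prompt(src)} [REL] {relation} [MASK] {node_prompt(dst)}"

# Toy heterogeneous nodes (academic-graph style, purely illustrative).
paper = {
    "type": "paper",
    "text": "Graph prompt learning",
    "neighbors": [{"type": "author", "text": "A. Smith"}],
}
venue = {"type": "venue", "text": "KDD", "neighbors": []}

prompt = edge_prompt(paper, venue, "published_in")
```

The point of the sketch is only that both node- and edge-level structure end up as a single token sequence, so one PLM can model text and structure jointly with no separate HGNN embedding space to align.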