Text-attributed Graph Condensation via Text Selection and Attribute Matching

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational and memory costs of jointly training graph neural networks (GNNs) and language models on Text-Attributed Graphs (TAGs), particularly at scale. To mitigate this, the authors propose TAGSAM, a condensation approach with two parts. The first selects subgraph text by maximizing mutual information. The second compresses topology by aligning stable similarity matrices. Together, these designs substantially reduce graph size while preserving training accuracy. Experiments show that, at identical compression ratios, TAGSAM improves accuracy over the strongest of six baselines by 4.9% on average. It also remains competitive even when the original graph is compressed to just 1% of its size.
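The summary does not say which mutual information estimator TAGSAM uses. The following is only a minimal sketch of the general idea under an assumed Gaussian surrogate, not the authors' implementation. Each text chunk is represented by an embedding. Chunks are then picked greedily to increase I(y_S; f) = 0.5 * log det(I + K_SS / sigma^2). This quantity is the mutual information between observations at the selected set S and a latent Gaussian process with a cosine-similarity kernel K. The function names (`gaussian_mi`, `select_chunks`), the noise level, and the toy chunks and embeddings are all hypothetical. In practice, the embeddings would come from a language model.

```python
import numpy as np


def gaussian_mi(K_sub: np.ndarray, noise: float) -> float:
    """Assumed surrogate: I(y_S; f) = 0.5 * log det(I + K_SS / noise^2) under a GP prior."""
    m = K_sub.shape[0]
    if m == 0:
        return 0.0
    _, logdet = np.linalg.slogdet(np.eye(m) + K_sub / noise**2)
    return 0.5 * logdet


def select_chunks(emb: np.ndarray, budget: int, noise: float = 0.5) -> list:
    """Greedily pick up to `budget` chunk indices that approximately maximize the surrogate MI."""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    K = X @ X.T  # cosine-similarity kernel (positive semidefinite)
    selected = []
    for _ in range(min(budget, len(X))):
        best_i, best_val = -1, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            idx = selected + [i]
            val = gaussian_mi(K[np.ix_(idx, idx)], noise)
            if val > best_val:
                best_i, best_val = i, val
        selected.append(best_i)
    return sorted(selected)  # restore original order so the merged text stays readable


if __name__ == "__main__":
    # hypothetical chunks from related node descriptions, with toy 2-d embeddings
    chunks = [
        "GNNs aggregate neighbor features.",
        "Message passing updates node states.",
        "A language model encodes node text.",
        "Text embeddings come from a pretrained LM.",
    ]
    emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
    picked = select_chunks(emb, budget=2)
    merged = " ".join(chunks[i] for i in picked)  # merge the selected chunks into one text
    print(picked, merged)
```

Adding a chunk that is nearly parallel to one already chosen barely increases the log-determinant. The greedy loop therefore favors representative, non-redundant chunks. This GP information gain is monotone submodular, so greedy selection is a standard heuristic for it with a (1 - 1/e) approximation guarantee, rather than an exact maximizer.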

📝 Abstract

A Text-Attributed Graph (TAG) is an important type of graph-structured data in which each node has a text description. TAG models usually train a Graph Neural Network (GNN) and a language model jointly, which incurs high space and time costs, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM has two key designs, subgraph text Selection and Attribute similarity Matching, which compress the text descriptions and the graph topology of a TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices instead. We evaluate TAGSAM against six state-of-the-art baselines, over which it achieves superior performance. At the same compressed size, TAGSAM improves accuracy over the best-performing baseline by an average of 4.9%. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% of its original size. Our code is available at https://github.com/SundayVHan/TAGSAM
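The abstract states only that attribute similarity matching aligns stable similarity matrices rather than training trajectories. The sketch below is one assumed instantiation, not TAGSAM's published loss. First, node attributes are propagated once over a GCN-normalized adjacency. Next, they are averaged into per-class prototypes. Finally, the class-by-class cosine similarity matrices of the original and condensed graphs are compared with a squared Frobenius distance. Because the loss depends on the condensed adjacency `A_syn`, minimizing it would shape the condensed topology. All function names, and the toy graphs in the usage section, are hypothetical.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def class_similarity(A, X, y, num_classes):
    """C x C cosine similarity between class prototypes of propagated attributes."""
    H = normalized_adjacency(A) @ X  # one propagation step over the topology
    P = np.stack([H[y == c].mean(axis=0) for c in range(num_classes)])
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    return P @ P.T


def attribute_similarity_loss(A, X, y, A_syn, X_syn, y_syn, num_classes):
    """Squared Frobenius distance between original and condensed similarity matrices."""
    S = class_similarity(A, X, y, num_classes)
    S_syn = class_similarity(A_syn, X_syn, y_syn, num_classes)
    return float(np.sum((S - S_syn) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy original graph: 6 nodes, 2 classes (two triangles joined by one edge)
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1.0
    X = rng.normal(size=(6, 4))
    y = np.array([0, 0, 0, 1, 1, 1])
    # toy condensed graph: one node per class, with one weighted edge
    A_syn = np.array([[0.0, 0.5], [0.5, 0.0]])
    X_syn = rng.normal(size=(2, 4))
    y_syn = np.array([0, 1])
    print(attribute_similarity_loss(A, X, y, A_syn, X_syn, y_syn, num_classes=2))
```

In a full pipeline, `A_syn` and `X_syn` would be optimized with an autodiff framework to drive this loss toward zero. In this sketch, the target matrix `S` is computed once from fixed inputs (`A`, `X`, `y`), so unlike a sampled training trajectory, it carries no run-to-run training noise.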

Problem

Research questions and friction points this paper is trying to address.

Text-Attributed Graph

Graph Condensation

Training Efficiency

Graph Neural Network

Language Model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-Attributed Graph

Graph Condensation

Subgraph Text Selection