Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing graph self-supervised learning methods overly rely on structural perturbations, which often degrade graph integrity; meanwhile, VQ-VAEs applied to graph data suffer from two key limitations: uneven codebook utilization and spatial sparsity. To address these issues, we propose the Hierarchical Vector Quantized Graph Autoencoder (HVQ-GAE). Our method introduces an annealing-based codebook selection mechanism to mitigate codebook utilization bias and designs a dual-level hierarchical codebook—where the bottom level captures node semantic similarity and the top level encodes topological structural dependencies—enabling joint optimization of graph representations. This work is the first to integrate annealing-driven vector quantization and hierarchical codebooks into graph self-supervised learning. Extensive experiments demonstrate that HVQ-GAE significantly outperforms 16 state-of-the-art baselines across multiple benchmark datasets, achieving new state-of-the-art performance on both link prediction and node classification tasks.
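The annealing-based selection described above can be read as temperature-annealed stochastic code assignment: early in training, a high temperature spreads assignment probability across many codes, and as the temperature decays, selection converges to the nearest code. A minimal NumPy sketch of this idea follows; the function name, the exponential schedule, and all hyperparameters are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def select_codes(z, codebook, temperature, rng):
    """Stochastically assign each embedding in z to a codebook entry.

    High temperature -> near-uniform sampling (broad codebook usage);
    low temperature -> approaches hard nearest-code assignment.
    """
    # Squared Euclidean distances, shape (num_nodes, num_codes)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    logits = -d / temperature
    # Numerically stable softmax over codes
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(codebook), p=p) for p in probs])

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 codes of dimension 4 (toy sizes)
z = rng.normal(size=(32, 4))         # 32 node embeddings

# Hypothetical exponential annealing schedule
t0, decay = 5.0, 0.95
for step in range(100):
    temperature = max(t0 * decay ** step, 1e-3)
    codes = select_codes(z, codebook, temperature, rng)

# At low temperature this matches hard nearest-code assignment
hard = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1).argmin(1)
```

The softmax-over-negative-distances form mirrors standard VQ-VAE quantization in the zero-temperature limit, while the early high-temperature phase is what mitigates codebook underutilization.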

📝 Abstract
Graph self-supervised learning has gained significant attention recently. However, many existing approaches heavily depend on perturbations, and inappropriate perturbations may corrupt the graph's inherent information. The Vector Quantized Variational Autoencoder (VQ-VAE) is a powerful autoencoder extensively used in fields such as computer vision; however, its application to graph data remains underexplored. In this paper, we provide an empirical analysis of vector quantization in the context of graph autoencoders, demonstrating that it significantly enhances the model's capacity to capture graph topology. Furthermore, we identify two key challenges associated with vector quantization when applied to graph data: codebook underutilization and codebook space sparsity. For the first challenge, we propose an annealing-based encoding strategy that promotes broad code utilization in the early stages of training, gradually shifting focus toward the most effective codes as training progresses. For the second challenge, we introduce a hierarchical two-layer codebook that captures relationships between embeddings through clustering. The second-layer codebook links similar codes, encouraging the model to learn closer embeddings for nodes with similar features and structural topology in the graph. Our proposed model outperforms 16 representative baseline methods on self-supervised link prediction and node classification tasks across multiple datasets.
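The hierarchical two-layer codebook described in the abstract links similar bottom-level codes through clustering. A minimal sketch of one plausible construction follows: k-means over the bottom-level codes yields top-level centroids, and each bottom code is linked to its top-level parent. The helper name and the plain k-means procedure are assumptions for illustration; the paper's exact clustering method may differ.

```python
import numpy as np

def build_top_codebook(bottom_codebook, num_top, iters=20, seed=0):
    """Cluster bottom-level codes to form a top-level codebook.

    Returns the top-level centroids and, for each bottom code, the index
    of the top-level code it is linked to.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct bottom codes (copy via fancy indexing)
    centroids = bottom_codebook[
        rng.choice(len(bottom_codebook), num_top, replace=False)
    ]
    for _ in range(iters):
        # Assign each bottom code to its nearest top-level centroid
        d = ((bottom_codebook[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # Recompute centroids as the mean of their member codes
        for k in range(num_top):
            members = bottom_codebook[assign == k]
            if len(members):
                centroids[k] = members.mean(0)
    return centroids, assign

rng = np.random.default_rng(1)
bottom = rng.normal(size=(64, 16))   # 64 bottom-level codes, dim 16 (toy sizes)
top, link = build_top_codebook(bottom, num_top=8)
```

Bottom codes sharing a top-level parent can then be pulled toward the same centroid during training, which is one way the second layer densifies a sparse codebook space.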
Problem

Research questions and friction points this paper is trying to address.

Enhancing graph topology capture via vector quantization
Addressing codebook underutilization in graph data
Mitigating codebook space sparsity hierarchically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Annealing-based encoding strategy for code utilization
Hierarchical two-layer codebook for embedding relationships
Vector quantization enhances graph topology capture
Authors

Long Zeng — East China Normal University, Shanghai, China
Jianxiang Yu — East China Normal University (Data mining, Large language models)
Jiapeng Zhu — East China Normal University, Shanghai, China
Qingsong Zhong — Unknown affiliation (AI4S)
Xiang Li — East China Normal University, Shanghai, China