HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning

📅 2025-04-03
🏛️ IEEE Transactions on Multimedia
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision Transformers (ViTs) implicitly model visual content, leading to loss of regional context and spatial topology—contradicting the human visual principle of coordinated local grouping and global topological organization. To address this, we propose HyperGraph Vision Transformer (HGFormer), the first ViT variant that explicitly encodes topological structure via semantic-guided Center-Sampling K-Nearest Neighbors (CS-KNN) to construct dynamic hypergraphs. We further design Topology-Aware Hypergraph Attention (HGA), which jointly captures high-order local correlations and global topological relationships. By unifying hypergraph learning with the ViT architecture, HGFormer achieves state-of-the-art performance on ImageNet, COCO, and ADE20K. Ablation studies and visualizations confirm that explicit topological modeling substantially enhances representation capability and generalization. Our work establishes a new structural prior paradigm for vision transformers, bridging geometric inductive bias and deep learning.

📝 Abstract
The computer vision community has witnessed an extensive exploration of vision transformers in the past two years. Drawing inspiration from traditional schemes, numerous works focus on introducing vision-specific inductive biases. However, the implicit modeling of permutation invariance and the fully-connected interaction of individual tokens disrupt the regional context and spatial topology, further hindering higher-order modeling. This deviates from the principle of perceptual organization, which emphasizes the local groups and overall topology of visual elements. We therefore introduce the concept of the hypergraph for perceptual exploration. Specifically, we propose a topology-aware vision transformer called the HyperGraph Transformer (HGFormer). First, we present a Center Sampling K-Nearest Neighbors (CS-KNN) algorithm that provides semantic guidance during hypergraph construction. Second, we present a topology-aware HyperGraph Attention (HGA) mechanism that integrates the hypergraph topology as perceptual indications to guide the aggregation of global and unbiased information during hypergraph messaging. Using HGFormer as the visual backbone, we develop an effective and unified representation, achieving distinct and detailed scene depictions. Empirical experiments show that the proposed HGFormer achieves competitive performance against recent state-of-the-art counterparts on various visual benchmarks. Extensive ablation and visualization studies provide comprehensive explanations of our ideas and contributions.
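The CS-KNN idea described in the abstract (sample center tokens, then link each center to its nearest neighbors to form a hyperedge) can be sketched roughly as follows. This is a minimal NumPy sketch, not the authors' implementation: the function name is illustrative, and feature-norm saliency is used here as a stand-in for the paper's semantic guidance when choosing centers.

```python
import numpy as np

def cs_knn_hyperedges(tokens, num_centers, k):
    """Sketch of Center Sampling K-Nearest Neighbors (CS-KNN).

    tokens: (N, D) array of token features.
    Returns an (num_centers, N) incidence matrix: each row is one
    hyperedge linking a sampled center to its k most similar tokens.
    """
    # Stand-in for semantic guidance (assumption): treat tokens with
    # the largest feature norms as the most salient centers.
    saliency = np.linalg.norm(tokens, axis=1)
    centers = np.argsort(-saliency)[:num_centers]

    # For each center, connect the k nearest tokens by cosine similarity.
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sims = normed[centers] @ normed.T            # (num_centers, N)
    nn = np.argsort(-sims, axis=1)[:, :k]        # k most similar tokens

    incidence = np.zeros((num_centers, tokens.shape[0]))
    rows = np.repeat(np.arange(num_centers), k)
    incidence[rows, nn.ravel()] = 1.0
    return incidence
```

The incidence matrix produced here is the usual hypergraph representation: rows index hyperedges, columns index tokens, and a 1 marks membership.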
Problem

Research questions and friction points this paper is trying to address.

Modeling regional context and spatial topology in vision transformers
Addressing disruption of local groups and visual element topology
Enhancing higher-order modeling with hypergraph learning in transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Topology-aware HyperGraph Transformer (HGFormer) as a visual backbone
CS-KNN algorithm for semantically guided hypergraph construction
HyperGraph Attention (HGA) integrating topology into message passing
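The HGA contribution above performs two-stage message passing over the hypergraph: tokens are pooled into hyperedge features, and hyperedge context is then scattered back to tokens. A rough NumPy sketch is below; the specific attention form and the assumption that every token belongs to at least one hyperedge are illustrative, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hypergraph_attention(tokens, incidence, scale=None):
    """Sketch of two-stage hypergraph message passing.

    tokens: (N, D) token features; incidence: (E, N) binary membership.
    Assumes every token belongs to at least one hyperedge.
    """
    N, D = tokens.shape
    scale = scale or np.sqrt(D)

    # Stage 1: token -> hyperedge. Mean-pool members as edge queries,
    # then attend over member tokens only.
    edge_feat = (incidence @ tokens) / (incidence.sum(1, keepdims=True) + 1e-8)
    logits = (edge_feat @ tokens.T) / scale              # (E, N)
    logits = np.where(incidence > 0, logits, -np.inf)    # mask non-members
    edge_feat = softmax(logits, axis=1) @ tokens

    # Stage 2: hyperedge -> token. Each token attends over the
    # hyperedges it belongs to, injecting topological context.
    logits = (tokens @ edge_feat.T) / scale              # (N, E)
    logits = np.where(incidence.T > 0, logits, -np.inf)
    return softmax(logits, axis=1) @ edge_feat
```

Masking the attention logits with the incidence matrix is what makes the operation topology-aware: information flows only along hyperedge membership rather than through fully-connected token interaction.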
Hao Wang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Shuo Zhang
Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing 100044, China
Biao Leng
Associate Professor, Beihang University
Big Data Analysis, Multimedia Information Processing, Intelligent Transportation Systems, Complex Network