THAT: Token-wise High-frequency Augmentation Transformer for Hyperspectral Pansharpening

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer-based hyperspectral pansharpening suffers from redundant token representations under global self-attention, insufficient multi-scale modeling, and difficulty preserving high-frequency details such as edges and textures. To address these issues, this paper proposes a synergistic framework that combines high-frequency enhancement with critical token selection. Methodologically, it introduces: (1) a critical-token selective attention mechanism that suppresses attention dispersion over redundant tokens; (2) a multi-level variance-aware feed-forward network that explicitly encodes spectral-spatial priors and strengthens the high-frequency response; and (3) a token-level high-frequency enhancement strategy that jointly optimizes spectral fidelity and spatial detail recovery. On standard benchmarks, the method achieves state-of-the-art reconstruction quality with reduced computational overhead, improving both spatial detail restoration and spectral consistency.
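As a rough illustration of the critical-token-selection idea, a toy top-k selective attention might look like the sketch below. This is an assumption-laden reconstruction, not the paper's actual PTSA: the scoring rule (aggregate attention mass per key token) and the function name `pivotal_token_attention` are invented for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pivotal_token_attention(X, Wq, Wk, Wv, k):
    """Toy token-selective attention (NOT the paper's PTSA):
    queries attend only to the k key tokens with the highest
    aggregate attention score, suppressing redundant tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # (n, n) pairwise scores
    mass = scores.sum(axis=0)                # aggregate score per key token
    keep = np.argsort(mass)[-k:]             # indices of the k "pivotal" tokens
    attn = softmax(scores[:, keep], axis=-1) # attention over selected tokens only
    return attn @ V[keep]                    # (n, d) output

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                  # 6 tokens, 4 channels
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = pivotal_token_attention(X, Wq, Wk, Wv, k=3)
```

Restricting the softmax to the selected columns is what keeps attention from dispersing over low-information tokens; the actual selection criterion in THAT may differ.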

📝 Abstract
Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multi-scale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral-spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components, such as material edges and texture transitions, and suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces: (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy; (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency. The source code is available at https://github.com/kailuo93/THAT.
Problem

Research questions and friction points this paper is trying to address.

Enhance high-frequency feature representation in hyperspectral pansharpening
Address token redundancy in Vision Transformers for spectral-spatial data
Improve multi-scale feature modeling and attention dispersion issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pivotal Token Selective Attention for redundancy suppression
Multi-level Variance-aware Feed-forward Network for detail learning
High-frequency augmentation for spectral-spatial feature enhancement
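The variance-aware, high-frequency-boosting idea behind MVFN can be sketched in a minimal form. Everything here is a hypothetical stand-in (the mean-removal "low-pass" and the variance gate are my assumptions, not the paper's architecture):

```python
import numpy as np

def variance_gated_highpass(X, eps=1e-6):
    """Toy variance-aware high-frequency boost (NOT the actual MVFN):
    remove the per-token channel mean (a crude low-pass), then re-inject
    the residual scaled by a normalized per-token variance gate, so
    high-variance (detail-rich) tokens get the strongest enhancement."""
    low = X.mean(axis=-1, keepdims=True)     # crude low-frequency component
    high = X - low                           # high-frequency residual
    gate = X.var(axis=-1, keepdims=True)     # per-token variance
    gate = gate / (gate.max() + eps)         # normalize gate to [0, 1]
    return X + gate * high

X = np.ones((3, 5))                          # constant tokens: no detail
out = variance_gated_highpass(X)             # unchanged, since residual is zero
```

A constant input has zero high-frequency residual, so it passes through untouched; only tokens with genuine channel variation are amplified.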
Hongkun Jin
JPMorgan Chase, 8181 Communications Pkwy Building F, Plano, TX 75024, USA; formerly with Electrical Computer Engineering Department, University of Missouri-Kansas City, Kansas City, MO 64111, USA
Hongcheng Jiang
University of Missouri-Kansas City
Computer Vision · Remote Sensing · Deep Learning
Zejun Zhang
University of Southern California
Yuan Zhang
Robinson Research Institute, University of Adelaide, Adelaide, SA 5000, AU
Jia Fu
RISE Research Institutes of Sweden, KTH Royal Institute of Technology
Robust Artificial Intelligence · Multimodal Machine Learning · Applied Computer Vision
Tingfeng Li
NEC Laboratories America, Princeton, NJ 08540, USA
Kai Luo
formerly with University of Virginia, Charlottesville, VA 22904, USA