Dual Selective Fusion Transformer Network for Hyperspectral Image Classification

📅 2024-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of fixed receptive fields in multiscale land-cover representation and redundant feature interference introduced by standard self-attention in hyperspectral image (HSI) classification, this paper proposes a Spatial-Spectral Dual-Path Selective Fusion Transformer. Its core contributions are: (1) a Kernel Selective Fusion Transformer Block that adaptively learns optimal convolutional kernel sizes to dynamically adjust the receptive field; and (2) a Token Selective Fusion Transformer Block that jointly models spatial-spectral token importance for weighted fusion of discriminative features. The model integrates multiscale convolutional perception, learnable receptive field selection, and joint spatial-spectral self-attention. Experiments on PaviaU, Houston, Indian Pines, and WHU-HongHu datasets achieve overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59%, respectively—averaging 2.01% higher than state-of-the-art methods.
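The kernel-selection idea described above can be illustrated with a minimal NumPy sketch: features from several convolutional branches (different kernel sizes) are fused with softmax weights predicted from their aggregate statistics, so the effective receptive field is chosen per input. The gating projection `w_gate` and the shapes here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kernel_selective_fusion(branches, w_gate):
    """Fuse multi-scale branch features via learned soft kernel selection.

    branches: (K, C) array -- K branch features (e.g. from 3x3/5x5/7x7 convs)
    w_gate:   (C, K) array -- hypothetical learned gating projection
    """
    s = branches.sum(axis=0)            # aggregate cross-branch statistics, shape (C,)
    alpha = softmax(s @ w_gate)         # per-branch selection weights, sum to 1
    fused = (alpha[:, None] * branches).sum(axis=0)  # weighted fusion, shape (C,)
    return fused, alpha

rng = np.random.default_rng(0)
branches = rng.normal(size=(3, 8))      # 3 toy branches with 8 channels each
w_gate = rng.normal(size=(8, 3))
fused, alpha = kernel_selective_fusion(branches, w_gate)
```

In a trained network `w_gate` would be learned end to end, letting small objects favor small-kernel branches and large homogeneous regions favor large ones.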

📝 Abstract
Transformers have achieved satisfactory results in the field of hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land cover types and rich spectral information: (1) a fixed receptive field overlooks the effective contextual scales required by various HSI objects; (2) invalid self-attention features in context fusion affect model performance. To address these limitations, we propose a novel Dual Selective Fusion Transformer Network (DSFormer) for HSI classification. DSFormer achieves joint spatial and spectral contextual modeling by flexibly selecting and fusing features across different receptive fields, effectively reducing unnecessary information interference by focusing on the most relevant spatial-spectral tokens. Specifically, we design a Kernel Selective Fusion Transformer Block (KSFTB) to learn an optimal receptive field by adaptively fusing spatial and spectral features across different scales, enhancing the model's ability to accurately identify diverse HSI objects. Additionally, we introduce a Token Selective Fusion Transformer Block (TSFTB), which strategically selects and combines essential tokens during the spatial-spectral self-attention fusion process to capture the most crucial contexts. Extensive experiments conducted on four benchmark HSI datasets demonstrate that the proposed DSFormer significantly improves land cover classification accuracy, outperforming existing state-of-the-art methods. Specifically, DSFormer achieves overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59% on the Pavia University, Houston, Indian Pines, and WHU-HongHu datasets, respectively, reflecting improvements of 3.19%, 1.14%, 0.91%, and 2.80% over the previous best model. The code will be available online at https://github.com/YichuXu/DSFormer.
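The token-selection step in TSFTB can be sketched with a simplified NumPy example: standard scaled dot-product attention is computed, but each query keeps only its top-`keep` most relevant tokens before the softmax, suppressing invalid attention features. The `keep` hyperparameter and single-head setup are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def token_selective_attention(Q, K, V, keep=3):
    """Self-attention that retains only the top-`keep` tokens per query.

    Q, K, V: (N, d) token matrices; `keep` is a hypothetical hyperparameter.
    """
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)                     # (N, N) raw attention scores
    # keep only the top-`keep` scores in each row; mask the rest before softmax
    thresh = np.sort(scores, axis=1)[:, -keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)           # renormalized over kept tokens
    return attn @ V, attn

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
out, attn = token_selective_attention(Q, K, V, keep=3)
```

Masking before the softmax means the discarded tokens receive exactly zero weight, rather than merely small weight, which is what removes their interference from the fused context.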
Problem

Research questions and friction points this paper is trying to address.

Handles diverse land cover types in HSI
Reduces invalid self-attention feature interference
Improves spatial-spectral contextual modeling accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Selective Fusion Transformer Network
Kernel Selective Fusion Transformer Block
Token Selective Fusion Transformer Block
Yichu Xu
Wuhan University
Remote Sensing · Computer Vision · Deep Learning · AI4EO · Hyperspectral
Di Wang
Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China, and also with the Hubei Luojia Laboratory, Wuhan 430079, China
Lefei Zhang
School of Computer Science, Wuhan University
Pattern Recognition · Machine Learning · Image Processing · Remote Sensing
Liangpei Zhang
Aerospace Information Research Institute, Henan Academy of Sciences, Zhengzhou 450046, China