🤖 AI Summary
Convolutional neural networks (CNNs) struggle to model global contextual information, which constrains semantic segmentation performance in hyperspectral image (HSI) classification. To address this limitation, this paper proposes AMBER, a novel architecture. AMBER integrates 3D convolutions into the SegFormer decoder for adaptive spectral-spatial feature fusion and introduces a lightweight spectral attention mechanism to enhance inter-band correlation modeling. The model adopts a hybrid design that combines a hierarchical Transformer encoder, a progressive multi-scale decoder, and a spectral-aware module, specifically optimized for the intrinsic 3D structure of HSIs. Experimental results demonstrate consistent improvements: on the Indian Pines and Pavia University datasets, AMBER achieves absolute gains of 2.3–3.7% in overall accuracy (OA) and 3.1–4.5% in Kappa coefficient over prior methods; on the PRISMA dataset, it attains state-of-the-art performance with a mean accuracy of 98.2%.
📝 Abstract
Deep learning has revolutionized the field of hyperspectral image (HSI) analysis, enabling the extraction of complex and hierarchical features. While convolutional neural networks (CNNs) have been the backbone of HSI classification, their limitations in capturing global contextual features have led to the exploration of Vision Transformers (ViTs). This paper introduces AMBER, an advanced SegFormer specifically designed for multi-band image segmentation. AMBER enhances the original SegFormer by incorporating three-dimensional convolutions to handle hyperspectral data. Our experiments, conducted on the Indian Pines, Pavia University, and PRISMA datasets, show that AMBER outperforms traditional CNN-based methods in terms of Overall Accuracy, Kappa coefficient, and Average Accuracy on the first two datasets, and achieves state-of-the-art performance on the PRISMA dataset.
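The key modification the abstract describes, replacing 2D convolutions with three-dimensional ones so that filters slide across spectral bands as well as spatial positions, can be sketched as follows. This is an illustrative NumPy implementation for a single filter, not the paper's code; the function and variable names are invented here, and real frameworks would use an optimized operator (e.g. a `Conv3d` layer) instead of explicit loops.

```python
import numpy as np

def conv3d_valid(cube, kernel):
    """Valid-mode 3D convolution (cross-correlation, as in deep learning
    frameworks) of a hyperspectral cube with a single 3D filter.

    cube:   (bands, height, width) hyperspectral patch
    kernel: (kb, kh, kw) filter sliding over spectral AND spatial axes
    """
    B, H, W = cube.shape
    kb, kh, kw = kernel.shape
    out = np.zeros((B - kb + 1, H - kh + 1, W - kw + 1))
    for b in range(out.shape[0]):          # spectral offset
        for i in range(out.shape[1]):      # row offset
            for j in range(out.shape[2]):  # column offset
                out[b, i, j] = np.sum(cube[b:b + kb, i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5-band, 8x8 patch with a 3x3x3 filter: each output value mixes
# information from neighbouring bands as well as neighbouring pixels,
# which is what makes 3D convolutions suited to hyperspectral data.
patch = np.random.rand(5, 8, 8)
filt = np.random.rand(3, 3, 3)
feat = conv3d_valid(patch, filt)
print(feat.shape)  # → (3, 6, 6)
```

A 2D convolution, by contrast, would either process each band independently or collapse all bands in one step; the 3D kernel's spectral extent (here 3 bands) is what lets the network learn local inter-band correlations directly.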