LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

📅 2026-06-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of balancing global context and local detail in semantic segmentation of remote sensing imagery under constrained computational resources. The authors propose LALE, a lightweight architecture built on a resolution-forking strategy: high-resolution stages employ ConvMixer to capture fine-grained local features, while low-resolution stages use a Transformer to model long-range contextual dependencies. A fully MLP-based multiscale decoder, together with RMSNorm and StarReLU, further reduces compute and parameter count. Evaluated on the ARAS400k benchmark, the smallest LALE variant trails the best baseline by only 2.6 F1 points while using 4.5× fewer parameters, 7× less storage, and 17× fewer GMACs, and delivering 1.8× higher throughput.
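For readers unfamiliar with the two components named above, here is a minimal pure-Python sketch of RMSNorm and StarReLU as they are commonly defined. The function names, the `eps` value, and the default `scale` and `bias` constants are illustrative assumptions; the summary does not say how LALE parameterizes or initializes either layer.

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """Divide a vector by its root mean square, then apply a per-element gain.

    `gain` and `eps` are illustrative; in a trained model the gain is learned.
    """
    if gain is None:
        gain = [1.0] * len(x)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def star_relu(x, scale=0.8944, bias=-0.4472):
    """StarReLU: scale * relu(v)**2 + bias, applied element-wise.

    The defaults are commonly cited initial values (an assumption here,
    not taken from the LALE paper); both are learnable scalars in practice.
    """
    return [scale * max(v, 0.0) ** 2 + bias for v in x]


if __name__ == "__main__":
    features = [3.0, -4.0, 0.5]
    print(rms_norm(features))
    print(star_relu(features))
```

Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, and StarReLU replaces the exponentials in GELU-style activations with a square, which is why both are often chosen for compact models.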
πŸ“ Abstract
Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) comes within 2.6 F1 points of the best baseline (UPerNet) while using 4.5× fewer parameters, 7× less storage, and 17× fewer GMACs, and delivering 1.8× higher throughput.
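To see why confining self-attention to downsampled feature maps matters, the following back-of-the-envelope sketch counts tokens and pairwise attention interactions at several encoder strides. The 512×512 input size and the strides 4, 8, 16, and 32 are hypothetical values chosen for illustration; the abstract does not state LALE's actual resolutions or stage strides.

```python
def num_tokens(height, width, stride):
    """Number of spatial positions in a feature map downsampled by `stride`."""
    return (height // stride) * (width // stride)


def attention_pairs(height, width, stride):
    """Token-to-token interactions in full self-attention: N squared."""
    n = num_tokens(height, width, stride)
    return n * n


if __name__ == "__main__":
    # Hypothetical input size and strides, for illustration only.
    for stride in (4, 8, 16, 32):
        n = num_tokens(512, 512, stride)
        pairs = attention_pairs(512, 512, stride)
        print(f"stride {stride:>2}: {n:>6} tokens, {pairs:>12} attention pairs")
```

Each doubling of the stride cuts the token count by 4 and the number of attention pairs by 16, so under these assumptions moving attention from stride 4 to stride 16 reduces the pairwise term by a factor of 256.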
Problem

Research questions and friction points this paper is trying to address.

semantic segmentation
remote sensing imagery
global context
local detail
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Transformer
Resolution-bifurcated Encoder
ConvMixer
Remote Sensing Segmentation
Efficient Semantic Segmentation