AI Summary
To address the challenge of classifying multimodal remote sensing data (hyperspectral, LiDAR, and text) under extreme label scarcity (<5%), this paper proposes the first world model framework tailored for remote sensing, enabling cross-modal semantic alignment and unified representation learning with minimal supervision. Methodologically, it introduces: (1) LaMG, a latent diffusion fusion and multimodal generation paradigm that captures implicit inter-modal dependencies; (2) OK-CP, an open-world knowledge-guided consistency projection mechanism that mitigates domain shift; and (3) MuCO, a multitask combinatorial optimization strategy that jointly improves the robustness and discriminability of the learned representations. Evaluated on four standard remote sensing benchmarks, the method achieves significant improvements over state-of-the-art approaches under ultra-low labeling budgets, demonstrating superior accuracy, generalization capability, and few-shot adaptability.
Abstract
World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of world models in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). FusDreamer uses the world model as a unified representation container to abstract common and high-level knowledge, promoting interactions across different types of data, i.e., hyperspectral (HSI), light detection and ranging (LiDAR), and text data. First, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is used for its exceptional information integration and detail retention capabilities. Second, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations for visually described objects and aligns language-visual features through contrastive learning. In this way, the domain gap can be bridged by fine-tuning the pre-trained world models with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy captures slight feature bias and constrains the diffusion process in a collaboratively learnable direction. Experiments conducted on four typical datasets indicate the effectiveness and advantages of the proposed FusDreamer. The corresponding code will be released at https://github.com/Cimy-wang/FusDreamer.
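The abstract states that OK-CP aligns language and visual features through contrastive learning but gives no formulation. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE loss of the kind commonly used for such alignment. It is not FusDreamer's implementation: the function names, the temperature value, and the assumption that row i of each matrix forms a matched visual-text pair are all hypothetical choices made for this example.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(visual, text, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: visual[i] and text[i] describe the same sample, and every
    other row in the batch acts as a negative. The temperature of 0.07 is
    an illustrative default, not a value taken from the paper.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)

    # Pairwise similarity matrix: rows index visual samples, columns text.
    logits = v @ t.T / temperature
    idx = np.arange(len(v))

    # Cross-entropy with the matched pair (the diagonal) as the target,
    # computed in both retrieval directions and averaged.
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))

# Identical embeddings are perfectly aligned; rolling the rows breaks
# every pairing, so the second loss should be larger than the first.
aligned = contrastive_alignment_loss(features, features)
mismatched = contrastive_alignment_loss(features, np.roll(features, 1, axis=0))
```

Minimising a loss of this shape pulls each visual embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is the general sense in which contrastive learning aligns the two feature spaces.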