FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification

๐Ÿ“… 2025-03-18
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the challenge of classifying remote sensing multimodal data (hyperspectral, LiDAR, and text) under severe label scarcity, this paper proposes a world model framework tailored for remote sensing, enabling cross-modal semantic alignment and unified representation learning with minimal supervision. Methodologically, it introduces: (1) LaMG, a latent diffusion fusion and multimodal generation paradigm that captures implicit inter-modal dependencies; (2) OK-CP, an open-world knowledge-guided consistency projection module that mitigates the domain gap by aligning language and visual features; and (3) MuCO, an end-to-end multitask combinatorial optimization strategy that jointly enhances representation robustness and discriminability. Evaluated on four standard remote sensing benchmarks, the method achieves significant improvements over state-of-the-art approaches under low labeling budgets, demonstrating superior accuracy, generalization capability, and few-shot adaptability.

๐Ÿ“ Abstract
World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of the world model in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). FusDreamer uses the world model as a unified representation container to abstract common and high-level knowledge, promoting interactions across different types of data, i.e., hyperspectral imagery (HSI), light detection and ranging (LiDAR), and text data. Initially, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is utilized for its exceptional information integration and detail retention capabilities. Subsequently, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations for visually described objects and aligns language and visual features through contrastive learning. In this way, the domain gap can be bridged by fine-tuning the pre-trained world model with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy captures slight feature bias and constrains the diffusion process in a collaboratively learnable direction. Experiments conducted on four typical datasets demonstrate the effectiveness and advantages of the proposed FusDreamer. The corresponding code will be released at https://github.com/Cimy-wang/FusDreamer.
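The OK-CP module's language-visual alignment via contrastive learning can be sketched with a generic CLIP-style symmetric InfoNCE objective. This is an illustrative reconstruction, not the authors' implementation: the function name, temperature value, and tensor shapes are all assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(visual_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss aligning visual and text embeddings.

    visual_feats, text_feats: (batch, dim) tensors where row i of each
    describes the same object/class. Matching pairs lie on the diagonal
    of the similarity matrix.
    """
    v = F.normalize(visual_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = v @ t.T / temperature              # (batch, batch) cosine similarities
    targets = torch.arange(v.size(0))           # positive pair for row i is column i
    loss_v2t = F.cross_entropy(logits, targets)  # visual -> text direction
    loss_t2v = F.cross_entropy(logits.T, targets)  # text -> visual direction
    return (loss_v2t + loss_t2v) / 2
```

Minimizing this loss pulls each visual feature toward its paired prompt embedding and pushes it away from the other prompts in the batch, which is one standard way to bridge a domain gap with few labeled samples.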
Problem

Research questions and friction points this paper is trying to address.

Develops a label-efficient model for remote sensing data classification.
Integrates multimodal data including HSI, LiDAR, and text for enhanced analysis.
Uses advanced fusion and generation techniques to improve data integration.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion fusion for multimodal data integration
Open-world knowledge-guided consistency projection module
End-to-end multitask combinatorial optimization strategy
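At its core, an end-to-end multitask optimization of this kind reduces to jointly weighting the classification, diffusion, and alignment objectives. A minimal sketch, assuming a fixed-weight linear combination (the weight values and loss names are hypothetical; the paper's MuCO strategy may learn or schedule these weights rather than fixing them):

```python
def multitask_loss(loss_cls, loss_diff, loss_align,
                   w_cls=1.0, w_diff=0.5, w_align=0.5):
    """Combine per-task losses into a single training objective.

    loss_cls:   supervised classification loss on the labeled samples
    loss_diff:  latent diffusion reconstruction/generation loss
    loss_align: language-visual contrastive alignment loss
    Weights are illustrative defaults, not values from the paper.
    """
    return w_cls * loss_cls + w_diff * loss_diff + w_align * loss_align
```

Backpropagating through this combined scalar is what lets the diffusion process be constrained "in a collaboratively learnable direction": gradients from the classification and alignment terms steer the generative branch jointly with its own objective.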
๐Ÿ”Ž Similar Papers
No similar papers found.
Jinping Wang
University of Florida
Persuasive Communication, Human-Computer Interaction, Media Psychology

Weiwei Song
Pengcheng Laboratory, https://github.com/weiweisong415
Deep learning, remote sensing

Hao Chen
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, U.K.

Jinchang Ren
Professor of Computing Science, Robert Gordon University
Hyperspectral imaging, data engineering, nondestructive evaluation, precision agriculture, visual computing

Huimin Zhao
School of Computer Sciences, Guangdong Polytechnic Normal University, Guangzhou, 510665, China, and also with the Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, 510665, China