FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification

📅 2025-03-18

🤖 AI Summary

To address the challenge of classifying multimodal remote sensing data (hyperspectral, LiDAR, and text) under extreme label scarcity (<5%), this paper proposes the first world model framework tailored for remote sensing, enabling cross-modal semantic alignment and unified representation learning with minimal supervision. Methodologically, it introduces three components: (1) LaMG, a latent diffusion fusion and multimodal generation paradigm that captures implicit inter-modal dependencies; (2) OK-CP, an open-world knowledge-guided consistency projection module that aligns language and visual features to mitigate domain shift; and (3) MuCO, an end-to-end multitask combinatorial optimization strategy that jointly improves representation robustness and discriminability. Evaluated on four standard remote sensing benchmarks, the method is reported to outperform state-of-the-art approaches under ultra-low labeling budgets in accuracy, generalization, and few-shot adaptability.

📝 Abstract

World models significantly enhance hierarchical understanding, improving data integration and learning efficiency. To explore the potential of world models in the remote sensing (RS) field, this paper proposes a label-efficient remote sensing world model for multimodal data fusion (FusDreamer). FusDreamer uses the world model as a unified representation container to abstract common and high-level knowledge, promoting interactions across different types of data, i.e., hyperspectral (HSI), light detection and ranging (LiDAR), and text data. First, a new latent diffusion fusion and multimodal generation paradigm (LaMG) is used for its strong information integration and detail retention capabilities. Next, an open-world knowledge-guided consistency projection (OK-CP) module incorporates prompt representations for visually described objects and aligns language-visual features through contrastive learning. In this way, the domain gap can be bridged by fine-tuning the pre-trained world model with limited samples. Finally, an end-to-end multitask combinatorial optimization (MuCO) strategy captures slight feature bias and constrains the diffusion process in a collaboratively learnable direction. Experiments on four typical datasets indicate the effectiveness and advantages of the proposed FusDreamer. The corresponding code will be released at https://github.com/Cimy-wang/FusDreamer.
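The abstract states that OK-CP aligns language-visual features through contrastive learning but does not give the loss. As a rough illustration only, the sketch below implements a generic symmetric InfoNCE-style loss over paired visual and text embeddings. The function names, the temperature value, and the choice of InfoNCE itself are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_alignment_loss(visual, text, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical stand-in, not the paper's loss).

    visual[i] and text[i] are assumed to describe the same sample;
    every other pairing in the batch is treated as a negative.
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in text]
    n = len(v)
    # Cosine similarity between every visual/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Visual-to-text direction: each row should peak on its diagonal entry.
    loss_v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-visual direction: same idea over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]
    matched_text = [[1.0, 0.0], [0.0, 1.0]]
    swapped_text = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_alignment_loss(visual, matched_text))
    print(contrastive_alignment_loss(visual, swapped_text))
```

With correctly paired embeddings the loss is close to zero, while swapping the text embeddings makes it large, which is the behavior a contrastive alignment objective is meant to have.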

Problem

Research questions and friction points this paper is trying to address.

Develops a label-efficient model for remote sensing data classification.

Integrates multimodal data including HSI, LiDAR, and text for enhanced analysis.

Uses latent diffusion fusion and multimodal generation to improve data integration.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion fusion for multimodal data integration

Open-world knowledge-guided consistency projection module

End-to-end multitask combinatorial optimization strategy
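The page does not say which loss terms the multitask strategy combines or how they are weighted. Purely as an illustration of joint optimization, the sketch below sums a classification loss and a diffusion-style noise-prediction loss with fixed weights. The two terms, the weight values, and all function names are hypothetical choices for this example and should not be read as the MuCO formulation.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample given raw class scores and the true class index."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[label]


def denoising_mse(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise (a common diffusion training loss)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def combined_objective(losses, weights):
    """Weighted sum of named loss terms; both dicts must share the same keys."""
    if losses.keys() != weights.keys():
        raise ValueError("losses and weights must have identical keys")
    return sum(weights[name] * losses[name] for name in losses)


if __name__ == "__main__":
    losses = {
        "classification": cross_entropy([2.0, 0.5, -1.0], label=0),
        "denoising": denoising_mse([0.1, -0.2], [0.0, 0.0]),
    }
    # Hypothetical weights; in practice these would be tuned or learned.
    weights = {"classification": 1.0, "denoising": 0.5}
    print(combined_objective(losses, weights))
```

Optimizing one scalar that mixes several terms lets gradients from the classification task influence the generative branch, which is one plausible reading of constraining the diffusion process in a shared direction.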
