A unified multimodal understanding and generation model for cross-disciplinary scientific research

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of integrating the heterogeneous, high-dimensional data inherent in interdisciplinary scientific problems. Existing AI models are often confined to a single domain and struggle to unify understanding and generation across diverse scientific sources. The paper proposes FuXi-Uni, a general-purpose framework that natively unifies multimodal scientific data understanding and generation within a single architecture. By aligning scientific tokens with natural language tokens and employing a dedicated science decoder, FuXi-Uni builds a shared latent space that retains both cross-disciplinary generality and domain-specific performance. In Earth system modeling, its 10-day global weather forecasts and its tropical cyclone track and intensity predictions outperform the state-of-the-art physical forecasting system, and its spatial downscaling surpasses standard interpolation baselines. In biomedicine, it outperforms leading multimodal large language models on multiple biomedical visual question answering benchmarks.

📝 Abstract
Scientific discovery increasingly relies on integrating heterogeneous, high-dimensional data across disciplines. While AI models have achieved notable success in various scientific domains, they typically remain domain-specific or cannot simultaneously understand and generate multimodal scientific data, particularly high-dimensional data. Yet many pressing global challenges and scientific problems are inherently cross-disciplinary and require coordinated progress across multiple fields. Here, we present FuXi-Uni, a native unified multimodal model for scientific understanding and high-fidelity generation across scientific domains within a single architecture. Specifically, FuXi-Uni aligns cross-disciplinary scientific tokens with natural language tokens and employs a science decoder to reconstruct scientific tokens, thereby supporting both natural language conversation and scientific numerical prediction. Empirically, we validate FuXi-Uni in Earth science and biomedicine. In Earth system modeling, the model supports global weather forecasting, tropical cyclone (TC) forecast editing, and spatial downscaling, all driven solely by language instructions. FuXi-Uni generates 10-day global forecasts at 0.25° resolution that outperform the state-of-the-art (SOTA) physical forecasting system. It also outperforms the SOTA physical model in both TC track and intensity prediction, and generates high-resolution regional weather fields that surpass standard interpolation baselines. In biomedicine, FuXi-Uni outperforms leading multimodal large language models on multiple biomedical visual question answering benchmarks. By unifying heterogeneous scientific modalities within a native shared latent space while maintaining strong domain-specific performance, FuXi-Uni provides a step toward more general-purpose, multimodal scientific models.
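To make two of the quantities in the abstract concrete, here is a minimal sketch in plain Python. It shows how many points a regular 0.25° global grid contains, and what a "standard interpolation baseline" for spatial downscaling could look like, using bilinear interpolation. The function names `global_grid_shape` and `bilinear_upsample` are hypothetical, the grid convention (both poles included, no repeated meridian) is an assumption, and the abstract does not say which interpolation method the baselines use. This is not FuXi-Uni's method, only the kind of baseline a learned downscaler is compared against.

```python
# Illustrative sketch only. Names and grid conventions are assumptions,
# not taken from the FuXi-Uni paper.

def global_grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular global latitude-longitude grid.

    Assumes latitudes run from -90 to 90 with both poles included, and
    longitudes cover 0 to 360 without repeating the wrap-around meridian.
    """
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat, n_lon


def bilinear_upsample(field, factor):
    """Upsample a 2-D field (list of rows) by an integer factor.

    Coarse grid points are preserved exactly, so the output size along
    each axis is (n - 1) * factor + 1. Needs at least 2 rows and 2 columns.
    """
    n_rows, n_cols = len(field), len(field[0])
    out_rows = (n_rows - 1) * factor + 1
    out_cols = (n_cols - 1) * factor + 1
    out = []
    for i in range(out_rows):
        y = i / factor                      # position in coarse-grid units
        y0 = min(int(y), n_rows - 2)        # upper neighbor row, clamped at the edge
        wy = y - y0                         # weight of the lower neighbor row
        row = []
        for j in range(out_cols):
            x = j / factor
            x0 = min(int(x), n_cols - 2)
            wx = x - x0
            top = (1 - wx) * field[y0][x0] + wx * field[y0][x0 + 1]
            bottom = (1 - wx) * field[y0 + 1][x0] + wx * field[y0 + 1][x0 + 1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


if __name__ == "__main__":
    print(global_grid_shape(0.25))  # (721, 1440)
    coarse = [[0.0, 2.0], [4.0, 6.0]]
    for fine_row in bilinear_upsample(coarse, 2):
        print(fine_row)
    # [0.0, 1.0, 2.0]
    # [2.0, 3.0, 4.0]
    # [4.0, 5.0, 6.0]
```

Bilinear interpolation can only smooth between coarse values, so it cannot recover fine-scale structure absent from the input. That is why it serves as a natural lower bar for a generative downscaling model.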
Problem

Research questions and friction points this paper is trying to address.

multimodal scientific data
cross-disciplinary integration
high-dimensional data
scientific understanding and generation
heterogeneous data
Innovation

Methods, ideas, or system contributions that make the work stand out.

unified multimodal model
scientific understanding and generation
cross-disciplinary AI
high-dimensional data alignment
science decoder
Authors

Xiaomeng Yang
SAIS
Generative Models, Reinforcement Learning, Computer Vision
Zhiyu Tan
Shanghai Academy of Artificial Intelligence for Science, Shanghai, 200232, China; Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, 200433, China.
Xiaohui Zhong
Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, 200433, China; Shanghai Academy of Artificial Intelligence for Science, Shanghai, 200232, China; FuXi Intelligent Computing Technology Co., Ltd., Shanghai, 200233, China; Joint Laboratory for AI-Based Earth System Forecasting, Shanghai, China.
Mengping Yang
East China University of Science and Technology
Few-shot Learning, Generative Models
Qiusheng Huang
Shanghai AI Laboratory
CVDL
Lei Chen
Fudan University
Libo Wu
School of Data Science, Fudan University, Shanghai, 200433, China; Shanghai Academy of Artificial Intelligence for Science, Shanghai, 200232, China; Shanghai Innovation Institute, Shanghai, 200231, China; MOE Laboratory for National Development and Intelligent Governance, Fudan University, Shanghai, 200433, China.
Hao Li
FUDAN UNIVERSITY, DAMO@ALIBABA
Computer Vision, Deep Learning, AI4S