ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models

📅 2025-02-10
🤖 AI Summary
Problem: Existing single-image super-resolution (SR) methods for downsampled Earth System Model (ESM) data suffer from spectral bias, which limits faithful recovery of high-frequency climate features. Method: The paper proposes ViSIR, an end-to-end SR framework that combines the global contextual modeling of Vision Transformers (ViTs) with the high-frequency detail preservation of Sinusoidal Representation Networks (SIREN). Contribution/Results: Evaluated on three ESM measurements, ViSIR improves average PSNR by 4.1 dB over ViT, 7.5 dB over SIREN, and 7.1 dB over SR-GAN baselines, and also reports better MSE and SSIM than all three.

📝 Abstract
Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. ESMs are highly complex, and thus deep neural network architectures are used to model the complexity and store the down-sampled data. In this paper, we propose the Vision Transformer Sinusoidal Representation Networks (ViSIR) to improve the single-image super-resolution (SR) reconstruction task for ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: ViSIR outperforms ViT by 4.1 dB, SIREN by 7.5 dB, and SR Generative Adversarial Networks (SR-GANs) by 7.1 dB PSNR on average for three different measurements. Conclusion: The proposed ViSIR is evaluated and compared with state-of-the-art methods. The results show that the proposed algorithm outperforms other methods in terms of Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).
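To make the reported PSNR gaps easier to interpret, here is a minimal sketch of how MSE and PSNR are conventionally computed for a reconstructed field. The function names, the `data_range` parameter, and the assumption that fields are normalized to [0, 1] are illustrative choices and are not taken from the paper. SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np


def mse(reference, reconstruction):
    """Mean squared error between two equally shaped fields."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((reference - reconstruction) ** 2))


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid values (1.0 for fields
    normalized to [0, 1]); it is a hypothetical default, not the paper's.
    """
    err = mse(reference, reconstruction)
    if err == 0.0:
        return float("inf")  # identical fields
    return float(10.0 * np.log10(data_range ** 2 / err))


def psnr_gain_to_mse_ratio(gain_db):
    """MSE reduction factor implied by a PSNR gain at a fixed data range."""
    return float(10.0 ** (gain_db / 10.0))


# Toy example: a constant 0.1 error on a normalized 4x4 field.
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)
toy_psnr = psnr(reference, reconstruction)
```

Under the fixed-range assumption, `psnr_gain_to_mse_ratio` maps the reported 4.1 dB, 7.5 dB, and 7.1 dB gains to MSE reductions of roughly 2.6x, 5.6x, and 5.1x. Because the abstract reports PSNR averaged over three measurements, these ratios are indicative only.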
Problem

Research questions and friction points this paper is trying to address.

How to improve single-image reconstruction of downsampled Earth System Model data
Whether combining Vision Transformers with Sinusoidal Representation Networks helps that reconstruction
How to reduce the spectral bias observed in super-resolution tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Vision Transformers with SIREN
Addresses spectral bias in SR tasks
Outperforms ViT, SIREN, and SR-GANs
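The SIREN component named above can be sketched as a small coordinate network with sine activations. This is a NumPy illustration of a generic SIREN forward pass only. It does not reproduce the ViSIR architecture, and how the paper couples the ViT to the SIREN is not described here. The layer sizes, the frequency scale `w0 = 30`, the zero biases, and the helper names are assumptions made for illustration.

```python
import numpy as np


def init_siren_layer(rng, fan_in, fan_out, w0=30.0, is_first=False):
    """Uniform weight init commonly used for SIREN layers.

    First layer: U(-1/fan_in, 1/fan_in).
    Later layers: U(-sqrt(6/fan_in)/w0, sqrt(6/fan_in)/w0).
    Biases are set to zero here as a simplification.
    """
    bound = 1.0 / fan_in if is_first else np.sqrt(6.0 / fan_in) / w0
    weights = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    bias = np.zeros(fan_out)
    return weights, bias


def siren_forward(coords, layers, w0=30.0):
    """Map (N, 2) coordinates to (N, out) values; last layer is linear."""
    hidden = coords
    for weights, bias in layers[:-1]:
        hidden = np.sin(w0 * (hidden @ weights + bias))
    weights, bias = layers[-1]
    return hidden @ weights + bias


def make_grid(height, width):
    """Pixel-centre coordinates in [-1, 1] x [-1, 1], flattened to (H*W, 2)."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_y.ravel(), grid_x.ravel()], axis=-1)


rng = np.random.default_rng(0)
sizes = [2, 64, 64, 1]  # hypothetical widths: (y, x) in, one scalar field out
layers = [
    init_siren_layer(rng, n_in, n_out, is_first=(i == 0))
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]))
]

# Because the network is a function of continuous coordinates, it can be
# sampled on any grid, for example one finer than the training grid.
coords = make_grid(8, 16)
field = siren_forward(coords, layers).reshape(8, 16)
```

The sine activations let the network represent high-frequency variation that ReLU-style coordinate networks tend to smooth out, which is the property the summary refers to as addressing spectral bias. The weights here are untrained, so `field` is only a shape and plumbing check.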