Meteorology-Driven GPT4AP: A Multi-Task Forecasting LLM for Atmospheric Air Pollution in Data-Scarce Settings

📅 2026-03-31
🤖 AI Summary
This work addresses the poor generalization of air pollution forecasting models in regions with sparse observational data by proposing GPT4AP, a parameter-efficient multi-task forecasting framework built on a pre-trained GPT-2 backbone. The approach incorporates meteorological driving information and uses Gaussian rank-stabilized low-rank adaptation (rsLoRA), freezing the self-attention and feed-forward layers while adapting only lightweight positional and output modules, which substantially reduces the number of trainable parameters. On six real-world air quality datasets, GPT4AP outperforms baselines including DLinear and ETSformer in few-shot (10% of the training data) and zero-shot cross-station transfer settings. In long-term forecasting with full training data it remains competitive, although specialized time-series models achieve slightly lower errors. The results point to data efficiency and cross-domain generalization as its main strengths.
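The rank-stabilized scaling named in the summary can be illustrated with a minimal sketch. It assumes the standard rsLoRA formulation (adapter update scaled by alpha / sqrt(r) instead of LoRA's alpha / r) and assumes that "Gaussian" refers to Gaussian initialization of the down-projection matrix; the summary confirms neither detail, nor does it say which GPT-2 modules the adapters attach to. All function names, the `std` value, and the layer sizes are hypothetical.

```python
import math
import random


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized scaling factor: alpha / sqrt(rank).

    Plain LoRA would use alpha / rank, which shrinks the update
    quickly as the rank grows.
    """
    return alpha / math.sqrt(rank)


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_adapter(d_out: int, d_in: int, rank: int, std: float = 0.02, seed: int = 0):
    """Create low-rank factors A (rank x d_in) and B (d_out x rank).

    Assumption: A is drawn from a zero-mean Gaussian and B starts at zero,
    so the adapted weight equals the frozen weight before training.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, std) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return A, B


def adapted_weight(W0, A, B, alpha: float):
    """Return W0 + (alpha / sqrt(rank)) * B @ A; W0 itself is never modified."""
    scale = rslora_scale(alpha, rank=len(A))
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, delta)]


def trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of adapter parameters, versus d_out * d_in for full fine-tuning."""
    return rank * (d_in + d_out)
```

Under these assumptions, with alpha = 16 and rank 64, plain LoRA would scale the update by 0.25 while the rank-stabilized variant scales it by 2.0. For a hypothetical 768 x 768 layer at rank 8, the adapter holds 12,288 parameters against 589,824 in the full matrix.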
📝 Abstract
Accurate forecasting of air pollution is important for environmental monitoring and policy support, yet data-driven models often suffer from limited generalization in regions with sparse observations. This paper presents Meteorology-Driven GPT for Air Pollution (GPT4AP), a parameter-efficient multi-task forecasting framework based on a pre-trained GPT-2 backbone and Gaussian rank-stabilized low-rank adaptation (rsLoRA). The model freezes the self-attention and feed-forward layers and adapts lightweight positional and output modules, substantially reducing the number of trainable parameters. GPT4AP is evaluated on six real-world air quality monitoring datasets under few-shot, zero-shot, and long-term forecasting settings. In the few-shot regime using 10% of the training data, GPT4AP achieves an average MSE/MAE of 0.686/0.442, outperforming DLinear (0.728/0.530) and ETSformer (0.734/0.505). In zero-shot cross-station transfer, the proposed model attains an average MSE/MAE of 0.529/0.403, demonstrating improved generalization compared with existing baselines. In long-term forecasting with full training data, GPT4AP remains competitive, achieving an average MAE of 0.429, while specialized time-series models show slightly lower errors. These results indicate that GPT4AP provides a data-efficient forecasting approach that performs robustly under limited supervision and domain shift, while maintaining competitive accuracy in data-rich settings.
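For readers comparing the reported numbers, the sketch below shows how MSE and MAE are conventionally computed and how the few-shot gaps translate into relative error reductions. The metric definitions are the standard ones and are assumed rather than taken from the paper; the averaged scores are copied from the abstract, and the helper names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error over paired observations and forecasts."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error over paired observations and forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, model: float) -> float:
    """Fraction by which `model` lowers the error of `baseline`."""
    return (baseline - model) / baseline


# Average few-shot (10% of training data) scores from the abstract: (MSE, MAE).
few_shot = {
    "GPT4AP": (0.686, 0.442),
    "DLinear": (0.728, 0.530),
    "ETSformer": (0.734, 0.505),
}

ours_mse, ours_mae = few_shot["GPT4AP"]
for name in ("DLinear", "ETSformer"):
    base_mse, base_mae = few_shot[name]
    print(
        f"vs {name}: MSE -{relative_reduction(base_mse, ours_mse):.1%}, "
        f"MAE -{relative_reduction(base_mae, ours_mae):.1%}"
    )
# vs DLinear: MSE -5.8%, MAE -16.6%
# vs ETSformer: MSE -6.5%, MAE -12.5%
```

Read this way, the few-shot advantage is larger in MAE than in MSE against both baselines.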
Problem

Research questions and friction points this paper is trying to address.

air pollution forecasting
data-scarce settings
model generalization
few-shot learning
zero-shot transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT4AP
rsLoRA
parameter-efficient adaptation
air pollution forecasting
data-scarce settings
Prasanjit Dey
ADAPT SFI Research Centre, School of Computer Science, Technological University Dublin, Ireland
Soumyabrata Dev
University College Dublin
environmental informatics, remote sensing, renewable, machine learning
Bianca Schoen-Phelan
TU Dublin