One-for-All: A Lightweight Stabilized and Parameter-Efficient Pre-trained LLM for Time Series Forecasting

📅 2026-03-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the prohibitive computational and memory costs of deploying large language models (LLMs) for multivariate time series forecasting by proposing Gaussian Rank-Stabilized Low-Rank Adaptation (rsLoRA). The approach freezes the LLM backbone and introduces only rank-16 trainable low-rank modules in the positional encoding and output layers, combined with parameter-efficient fine-tuning (PEFT) strategies. Notably, rsLoRA is claimed to be the first method to achieve provable gradient stability under low-rank constraints, substantially enhancing parameter efficiency. Experimental results demonstrate that the model attains state-of-the-art accuracy–efficiency trade-offs across six time series benchmarks, reducing trainable parameters by 6.8–21× and achieving a minimal memory footprint of just 2.2 MiB, thereby enabling efficient deployment on edge devices.
📝 Abstract
We address the challenge of adapting pre-trained Large Language Models (LLMs) for multivariate time-series analysis, where their deployment is often hindered by prohibitive computational and memory demands. Our solution, One-for-All, introduces Gaussian Rank-Stabilized Low-Rank Adapters (rsLoRA) to enable parameter-efficient fine-tuning of frozen LLMs. While inspired by LoRA, rsLoRA introduces a mathematically grounded rank-stabilization mechanism that enables provable gradient stability at low ranks, a novel contribution absent in prior PEFT methods. Our framework injects trainable rank decomposition matrices (rank 16) into positional embeddings and output layers, while keeping self-attention weights fixed. This design reduces trainable parameters by 6.8$\times$ (vs. TimesNet), 21$\times$ (vs. GPT4TS), and 11.8$\times$ (vs. TIME-LLM), while achieving a 168–1,776$\times$ smaller memory footprint (2.2 MiB vs. 340 MiB–4.18 GiB in SOTA models). Rigorous evaluation across six time-series tasks demonstrates that One-for-All achieves state-of-the-art efficiency–accuracy trade-offs: 5.5$\times$ higher parameter efficiency (MSE=5.50) than TimesNet and 21$\times$ better than GPT4TS, while matching their forecasting accuracy (MSE=0.33). The framework's stability is validated through consistent performance across diverse horizons (96–720 steps) and datasets (ETT, Weather, M3, M4), with 98.3% fewer parameters than conventional transformers. These advances enable deployment on edge devices for healthcare, finance, and environmental monitoring without compromising performance.
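To make the rank-stabilization idea concrete: in the standard rsLoRA formulation, the frozen weight is augmented with a trainable low-rank update whose output is scaled by $\alpha/\sqrt{r}$ rather than vanilla LoRA's $\alpha/r$, which keeps gradient magnitudes stable as the rank changes. The sketch below is a minimal NumPy illustration of that mechanism under these standard assumptions; the class name, shapes, and initialization details are illustrative, not the paper's actual implementation.

```python
import numpy as np

class RsLoRALinear:
    """Illustrative rank-stabilized low-rank adapter on a frozen linear layer."""

    def __init__(self, in_dim, out_dim, rank=16, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight (stand-in for an LLM layer; never updated).
        self.W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        # Trainable low-rank factors: A is Gaussian-initialized, B starts at
        # zero so the adapter is a no-op before any fine-tuning.
        self.A = rng.standard_normal((rank, in_dim)) / np.sqrt(in_dim)
        self.B = np.zeros((out_dim, rank))
        # Rank-stabilized scaling: alpha / sqrt(r) instead of LoRA's alpha / r.
        self.scale = alpha / np.sqrt(rank)

    def forward(self, x):
        # Frozen path plus scaled low-rank correction.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only the adapter factors A and B are trained.
        return self.A.size + self.B.size

layer = RsLoRALinear(in_dim=768, out_dim=768, rank=16)
x = np.ones(768)
y = layer.forward(x)
print(layer.trainable_params())  # 2 * 768 * 16 = 24576 trainable parameters
```

At rank 16 on a 768-dimensional layer, the adapter adds only ~24.6K trainable parameters per adapted layer, which is consistent in spirit with the megabyte-scale footprint the abstract reports when adapters are confined to the positional-embedding and output layers.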
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Time Series Forecasting
Parameter Efficiency
Computational Cost
Memory Footprint
Innovation

Methods, ideas, or system contributions that make the work stand out.

rsLoRA
parameter-efficient fine-tuning
time series forecasting
rank stabilization
lightweight LLM
🔎 Similar Papers
No similar papers found.
Prasanjit Dey
ADAPT SFI Research Centre, School of Computer Science, Technological University Dublin, Ireland
Soumyabrata Dev
University College Dublin
environmental informatics, remote sensing, renewable, machine learning
Bianca Schoen-Phelan
TU Dublin