Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

📅 2026-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem that large pre-trained time series models often exhibit poorly conditioned, non-convex loss landscapes, so that fine-tuning overfits or performs worse than training from scratch. To mitigate this, the authors propose Smoothed Full Fine-tuning (SFF), which they describe as the first approach to bring loss landscape smoothing to time series modeling. SFF builds a randomly initialized auxiliary model and linearly interpolates its parameters with those of the pre-trained model, smoothing the optimization landscape in a controllable way. This preserves pre-trained knowledge while improving trainability and optimization stability. In experiments on eight prominent large time series models and multiple downstream tasks, SFF consistently outperforms both standard fine-tuning and training from scratch, and avoids convergence to sharp local minima.
📝 Abstract
Recently, large time series models (LTSMs) have gained increasing attention due to their similarities to large language models, including flexible context length, scalability, and task generality, and they have outperformed advanced task-specific models. However, prior studies indicate that pre-trained LTSMs may exhibit a poorly conditioned non-convex loss landscape, leading to limited trainability. As a result, direct fine-tuning tends to cause overfitting and suboptimal performance, sometimes even worse than training from scratch, substantially diminishing the benefits of pre-training. To overcome this limitation, we propose Smoothed Full Fine-tuning (SFF), a novel fine-tuning technique. Specifically, we construct an auxiliary LTSM via random initialization to obtain a smoother loss landscape, and then linearly interpolate its weights with those of the pre-trained model to smooth the original landscape. This process improves trainability while preserving pre-trained knowledge, thereby enabling more effective downstream fine-tuning. From an optimization perspective, SFF perturbs sharp minima without significantly harming flat regions, facilitating escape from poor local basins toward smoother and more generalizable solutions. Extensive experiments on benchmark datasets demonstrate consistent improvements across eight representative LTSMs (Timer, TimesFM, MOMENT, UniTS, MOIRAI, Chronos, TTMs, and Sundial) on diverse downstream tasks. The code is available at https://github.com/Meteor-Stars/SFF.
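The weight interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The function names, the coefficient name `alpha`, the convention that `alpha` weights the auxiliary model, the Gaussian initialization with scale 0.02, and the use of plain Python lists in place of weight tensors are all assumptions made for illustration.

```python
import random


def random_init_like(weights, scale=0.02, seed=0):
    """Hypothetical helper: build a randomly initialized auxiliary model
    with the same parameter names and sizes as `weights`.
    The Gaussian init and its scale are assumptions, not from the paper."""
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, scale) for _ in values]
        for name, values in weights.items()
    }


def interpolate_weights(pretrained, auxiliary, alpha):
    """Return (1 - alpha) * pretrained + alpha * auxiliary, per parameter.

    alpha = 0 keeps the pre-trained weights unchanged; alpha = 1 yields
    the randomly initialized auxiliary weights. Which side alpha weights
    is an assumed convention.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != auxiliary.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [
            (1.0 - alpha) * p + alpha * a
            for p, a in zip(pretrained[name], auxiliary[name])
        ]
        for name in pretrained
    }


# Toy "pre-trained" parameters standing in for a real LTSM checkpoint.
pretrained = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
auxiliary = random_init_like(pretrained, seed=0)

# A small alpha keeps the result close to the pre-trained weights.
smoothed = interpolate_weights(pretrained, auxiliary, alpha=0.1)
```

In this sketch, fine-tuning would then start from `smoothed` rather than from `pretrained`. With a small `alpha` the starting point stays near the pre-trained solution, which matches the stated goal of improving trainability while preserving pre-trained knowledge.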
Problem

Research questions and friction points this paper is trying to address.

non-convex loss landscape
large time series models
fine-tuning
overfitting
trainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Smoothed Full Fine-tuning
non-convex loss landscape
large time series models
weight interpolation
flat minima