Conv-like Scale-Fusion Time Series Transformer: A Multi-Scale Representation for Variable-Length Long Time Series

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Time-series modeling faces challenges including variable-length sequence handling, high feature redundancy, and limited generalization capability. To address these, we propose ConvFormer—a convolution-like multi-scale fusion framework that jointly employs temporal patching and multi-head attention to progressively compress the time dimension while expanding channel capacity. It introduces cross-scale attention and logarithmic-space normalization to enhance multi-scale feature interaction and suppress redundant representations. The resulting hierarchical time-series representation achieves significant improvements over state-of-the-art Transformer- and CNN-based baselines on both forecasting and classification tasks, reducing feature redundancy by 12.6%–28.4% and improving average performance by 3.7%–9.2%. Our core contribution lies in the first unified integration of convolution-like structural inductive bias, cross-scale attention, and logarithmic normalization within a Transformer architecture—effectively balancing local pattern modeling with global dependency capture.
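The summary's core mechanism, progressively compressing the time dimension while expanding channel capacity via temporal patching, can be sketched as a strided, convolution-like merge. The snippet below is a minimal illustration, not the paper's implementation: `patch_merge_stage` is a hypothetical name, and a plain linear projection stands in for the paper's patching + multi-head attention block.

```python
import numpy as np

def patch_merge_stage(x, patch=2, expand=2, rng=None):
    """One conv-like stage (illustrative sketch): merge `patch`
    consecutive time steps and widen channels by `expand`,
    halving temporal resolution like a strided convolution.
    x: (T, C) array with T divisible by `patch`."""
    if rng is None:
        rng = np.random.default_rng(0)
    T, C = x.shape
    # Group consecutive time steps: (T/patch, patch*C)
    merged = x.reshape(T // patch, patch * C)
    # Linear projection to the wider channel dimension (a stand-in
    # for the paper's patching + attention block).
    W = rng.standard_normal((patch * C, expand * C)) / np.sqrt(patch * C)
    return merged @ W

x = np.random.default_rng(1).standard_normal((32, 8))  # T=32, C=8
h1 = patch_merge_stage(x)   # (16, 16): time halved, channels doubled
h2 = patch_merge_stage(h1)  # (8, 32): the pyramid deepens
```

Stacking such stages yields the pyramidal, multi-scale representation the summary describes: each level trades temporal length for channel capacity.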

📝 Abstract
Time series analysis faces significant challenges in handling variable-length data and achieving robust generalization. While Transformer-based models have advanced time series tasks, they often struggle with feature redundancy and limited generalization capabilities. Drawing inspiration from the pyramidal structure of classical CNN architectures, we propose a Multi-Scale Representation Learning Framework based on a Conv-like Scale-Fusion Transformer. Our approach introduces a temporal convolution-like structure that combines patching operations with multi-head attention, enabling progressive temporal dimension compression and feature channel expansion. We further develop a novel cross-scale attention mechanism for effective feature fusion across different temporal scales, along with a log-space normalization method for variable-length sequences. Extensive experiments demonstrate that our framework achieves superior feature independence, reduced redundancy, and better performance in forecasting and classification tasks compared to state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Handling variable-length time series data with robust generalization capabilities
Addressing feature redundancy and limited generalization in Transformer models
Developing multi-scale representation learning for long time series analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conv-like Transformer with pyramidal multi-scale representation
Cross-scale attention mechanism for temporal feature fusion
Log-space normalization method for variable-length sequences
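The paper does not spell out its log-space normalization here, so the following is one plausible reading under stated assumptions: values are mapped into signed log space to compress magnitude variation, then standardized per sequence so sequences of very different lengths and scales become comparable. The function name and formulation are hypothetical.

```python
import numpy as np

def log_space_normalize(x, eps=1e-8):
    """Hypothetical sketch of log-space normalization for
    variable-length sequences: a signed log transform compresses
    magnitude spread, then per-sequence standardization puts
    sequences of any length on a comparable scale. The paper's
    exact formulation may differ."""
    z = np.sign(x) * np.log1p(np.abs(x))  # signed log transform
    mu, sigma = z.mean(), z.std()
    return (z - mu) / (sigma + eps)

short = np.array([1.0, 10.0, 100.0])
long_ = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
zs = log_space_normalize(short)
zl = log_space_normalize(long_)
```

Both outputs end up zero-mean with unit-scale statistics regardless of sequence length, which is the property a length-robust normalizer needs.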
Kai Zhang
Zhejiang University, Hangzhou 310027, China
Siming Sun
Zhejiang University, Hangzhou 310027, China
Zhengyu Fan
Zhejiang University, Hangzhou 310027, China
Qinmin Yang
Zhejiang University
Intelligent Control · Renewable Energy · Smart Grid · Industrial Big Data · Reinforcement Learning
Xuejun Jiang
Zhejiang University, Hangzhou 310027, China