🤖 AI Summary
This work asks under what conditions adaptive patching genuinely outperforms a carefully tuned uniform patch size in time-series forecasting. Modeling patching as a budgeted bitrate allocation problem, the authors derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline, and show that local complexity alone does not make non-uniform patching optimal. Once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size, so tuned uniform patching is the appropriate baseline for evaluating adaptive methods. The theory bounds the achievable improvement locally with a quadratic surrogate and globally under a strong-convexity assumption. In a controlled study on three representative architectures, with backbone, data, and training protocol held fixed, the validation-selected uniform baseline is competitive with its adaptive counterpart on standard long-horizon forecasting benchmarks, and the larger gains from adaptive patching appear only in specific method-dataset combinations.
📝 Abstract
Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform one. Local heterogeneity alone is not enough: under pointwise forecasting losses, a complex-looking region is not automatically one where finer patching reduces the loss. We model patching as a budgeted bitrate allocation and derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline, then bound the achievable improvement both locally (a quadratic surrogate) and globally (a strong-convexity bound under the model's assumptions). Two structural results follow: without a coupling constraint, scalar local complexity cannot produce a non-uniform optimum under a common loss landscape; and once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size. To test these predictions, we run a controlled isolation study on three representative architectures, replacing each adaptive mechanism with a uniform patch-size sweep while keeping the backbone, data, and training protocol fixed. On standard long-horizon forecasting benchmarks, the validation-selected uniform baseline is competitive with the dynamic counterpart, with per-setting effects concentrated near zero and no consistent directional advantage once results are aggregated by dataset. The larger gains we do observe are method- and dataset-specific. Adaptive patching should therefore be evaluated against a tuned uniform baseline; its value depends on whether a cheap and reliable routing signal can identify where finer patches actually reduce forecasting loss.
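The "validation-selected uniform baseline" described above can be illustrated with a minimal sketch. It is not the authors' code: the toy sine series, the mean-per-patch features, and the least-squares forecaster are all assumptions standing in for a real patch-embedding Transformer backbone, and every function name here is hypothetical. The sketch only shows the shape of the protocol, in which the data and fitting procedure stay fixed while the uniform patch size is swept and the value with the lowest validation error is kept.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def patch_features(X, patch_len):
    """Uniform patching: cut each window into equal-length patches and
    summarise each patch by its mean (a toy stand-in for a patch embedding)."""
    n, lookback = X.shape
    if lookback % patch_len != 0:
        raise ValueError("patch_len must divide the lookback length")
    feats = X.reshape(n, lookback // patch_len, patch_len).mean(axis=2)
    return np.hstack([feats, np.ones((n, 1))])  # append a bias column

def val_mse(train, val, lookback, horizon, patch_len):
    """Fit a linear forecaster on train windows, score it on validation windows."""
    Xtr, Ytr = make_windows(train, lookback, horizon)
    Xva, Yva = make_windows(val, lookback, horizon)
    W, *_ = np.linalg.lstsq(patch_features(Xtr, patch_len), Ytr, rcond=None)
    pred = patch_features(Xva, patch_len) @ W
    return float(np.mean((pred - Yva) ** 2))

def select_uniform_patch(train, val, lookback, horizon, candidates):
    """Sweep uniform patch sizes and keep the one with the lowest validation MSE."""
    scores = {p: val_mse(train, val, lookback, horizon, p) for p in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Toy data (assumption): a noisy daily-period sine wave.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(600)

best, scores = select_uniform_patch(
    series[:400], series[400:], lookback=48, horizon=12,
    candidates=[1, 2, 4, 8, 16],
)
print("selected patch size:", best)
```

In a comparison of the kind the abstract describes, the adaptive method's test error would be set against the test error of the uniform patch size chosen this way, rather than against a single untuned default.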