Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study

📅 2026-05-10

🤖 AI Summary

Cloud performance is shaped by multi-scale, time-varying factors, and existing decomposition methods struggle to capture their intermittent behavior and complex periodic patterns. This work proposes two time-series decomposition approaches, a hybrid (expert-informed) one and a fully automated one, that consistently extract multi-scale trend and seasonal components from a single performance trace. These components are then used for serverless function performance prediction and for resource scaling on AWS. The proposed methods outperform basic time-series and deep learning baselines, achieving a prediction MAPE of 1.8% (hybrid) and 2.1% (fully automated). Decomposition-informed scaling on AWS reduced latency variability by over 60% and maximum latency by 10%, providing decision support for cloud resource provisioning.

📝 Abstract

Cloud performance fluctuates due to factors such as resource contention and workload changes. These factors can be short-term, seasonal, or long-term. Their effects are often intertwined in performance traces, making performance management difficult. Prior work on cloud performance engineering used time-series decomposition to separate these factors. However, existing approaches rely on basic decomposition methods that may miss key variation patterns and fail on traces with complex or intermittent patterns, limiting their usefulness across diverse cloud deployments. To address this limitation, we propose two time-series decomposition techniques for cloud performance engineering: a hybrid/manual method and a fully automatic method. Through a case study of 11 serverless functions, we show that both approaches can successfully and consistently reveal trends and seasonal cycles, such as weekly and quarterly patterns, which are otherwise obscured. As an evaluation and application of the decomposition, we used the decomposed components to predict future performance, yielding mean absolute percentage error (MAPE) values of only 1.8% (hybrid) and 2.1% (automatic), significantly outperforming basic time-series methods and deep learning. We further show that decomposition insights can guide practical resource allocation. Using decomposition-informed scaling on AWS, we reduced latency variability by over 60% and maximum latency by 10%. Similar experiments on benchmarks on AWS confirmed that seasonal patterns and performance gains generalize beyond our case study. Notably, our findings demonstrate that even a single performance trace contains rich actionable information for guiding cloud management decisions.
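
To make the idea of "decompose, then predict, then score with MAPE" concrete, here is a minimal sketch in Python. It is not the paper's hybrid or automatic method, which the abstract does not specify. It assumes a simple additive model (linear trend plus one fixed seasonal cycle), and every name in it (`fit_line`, `decompose`, `forecast`, `mape`, the synthetic weekly latency trace) is hypothetical and chosen only for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def decompose(series, period):
    """Assumed additive model: series[t] = intercept + slope * t + seasonal[t % period].

    Needs at least two complete periods of data.
    """
    n_full = len(series) // period
    # Averaging over each complete period cancels the seasonal cycle,
    # so the line fitted through the period means estimates the trend.
    centers = [k * period + (period - 1) / 2 for k in range(n_full)]
    means = [sum(series[k * period:(k + 1) * period]) / period for k in range(n_full)]
    slope, intercept = fit_line(centers, means)
    detrended = [y - (intercept + slope * t) for t, y in enumerate(series)]
    # The seasonal component is the mean detrended value at each phase of the cycle.
    seasonal = []
    for phase in range(period):
        values = detrended[phase::period]
        seasonal.append(sum(values) / len(values))
    return slope, intercept, seasonal


def forecast(slope, intercept, seasonal, start, horizon):
    """Extrapolate the trend and repeat the seasonal cycle for `horizon` steps."""
    period = len(seasonal)
    return [intercept + slope * t + seasonal[t % period]
            for t in range(start, start + horizon)]


# Synthetic daily latency trace (ms): slow upward drift plus a weekly cycle.
weekly = [5, 3, 0, -2, -4, -6, 4]
trace = [100 + 0.5 * t + weekly[t % 7] for t in range(63)]
train, held_out = trace[:56], trace[56:]

slope, intercept, seasonal = decompose(train, period=7)
predicted = forecast(slope, intercept, seasonal, start=len(train), horizon=7)
print(f"MAPE on held-out week: {mape(held_out, predicted):.4f}%")
```

On this noise-free synthetic trace the forecast error is close to zero. Real performance traces contain noise, several overlapping cycles (the abstract mentions weekly and quarterly patterns), and intermittent behavior, which is the case that motivates the more capable decomposition methods the paper proposes.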

Problem

Research questions and friction points this paper is trying to address.

cloud performance

time-series decomposition

performance fluctuation

resource contention

workload variation

Innovation

Methods, ideas, or system contributions that make the work stand out.

time-series decomposition

cloud performance engineering

serverless functions