An Empirical Evaluation of Serverless Cloud Infrastructure for Large-Scale Data Processing

📅 2025-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses performance instability and opaque cost structures in serverless cloud systems for large-scale data processing. It proposes Skyrise, an evaluation platform that combines micro-benchmarks with end-to-end workloads (e.g., join, aggregation) to quantify the boundaries of performance variability in AWS serverless networking and storage, in what is described as the first such analysis. It further establishes a cost break-even model for serverless compute and storage. Key contributions include: (1) systematic identification of network and I/O performance degradation patterns in Lambda under high concurrency; (2) a delineation of applicability boundaries, with serverless outperforming VM-based solutions for bursty workloads at low to medium concurrency; and (3) a reusable decision framework for jointly optimizing cost and performance. Empirical results support the feasibility and economic viability of serverless architectures for specific data-intensive scenarios.

📝 Abstract
Data processing systems are increasingly deployed in the cloud. While monolithic systems run fully on virtual servers, recent systems embrace cloud infrastructure and utilize the disaggregation of compute and storage to scale them independently. The introduction of serverless compute services, such as AWS Lambda, enables finer-grained and elastic scalability within these systems. Prior work shows the viability of serverless infrastructure for scalable data processing yet also sees limitations due to variable performance and cost overhead, in particular for networking and storage. In this paper, we perform a detailed analysis of the performance and cost characteristics of serverless infrastructure in the data processing context. We base our analysis on a large series of micro-benchmarks across different compute and storage services, as well as end-to-end workloads. To enable our analysis, we propose the Skyrise serverless evaluation platform. For the widely used serverless infrastructure of AWS, our analysis reveals distinct boundaries for performance variability in serverless networks and storage. We further present cost break-even points for serverless compute and storage. These insights provide guidance on when and how serverless infrastructure can be efficiently used for data processing.
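The idea of a cost break-even point can be illustrated with a small calculation. The sketch below is not the paper's cost model. It assumes a simplified setting in which a virtual server is billed per hour regardless of load, while a serverless function is billed only for busy time in GB-seconds. The function name and all prices are hypothetical placeholders, not actual AWS rates.

```python
def break_even_utilization(vm_price_per_hour: float,
                           fn_price_per_gb_second: float,
                           fn_memory_gb: float) -> float:
    """Return the fraction of each hour a workload must keep compute busy
    before an always-on VM becomes cheaper than a pay-per-use function.

    Simplifying assumptions (hypothetical, not from the paper): one function
    instance does the same work per second as the VM, and request or storage
    fees are ignored.
    """
    if min(vm_price_per_hour, fn_price_per_gb_second, fn_memory_gb) <= 0:
        raise ValueError("prices and memory size must be positive")
    # Cost of keeping one function instance busy for a full hour.
    fn_price_per_busy_hour = fn_price_per_gb_second * fn_memory_gb * 3600
    # Above this utilization the VM is cheaper; cap at 1.0 (100% busy).
    return min(1.0, vm_price_per_hour / fn_price_per_busy_hour)


# Hypothetical prices: VM at $0.10/hour, function at $0.00002 per GB-second, 4 GB.
utilization = break_even_utilization(0.10, 0.00002, 4.0)
print(f"{utilization:.3f}")  # about 0.347
```

Under these placeholder prices, the serverless function is the cheaper choice as long as the workload keeps compute busy for less than roughly 35% of the time, which matches the intuition that bursty, lightly utilized workloads favor serverless billing.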
Problem

Research questions and friction points this paper is trying to address.

Serverless Computing
Data Processing
Cost-Performance Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Serverless Systems
Cost Optimization
Data Processing
Thomas Bodner
HPI, U Potsdam
Theo Radig
HPI, U Potsdam
David Justen
BIFOLD, TU Berlin
Daniel Ritter
SAP
Cloud Data Systems, Database Systems, Modern Hardware, Distributed Systems, Formal Methods
T. Rabl
HPI, U Potsdam