Characterizing Software Aging in GPU-Based LLM Serving Systems

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the underexplored phenomenon of software aging in GPU-accelerated large language model (LLM) services, particularly concerning memory leakage under heterogeneous software-hardware stacks and dynamic workloads. It pioneers the extension of software aging research into GPU-based LLM serving by conducting 216-hour stress tests across six co-deployment configurations, simultaneously monitoring multi-dimensional metrics from the host, GPU, and client perspectives. Employing time-series statistical analysis with autocorrelation correction and multiple hypothesis testing, the work systematically characterizes aging patterns. Significant memory aging is consistently observed across all configurations, revealing that leakage rates are highly sensitive to runtime environments and deployment settings. These findings confirm the prevalence and quantifiability of the issue and establish a reproducible framework bridging software aging and LLM service research.
📝 Abstract
This paper proposes an empirical methodology to study software aging in GPU-based LLM serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads. LLM serving differs in three ways: it spans a Python host and a CUDA device, it handles requests whose cost varies by orders of magnitude, and it relies on rapidly evolving software stacks. We run a 216-hour campaign across six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple testing. Our results reveal statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of two communities: software aging and rejuvenation, and LLM serving.
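The abstract does not name the tests behind its statistical pipeline, so the following is only a minimal sketch of what such a pipeline could look like, not the authors' method. It assumes a Mann-Kendall trend test, Sen's slope as the leak-rate estimate, trend-free prewhitening to handle lag-1 autocorrelation, and Holm's procedure for multiple testing. All function names and the synthetic memory series are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def sen_slope(y):
    """Median of all pairwise slopes: a robust trend estimate (units of y per sample)."""
    return median((y[j] - y[i]) / (j - i) for i, j in combinations(range(len(y)), 2))

def lag1_autocorr(y):
    """Lag-1 sample autocorrelation coefficient."""
    m = sum(y) / len(y)
    den = sum((v - m) ** 2 for v in y)
    if den == 0:
        return 0.0
    return sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y))) / den

def mann_kendall_p(y):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    if s == 0:
        return 1.0
    var = n * (n - 1) * (2 * n + 5) / 18
    z = (s - 1 if s > 0 else s + 1) / math.sqrt(var)  # continuity correction
    return math.erfc(abs(z) / math.sqrt(2))

def trend_test(y):
    """Sen's slope plus a Mann-Kendall p-value after trend-free prewhitening."""
    slope = sen_slope(y)
    detrended = [v - slope * t for t, v in enumerate(y)]
    r1 = lag1_autocorr(detrended)
    # Remove the lag-1 dependence from the residuals, then restore the trend.
    prewhitened = [detrended[t] - r1 * detrended[t - 1] + slope * t
                   for t in range(1, len(y))]
    return slope, mann_kendall_p(prewhitened)

def holm_reject(pvals, alpha=0.05):
    """Holm step-down procedure: True where the null hypothesis is rejected."""
    order = sorted(range(len(pvals)), key=pvals.__getitem__)
    reject = [False] * len(pvals)
    for rank, idx in enumerate(order):
        if pvals[idx] > alpha / (len(pvals) - rank):
            break
        reject[idx] = True
    return reject

# Synthetic memory samples (arbitrary units): one leaking series, one flat series.
leaking = [100 + 0.5 * t + (t * 7) % 5 for t in range(60)]
flat = [100 + (t * 7) % 5 for t in range(60)]

results = [trend_test(series) for series in (leaking, flat)]
pvals = [p for _, p in results]
decisions = holm_reject(pvals)
for (slope, p), aged in zip(results, decisions):
    print(f"slope={slope:.3f} per sample, p={p:.3g}, aging detected={aged}")
```

In a setup like the one described, each monitored metric in each deployment would contribute one p-value, and the correction would run over the whole family of tests rather than the two series used here.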
Problem

Research questions and friction points this paper is trying to address.

software aging
GPU-based LLM serving
memory leak
heterogeneous systems
empirical study
Innovation

Methods, ideas, or system contributions that make the work stand out.

software aging
LLM serving
GPU memory leak
empirical methodology
statistical pipeline