RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management

📅 2026-03-22
🤖 AI Summary
This work addresses the limited generalizability of existing failure management approaches, which rely on task-specific multimodal processing pipelines. The authors propose RuntimeSlicer, a framework that learns a unified, cross-modal, task-agnostic representation of runtime system state. Using Unified Runtime Contrastive Learning, which combines cross-modality alignment with temporal consistency objectives, RuntimeSlicer encodes metrics, traces, and logs into a shared system-state embedding space. State-Aware Task-Oriented Tuning then partitions runtime states without supervision and adapts lightweight models to downstream tasks conditioned on those states. Preliminary experiments on the AIOps 2022 dataset demonstrate the feasibility and effectiveness of the approach for system state modeling and failure management.

📝 Abstract
Modern software systems operate at unprecedented scale and complexity, where effective failure management is critical yet increasingly challenging. Metrics, traces, and logs provide complementary views of system runtime behavior, but existing failure management approaches typically rely on task-oriented pipelines that tightly couple modality-specific preprocessing, representation learning, and downstream models, resulting in limited generalization across tasks and systems. To fill this gap, we propose RuntimeSlicer, a unified runtime state representation model towards generalizable failure management. RuntimeSlicer pre-trains a task-agnostic representation model that directly encodes metrics, traces, and logs into a single, aligned system-state embedding capturing the holistic runtime condition of the system. To train RuntimeSlicer, we introduce Unified Runtime Contrastive Learning, which integrates heterogeneous training data sources and optimizes complementary objectives for cross-modality alignment and temporal consistency. Building upon the learned system-state embeddings, we further propose State-Aware Task-Oriented Tuning, which performs unsupervised partitioning of runtime states and enables state-conditioned adaptation for downstream tasks. This design allows lightweight task-oriented models to be trained on top of the unified embedding without redesigning modality-specific encoders or preprocessing pipelines. Preliminary experiments on the AIOps 2022 dataset demonstrate the feasibility and effectiveness of RuntimeSlicer for system state modeling and failure management tasks.
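The abstract names two complementary training objectives, cross-modality alignment and temporal consistency, without giving their form. The sketch below is one plausible reading, not the authors' implementation: it assumes an InfoNCE-style contrastive loss, treats embeddings of the same time window from different modalities as positives, and treats each window's successor as its temporal positive. The function names, the `temperature` and `weight` parameters, and the choice of InfoNCE are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i], and every
    other row of `positives` serves as a negative for anchors[i]."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [_dot(ai, pj) / temperature for pj in p]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(a)


def cross_modal_loss(metrics, traces, logs, temperature=0.1):
    """Average symmetric InfoNCE over the three modality pairs.
    Row i of each argument is assumed to embed the same time window."""
    pairs = [(metrics, traces), (metrics, logs), (traces, logs)]
    losses = [
        info_nce(x, y, temperature) + info_nce(y, x, temperature)
        for x, y in pairs
    ]
    return sum(losses) / (2 * len(pairs))


def temporal_consistency_loss(states, temperature=0.1):
    """Assumed form: the embedding of window t+1 is the positive for window t.
    Requires at least two consecutive windows."""
    return info_nce(states[:-1], states[1:], temperature)


def unified_loss(metrics, traces, logs, states, weight=0.5):
    """Hypothetical combination of the two objectives with a fixed weight."""
    return cross_modal_loss(metrics, traces, logs) + weight * temporal_consistency_loss(states)
```

With this formulation, modality embeddings that agree window by window produce a lower alignment loss than the same embeddings with windows swapped, which is the behavior an alignment objective is meant to reward.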
Problem

Research questions and friction points this paper is trying to address.

failure management
runtime state representation
generalization
multimodal data
system observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

unified runtime representation
cross-modality alignment
contrastive learning
state-aware tuning
failure management
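
The state-aware tuning idea (unsupervised partitioning of runtime states, then state-conditioned adaptation of a lightweight task model) can be sketched roughly as follows. This is an illustrative assumption rather than the paper's method: it uses plain k-means for the partitioning step and a per-state distance threshold as a stand-in for a lightweight anomaly detection head. All function names and the `slack` parameter are hypothetical.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def assign(embedding, centroids):
    """Index of the runtime state (nearest centroid) for one embedding."""
    return min(range(len(centroids)), key=lambda k: _dist(embedding, centroids[k]))


def kmeans(embeddings, k, iterations=20):
    """Partition embeddings into k states without labels.
    Farthest-point initialization keeps this sketch deterministic."""
    centroids = [list(embeddings[0])]
    while len(centroids) < k:
        farthest = max(embeddings, key=lambda e: min(_dist(e, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for e in embeddings:
            groups[assign(e, centroids)].append(e)
        # Keep the old centroid if a state received no points.
        centroids = [_mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fit_state_thresholds(embeddings, centroids, slack=3.0):
    """Per-state head: mean + slack * std of distances to the state's centroid,
    fitted on embeddings assumed to come from normal operation."""
    dists = [[] for _ in centroids]
    for e in embeddings:
        s = assign(e, centroids)
        dists[s].append(_dist(e, centroids[s]))
    thresholds = []
    for d in dists:
        if not d:
            thresholds.append(0.0)  # empty state: no data to calibrate on
            continue
        mu = sum(d) / len(d)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
        thresholds.append(mu + slack * sigma)
    return thresholds


def is_anomalous(embedding, centroids, thresholds):
    """Flag an embedding that lies unusually far from its own state's centroid."""
    s = assign(embedding, centroids)
    return _dist(embedding, centroids[s]) > thresholds[s]
```

Because the threshold is fitted per state, a deviation that is normal in a noisy state can still be flagged in a tight one. The shared embedding model itself is left untouched; only the small per-state parameters are learned.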
Lingzhe Zhang
Peking University
AIOps, Reinforcement Fine-Tuning, LSM
Tong Jia
Peking University
AIOps, Anomaly Detection, Log Analysis, AI for Medical Research
Weijie Hong
Peking University
Mingyu Wang
Peking University
Chiming Duan
Peking University
AI4SE, AIOps
Minghua He
Peking University
Large Language Model, Software Reliability
Rongqian Wang
Huawei Technologies Co., Ltd.
Xi Peng
Huawei Technologies Co., Ltd.
Meiling Wang
Huawei Technologies Co., Ltd.
Gong Zhang
Huawei Technologies Co., Ltd.
Renhai Chen
Tianjin University
Ying Li
Peking University