RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management

📅 2026-03-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalizability of existing failure management approaches, which rely on task-specific pipelines that tightly couple modality-specific preprocessing, representation learning, and downstream models. The authors propose RuntimeSlicer, a pre-trained, task-agnostic model that encodes metrics, traces, and logs into a single aligned system-state embedding. It is trained with Unified Runtime Contrastive Learning, which combines objectives for cross-modality alignment and temporal consistency. For downstream use, State-Aware Task-Oriented Tuning partitions runtime states without supervision and conditions lightweight task models on the resulting state, so modality-specific encoders and preprocessing do not need to be redesigned. Preliminary experiments on the AIOps 2022 dataset indicate that the approach is feasible and effective for system state modeling and failure management tasks.
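The state partitioning and state-conditioned adaptation described above can be illustrated with a small sketch. It is not the authors' method: k-means stands in for the unsupervised partitioning, and a per-state distance scale stands in for the state-aware adaptation. The function names (`kmeans`, `fit_state_scales`, `state_aware_score`) and the synthetic embeddings are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=50):
    """Partition embeddings into k states (assumed stand-in for the paper's segmentation)."""
    # Deterministic farthest-point initialization: start from x[0], then
    # repeatedly add the point farthest from all centroids chosen so far.
    chosen = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(x[d.argmax()])
    centroids = np.array(chosen, dtype=float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a state is empty
                centroids[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def fit_state_scales(x, labels, centroids):
    """Per-state mean distance to the centroid, used to normalize scores."""
    dist = np.linalg.norm(x - centroids[labels], axis=1)
    scales = np.ones(len(centroids))
    for j in range(len(centroids)):
        mask = labels == j
        if mask.any():
            scales[j] = dist[mask].mean() + 1e-8
    return scales

def state_aware_score(e, centroids, scales):
    """Return (state id, deviation score) for one embedding e."""
    d = np.linalg.norm(centroids - e, axis=1)
    s = int(d.argmin())
    return s, float(d[s] / scales[s])

# Synthetic embeddings: two well-separated runtime states (assumed data).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(10.0, 0.1, (20, 4))])
centroids, labels = kmeans(x, k=2)
scales = fit_state_scales(x, labels, centroids)
```

Normalizing by a per-state scale means a deviation is judged relative to how tightly that particular state clusters, which is one simple way a downstream model could be conditioned on the state.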

📝 Abstract

Modern software systems operate at unprecedented scale and complexity, where effective failure management is critical yet increasingly challenging. Metrics, traces, and logs provide complementary views of system runtime behavior, but existing failure management approaches typically rely on task-oriented pipelines that tightly couple modality-specific preprocessing, representation learning, and downstream models, resulting in limited generalization across tasks and systems. To fill this gap, we propose RuntimeSlicer, a unified runtime state representation model towards generalizable failure management. RuntimeSlicer pre-trains a task-agnostic representation model that directly encodes metrics, traces, and logs into a single, aligned system-state embedding capturing the holistic runtime condition of the system. To train RuntimeSlicer, we introduce Unified Runtime Contrastive Learning, which integrates heterogeneous training data sources and optimizes complementary objectives for cross-modality alignment and temporal consistency. Building upon the learned system-state embeddings, we further propose State-Aware Task-Oriented Tuning, which performs unsupervised partitioning of runtime states and enables state-conditioned adaptation for downstream tasks. This design allows lightweight task-oriented models to be trained on top of the unified embedding without redesigning modality-specific encoders or preprocessing pipelines. Preliminary experiments on the AIOps 2022 dataset demonstrate the feasibility and effectiveness of RuntimeSlicer for system state modeling and failure management tasks.
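To make the two training objectives concrete, the sketch below pairs a cross-modality alignment term with a temporal consistency term. The abstract does not give the loss formulation, so everything here is an assumption: a symmetric InfoNCE loss for alignment, a squared-difference penalty between consecutive windows for temporal consistency, an averaged embedding as the system state, and the weight `lam`. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of a and row i of b form the positive pair."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return float(0.5 * (loss_ab + loss_ba))

def temporal_consistency(states):
    """Mean squared distance between embeddings of consecutive time windows."""
    diffs = states[1:] - states[:-1]
    return float((diffs ** 2).sum(axis=1).mean())

def unified_loss(metrics, traces, logs, lam=0.5):
    """Alignment across the three modality pairs plus a weighted temporal term.

    Each input has shape (windows, dim); row t is the embedding of one
    modality for time window t (an assumed data layout).
    """
    align = (info_nce(metrics, traces)
             + info_nce(metrics, logs)
             + info_nce(traces, logs)) / 3.0
    # Assumed fusion: average the normalized modality embeddings per window.
    state = l2_normalize(
        (l2_normalize(metrics) + l2_normalize(traces) + l2_normalize(logs)) / 3.0
    )
    return align + lam * temporal_consistency(state)

# Random embeddings for 8 time windows, 16 dimensions (assumed data).
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
aligned = info_nce(base, base)           # matching windows are positives
misaligned = info_nce(base, base[::-1])  # positives point at the wrong window
total = unified_loss(base, base + 0.01 * rng.normal(size=base.shape), base)
```

In this setup the alignment term pulls embeddings of the same time window together across modalities, while the temporal term discourages abrupt jumps between adjacent windows. A real implementation would compute these losses inside an autodiff framework so the encoders can be trained.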

Problem

Research questions and friction points this paper is trying to address.

failure management

runtime state representation

generalization

multimodal data

system observability

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified runtime representation

cross-modality alignment

contrastive learning