🤖 AI Summary
Existing observability frameworks struggle to support effective root cause analysis by large language model (LLM) agents because of data silos, heterogeneous schemas, and a lack of semantic metadata. To address this, this work proposes UModel, a unified ontology-based framework designed for LLM agents that introduces an object-centric observability paradigm. UModel constructs a virtual ontology layer that unifies heterogeneous telemetry data, system entities, and expert knowledge into a semantic graph, and provides a pipeline-based query interface, U-SPL, that lets agents autonomously explore system topologies and correlate multimodal data. Re-modeling the AIOps 2025 Challenge dataset with UModel improves root cause localization precision by 8%. UModel has been deployed at Alibaba Cloud for more than a year, serving tens of thousands of users while sustaining millions of operations per second with sub-second query latency.
📝 Abstract
When a networked system fails, automatically performing Root Cause Analysis (RCA) on observability data is critical for ensuring system reliability. Recently, LLM-based agents have shown promise for automating this diagnosis process through advanced reasoning and autonomous exploration. However, existing observability frameworks remain archaic: fragmented data silos, incompatible schemas, and insufficient semantic metadata prevent agents from establishing the complex relationships required for effective RCA. To address these challenges, we present UModel, a unified ontological framework that shifts observability from data-centric to object-centric modeling. UModel constructs a virtual ontological layer in which heterogeneous telemetry, entities, and expert knowledge are standardized as objects and interconnected via semantic graphs. In addition, we introduce U-SPL, a pipeline-based query interface that enables agents to autonomously explore system topologies and correlate multimodal data. Re-modeling the "AIOps 2025 Challenge" dataset with UModel improved the precision of root cause localization by 8%, demonstrating that better data organization can significantly increase the accuracy of downstream tasks. UModel provides a scalable modeling framework that, in more than one year of deployment at Alibaba Cloud, has served tens of thousands of users, sustained millions of operations per second, and delivered sub-second query latency.
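The object-centric idea described above can be illustrated with a small sketch. It is not UModel's data model and not U-SPL syntax, neither of which is specified in the abstract. The class names (`Obj`, `OntologyGraph`), the relation labels (`runs_on`, `has_telemetry`), the stage helpers (`follow`, `of_kind`), and the sample objects are all hypothetical, chosen only to show how entities and telemetry might be stored as typed objects in one graph and then explored by a pipeline of stages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A node in the illustrative ontology (hypothetical structure)."""
    id: str
    kind: str  # e.g. "entity", "metric", "log"


class OntologyGraph:
    """Objects plus labeled, directed relations between them."""

    def __init__(self):
        self.objects = {}
        self.edges = defaultdict(list)  # src id -> [(relation, dst id)]

    def add(self, obj):
        self.objects[obj.id] = obj

    def link(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, obj_id, relation):
        return [dst for rel, dst in self.edges[obj_id] if rel == relation]


def follow(relation):
    """Pipeline stage: move from each object to its neighbors via `relation`."""
    def stage(graph, ids):
        out = []
        for obj_id in ids:
            for dst in graph.neighbors(obj_id, relation):
                if dst not in out:  # keep order, drop duplicates
                    out.append(dst)
        return out
    return stage


def of_kind(kind):
    """Pipeline stage: keep only objects of the given kind."""
    def stage(graph, ids):
        return [i for i in ids if graph.objects[i].kind == kind]
    return stage


def run_pipeline(graph, start_ids, stages):
    """Feed the output of each stage into the next, like a shell pipe."""
    current = list(start_ids)
    for stage in stages:
        current = stage(graph, current)
    return current


# Hypothetical sample data: one service, one pod, two telemetry objects.
g = OntologyGraph()
for obj in [
    Obj("svc:checkout", "entity"),
    Obj("pod:pod-1", "entity"),
    Obj("metric:pod-1.cpu", "metric"),
    Obj("log:pod-1.app", "log"),
]:
    g.add(obj)
g.link("svc:checkout", "runs_on", "pod:pod-1")
g.link("pod:pod-1", "has_telemetry", "metric:pod-1.cpu")
g.link("pod:pod-1", "has_telemetry", "log:pod-1.app")

metrics = run_pipeline(
    g,
    ["svc:checkout"],
    [follow("runs_on"), follow("has_telemetry"), of_kind("metric")],
)
print(metrics)  # ['metric:pod-1.cpu']
```

In this sketch, an agent starting from a failing service reaches the relevant metric by following typed relations, without knowing in advance which store or schema holds the data. That is the property the abstract attributes to the virtual ontological layer.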