DECO: Life-Cycle Management of Enterprise-Grade Copilots

2024-12-08
AI Summary
In enterprise on-call scenarios, engineers struggle to identify root causes quickly because documentation is fragmented and heterogeneous (e.g., runbooks, incident reports, code repositories) and telemetry data is dispersed. Method: The authors propose DECO, a full-lifecycle management framework for enterprise copilots featuring (1) NL2SearchQuery, a novel natural language-to-search-query generation mechanism; (2) skill-aware dynamic RAG for cross-source semantic retrieval; (3) a lightweight agentic architecture; and (4) a log structuring module that automatically converts unstructured troubleshooting logs into structured, actionable guidance. Contribution/Results: Deployed since September 2023, DECO has supported tens of thousands of interactions across dozens of organizations, with hundreds of monthly active users. It significantly reduces both mean time to respond and mean time to resolve, demonstrating scalable, production-ready impact on incident management efficiency.

Abstract
Software engineers frequently grapple with the challenge of accessing disparate documentation and telemetry data, including TroubleShooting Guides (TSGs), incident reports, code repositories, and various internal tools developed by multiple stakeholders. While on-call duties are inevitable, incident resolution becomes even more daunting due to the obscurity of legacy sources and the pressure of strict time constraints. To enhance the efficiency of on-call engineers (OCEs) and streamline their daily workflows, we introduced DECO, a comprehensive framework for developing, deploying, and managing enterprise-grade copilots tailored to improve productivity in engineering routines. This paper details the design and implementation of the DECO framework, emphasizing its innovative NL2SearchQuery functionality and a lightweight agentic framework. These features support efficient and customized retrieval-augmented generation (RAG) algorithms that not only extract relevant information from diverse sources but also select the most pertinent skills in response to user queries. This allows DECO to address complex technical questions and to provide seamless, automated access to internal resources. Additionally, DECO incorporates a robust mechanism for converting unstructured incident logs into user-friendly, structured guides, effectively bridging the documentation gap. Since its launch in September 2023, DECO has demonstrated its effectiveness through widespread adoption, enabling tens of thousands of interactions and engaging hundreds of monthly active users (MAU) across dozens of organizations within the company.

Problem

Research questions and friction points this paper is trying to address.

Enhances on-call engineers' efficiency by streamlining workflows.
Improves access to disparate documentation and telemetry data.
Converts unstructured incident logs into structured, user-friendly guides.
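
To make the last point concrete, here is a minimal sketch of turning free-form incident notes into a sectioned guide. It is an illustration only, not DECO's mechanism, which this summary does not describe. The note format (`[HH:MM] text`), the section names, and the keyword cues are all hypothetical, and simple keyword matching stands in for whatever model-based extraction the real system uses.

```python
import re
from typing import Dict, List

# Hypothetical note format: "[HH:MM] free text". Real incident logs will differ.
LINE_PATTERN = re.compile(r"^\[(\d{2}:\d{2})\]\s*(.+)$")

# Hypothetical keyword cues; a production system would likely use an LLM instead.
SECTION_CUES = {
    "Symptoms": ("alert", "error", "timeout", "failing"),
    "Actions": ("restarted", "checked", "rolled back"),
    "Resolution": ("resolved", "mitigated", "fixed"),
}


def structure_log(raw_log: str) -> Dict[str, List[str]]:
    """Group timestamped note lines into guide sections by keyword cue."""
    guide: Dict[str, List[str]] = {section: [] for section in SECTION_CUES}
    guide["Other"] = []
    for line in raw_log.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        timestamp, text = match.groups()
        lowered = text.lower()
        for section, cues in SECTION_CUES.items():
            if any(cue in lowered for cue in cues):
                guide[section].append(f"{timestamp} {text}")
                break
        else:
            # no cue matched: keep the line rather than silently dropping it
            guide["Other"].append(f"{timestamp} {text}")
    return guide


RAW_LOG = """[10:02] Alert fired: API timeout on gateway
[10:10] Restarted the gateway pods
random noise without a timestamp
[10:25] Incident mitigated after restart
"""

guide = structure_log(RAW_LOG)
```

The first matching section wins, so the order of `SECTION_CUES` acts as a priority list; lines with no cue land in `Other` so that nothing from the original notes is lost.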
Innovation

Methods, ideas, or system contributions that make the work stand out.

NL2SearchQuery for efficient information retrieval
Lightweight agentic framework for skill selection
Conversion of unstructured logs into structured guides
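
As a rough illustration of skill selection, the sketch below routes a user query to the skill whose description shares the most words with it. This is a toy stand-in under stated assumptions: the summary does not say how DECO scores skills, and the `Skill` type, the skill names, and the token-overlap scoring are all hypothetical. A real agentic framework would more plausibly use an LLM or embedding similarity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Skill:
    """A hypothetical skill: a name plus a natural-language description."""
    name: str
    description: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_skill(query: str, skills: List[Skill]) -> Optional[Skill]:
    """Return the skill with the largest token overlap, or None if no overlap."""
    query_tokens = tokenize(query)
    best: Optional[Skill] = None
    best_score = 0
    for skill in skills:
        score = len(query_tokens & tokenize(skill.description))
        if score > best_score:
            best, best_score = skill, score
    return best


# Hypothetical skill registry, loosely mirroring the sources named in the abstract.
SKILLS = [
    Skill("tsg_search", "search troubleshooting guides and runbooks for mitigation steps"),
    Skill("incident_lookup", "look up past incident reports and their root cause"),
    Skill("code_search", "search code repositories for functions and configuration"),
]

chosen = select_skill("Which past incident had the same root cause?", SKILLS)
```

Returning `None` when nothing overlaps lets the caller fall back to a default behaviour, such as plain retrieval, instead of forcing a poorly matched skill.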
Authors

Yiwen Zhu, Microsoft (Resource Management, Public Transportation, Transit Assignment)
Mathieu Demarne, Microsoft, USA
Kai Deng, Microsoft, USA
Wenjing Wang, Microsoft, USA
Nutan Sahoo, Microsoft, USA
Hannah Lerner, Microsoft, USA
Anjali Bhavan, Microsoft, USA
Divya Vermareddy, Microsoft, USA
Yunlei Lu, Microsoft (Optimization, Data Mining)
Swati Bararia, Microsoft, USA
William Zhang, Carnegie Mellon University (Databases, Database Systems)
Xia Li, Microsoft, USA
Katherine Lin, Microsoft, USA
Miso Cilimdzic, Microsoft, USA
Subru Krishnan, Microsoft (Big data, distributed systems, cloud computing, Hadoop)