Context-Aware Online Conformal Anomaly Detection with Prediction-Powered Data Acquisition

📅 2025-05-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online anomaly detection requires strict control of the false discovery rate (FDR), yet existing methods rely on abundant, continuously available ground-truth calibration data, which is unrealistic in practice because labeled calibration samples are scarce. To address this, the paper proposes a context-aware, prediction-driven calibration framework that combines synthetic calibration data generation with an adaptive online mechanism for selecting real data. The approach integrates conformal p-values, active p-value statistics, and an online FDR control algorithm, trading off data efficiency against theoretical guarantees in a principled way. Empirically, it reduces dependence on real calibration data by over 70%, maintains the prescribed FDR bound across diverse benchmark datasets, and sustains high detection sensitivity and stability. The work is presented as the first solution for trustworthy online anomaly detection under resource constraints that satisfies rigorous statistical guarantees while remaining practical to deploy.

📝 Abstract
Online anomaly detection is essential in fields such as cybersecurity, healthcare, and industrial monitoring, where promptly identifying deviations from expected behavior can avert critical failures or security breaches. While numerous anomaly scoring methods based on supervised or unsupervised learning have been proposed, current approaches typically rely on a continuous stream of real-world calibration data to provide assumption-free guarantees on the false discovery rate (FDR). To address the inherent challenges posed by limited real calibration data, we introduce context-aware prediction-powered conformal online anomaly detection (C-PP-COAD). Our framework strategically leverages synthetic calibration data to mitigate data scarcity, while adaptively integrating real data based on contextual cues. C-PP-COAD utilizes conformal p-values, active p-value statistics, and online FDR control mechanisms to maintain rigorous and reliable anomaly detection performance over time. Experiments conducted on both synthetic and real-world datasets demonstrate that C-PP-COAD significantly reduces dependency on real calibration data without compromising guaranteed FDR control.
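
To make the conformal p-value ingredient concrete, here is a minimal sketch of the standard split-conformal p-value for a single test point. It is not the paper's prediction-powered or active variant: the function name `conformal_p_value`, the toy scores, and the convention that a larger score means "more anomalous" are all assumptions chosen for illustration.

```python
import numpy as np


def conformal_p_value(calib_scores, test_score):
    """Standard split-conformal p-value (illustrative, not the paper's variant).

    Assumes calibration scores come from nominal (non-anomalous) data and
    that a larger score means "more anomalous".
    """
    calib_scores = np.asarray(calib_scores, dtype=float)
    n = calib_scores.size
    # Count calibration scores at least as extreme as the test score;
    # the +1 terms account for the test point itself.
    n_at_least = int(np.sum(calib_scores >= test_score))
    return (1.0 + n_at_least) / (n + 1.0)


# Hypothetical anomaly scores from a small calibration set.
calib = [0.1, 0.4, 0.35, 0.8, 0.2]
p_extreme = conformal_p_value(calib, 0.9)  # no calibration score is this large
p_typical = conformal_p_value(calib, 0.3)  # three calibration scores are >= 0.3
```

With only five calibration points, the smallest attainable p-value is 1/6, which illustrates why scarce calibration data limits detection power and why supplementing it with synthetic data is attractive.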
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on real calibration data for anomaly detection
Ensuring false discovery rate control with limited real data
Integrating synthetic and real data for adaptive anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses synthetic calibration data strategically
Integrates real data adaptively with context
Employs conformal p-values and online FDR control
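
As an illustration of how a stream of p-values can be turned into online decisions, the sketch below uses the LOND rule, one well-known online FDR procedure. This page does not say which online FDR algorithm C-PP-COAD uses, so LOND is a swapped-in example; the function name `lond_decisions`, the choice of weights, and the toy p-values are assumptions.

```python
import math


def lond_decisions(p_values, alpha=0.1):
    """Online testing with the LOND rule (illustrative choice of algorithm).

    At step t the test level is alpha * gamma_t * (D + 1), where D is the
    number of discoveries made so far and the gamma_t are non-negative
    weights summing to 1.
    """
    decisions = []
    discoveries = 0
    for t, p in enumerate(p_values, start=1):
        # gamma_t = 6 / (pi^2 t^2) sums to 1 over t = 1, 2, ...
        gamma_t = 6.0 / (math.pi ** 2 * t ** 2)
        alpha_t = alpha * gamma_t * (discoveries + 1)
        reject = p <= alpha_t
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


# Hypothetical p-value stream: two clear anomalies among nominal points.
flags = lond_decisions([0.001, 0.5, 0.002, 0.9], alpha=0.1)
```

Each discovery raises the level available to later tests, so the procedure can remain sensitive over a long stream. Whether the FDR guarantee carries over depends on the dependence structure of the p-values, which this sketch does not address.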