🤖 AI Summary
This study addresses the need for urban traffic management that jointly delivers accurate prediction, effective anomaly detection, and provably safe control. The authors propose STREAM-RL, a unified framework that, to their knowledge, is the first to propagate calibrated uncertainty end-to-end from prediction through anomaly detection to safety-aware policy learning, with formal theoretical guarantees. It introduces three algorithms: an uncertainty-guided graph attention network (PU-GAT+), a conformal residual flow network with Benjamini-Yekutieli false discovery rate (FDR) control (CRFN-BY), and a safe world-model-based reinforcement learning method (LyCon-WRL+) equipped with Lyapunov stability certificates and Lipschitz bounds. Evaluated on real-world traffic data, the approach achieves 91.4% coverage efficiency, controls FDR at 4.1%, and reaches a 95.2% safety rate, 26.2 percentage points above the 69% of standard PPO, while earning higher reward and keeping end-to-end inference latency at 23 ms.
📝 Abstract
Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions, all while providing reliability guarantees. We present STREAM-RL, a unified framework that introduces three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows with Benjamini-Yekutieli FDR control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory datasets demonstrate that STREAM-RL achieves 91.4% coverage efficiency, controls FDR at 4.1% under verified dependence, and improves the safety rate to 95.2%, compared to 69% for standard PPO, while achieving higher reward, with 23 ms end-to-end inference latency.
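The Benjamini-Yekutieli FDR control named in contribution (2) can be illustrated with the textbook step-up procedure. The sketch below is not the authors' CRFN-BY implementation: it assumes per-sensor p-values (for example, conformal p-values) have already been computed, and the function name `benjamini_yekutieli` and the sample inputs are hypothetical.

```python
import numpy as np


def benjamini_yekutieli(p_values, alpha=0.05):
    """Textbook BY step-up procedure (illustrative, not the paper's code).

    Returns a boolean mask marking which hypotheses are rejected,
    i.e. which observations would be flagged as anomalies.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)

    # Harmonic correction c(m) = sum_{i=1}^{m} 1/i; this factor is what
    # keeps FDR <= alpha under arbitrary dependence between the tests.
    c_m = np.sum(1.0 / np.arange(1, m + 1))

    order = np.argsort(p)          # indices that sort p ascending
    sorted_p = p[order]
    thresholds = alpha * np.arange(1, m + 1) / (m * c_m)

    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank meeting its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values for four sensors; the two smallest are flagged.
flags = benjamini_yekutieli([0.001, 0.8, 0.002, 0.5], alpha=0.05)
```

Compared with the Benjamini-Hochberg procedure, the only change is the extra division by c(m), which makes the thresholds more conservative in exchange for validity without an independence assumption.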