Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of deep reinforcement learning (DRL) controllers in time-varying nonlinear systems when they encounter out-of-distribution (OOD) observations. The authors propose a hybrid ES-DRL control architecture that uses DRL for fast in-distribution control and switches to a robust, bounded extremum seeking (ES) controller when an OOD observation is detected. The key contribution is applying the Mahalanobis distance in the latent space of a variational autoencoder (VAE), trained on in-distribution beam profiles, to obtain an interpretable test-time OOD signal. Evaluated on a particle accelerator control task, the method identifies previously unseen beam profiles caused by spatial magnet motion, and the resulting signal drives a binary switch between the RL and ES controllers.
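As a reading aid, the standard Mahalanobis distance and a threshold-based switch can be written as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $z$, $\mu$, $\Sigma$, $\tau$, and $s$ are introduced here for illustration, and the summary does not say how the threshold is chosen or which latent quantity (for example, the encoder mean) is used.

```latex
% z     : latent code of a test beam profile (assumed: VAE encoder output)
% \mu   : mean of the latent codes of the in-distribution training profiles
% \Sigma: covariance of those latent codes
% \tau  : detection threshold (hypothetical; not specified in the summary)
\[
  d_M(z) = \sqrt{(z - \mu)^{\top} \Sigma^{-1} (z - \mu)},
  \qquad
  s =
  \begin{cases}
    1 \ (\text{use ES}), & d_M(z) > \tau, \\
    0 \ (\text{use RL}), & d_M(z) \le \tau .
  \end{cases}
\]
```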

📝 Abstract

In this paper, we study Mahalanobis-guided latent out-of-distribution (OOD) detection for test-time RL controller switching in nonlinear time-varying systems. RL controllers can quickly control high-dimensional systems within the training distribution, but their performance can degrade when time-varying dynamics produce unseen observations. We consider a combined ES-DRL controller, where RL provides fast in-distribution actions and bounded extremum seeking (ES) provides robust model-independent control under OOD operation. The key challenge is deciding when to switch. We train a variational autoencoder (VAE) on in-distribution beam-profile observations and use Mahalanobis distance in the VAE latent space to detect OOD beam profiles at test time. This OOD decision sets a binary switch that selects either the RL controller or the ES controller. We evaluate the approach in safety-critical particle accelerator control. In this setting, spatial magnet motion creates OOD beam profiles that were not seen during RL training. Visualization of the VAE latent space shows that the proposed method identifies this OOD scenario and provides an interpretable signal for switching between RL and ES in the combined controller.
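The switching logic described in the abstract could be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the VAE encoder is replaced by synthetic Gaussian latent codes, and the function names (`fit_latent_gaussian`, `mahalanobis`, `select_controller`), the 8-dimensional latent space, the ridge term, and the 99th-percentile threshold are all assumptions made for this example.

```python
import numpy as np

def fit_latent_gaussian(z_train, ridge=1e-6):
    """Estimate the mean and inverse covariance of in-distribution latent codes."""
    mu = z_train.mean(axis=0)
    # A small ridge term (assumption) keeps the covariance invertible.
    cov = np.cov(z_train, rowvar=False) + ridge * np.eye(z_train.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(z, mu, cov_inv):
    """Mahalanobis distance of each row of z from the fitted Gaussian."""
    d = np.atleast_2d(z) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

def select_controller(z, mu, cov_inv, threshold):
    """Binary switch: ES when the latent code looks OOD, RL otherwise."""
    return "ES" if mahalanobis(z, mu, cov_inv)[0] > threshold else "RL"

# Synthetic stand-in for VAE latent codes of in-distribution beam profiles.
rng = np.random.default_rng(0)
z_train = rng.normal(size=(2000, 8))

mu, cov_inv = fit_latent_gaussian(z_train)
# Hypothetical threshold: 99th percentile of the training distances.
threshold = np.quantile(mahalanobis(z_train, mu, cov_inv), 0.99)

z_in = np.zeros(8)        # a latent code near the training mean
z_ood = np.full(8, 6.0)   # a latent code far from the training data

print(select_controller(z_in, mu, cov_inv, threshold))
print(select_controller(z_ood, mu, cov_inv, threshold))
```

In a real pipeline, `z_train` would come from encoding the in-distribution beam profiles with the trained VAE, and the threshold would need to be tuned to the acceptable false-alarm rate for the application.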

Problem

Research questions and friction points this paper is trying to address.

out-of-distribution detection

reinforcement learning

time-varying systems

controller switching

Mahalanobis distance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mahalanobis distance

latent OOD detection

hybrid ES-DRL control