🤖 AI Summary
This work addresses the limitations of existing reinforcement learning–based approaches to mechanical fault detection, which often reduce the problem to a contextual bandit setting, rely on handcrafted rewards and fault labels, and neglect RL's sequential decision-making capabilities. The study introduces adversarial inverse reinforcement learning (AIRL) into this domain for the first time, leveraging offline learning to automatically infer an implicit reward function from trajectories of normal operation. This learned reward serves as an unsupervised anomaly scoring metric, eliminating the need for fault labels or manual reward design while exploiting the sequential dynamics of the system for early fault identification. Experimental results on three run-to-failure benchmark datasets (HUMS2023, IMS, and XJTU-SY) show that the proposed method consistently separates normal from faulty states.
📝 Abstract
Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However, most existing RL-based MFD approaches do not fully exploit RL's sequential decision-making strengths, often reducing MFD to a contextual bandit problem in which each decision is treated in isolation. To bridge this gap, we formulate MFD as an offline inverse reinforcement learning problem, where the agent learns the reward function directly from healthy operational sequences, thereby bypassing the need for manual reward engineering and fault labels. Our framework employs Adversarial Inverse Reinforcement Learning to train a discriminator that distinguishes between normal (expert) and policy-generated transitions. The discriminator's learned reward serves as an anomaly score, indicating deviations from normal operating behaviour. When evaluated on three run-to-failure benchmark datasets (HUMS2023, IMS, and XJTU-SY), the model consistently assigns low anomaly scores to normal samples and high scores to faulty ones, enabling early and robust fault detection. By aligning RL's sequential reasoning with MFD's temporal structure, this work opens a path toward RL-based diagnostics in data-driven industrial settings.
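To make the scoring idea concrete, the following is a minimal sketch of how a reward can be read off an AIRL-style discriminator and turned into an anomaly score. It uses the standard AIRL discriminator form D = exp(f) / (exp(f) + pi), from which log D - log(1 - D) simplifies to f - log pi. The abstract does not give the paper's exact scoring rule, so the function names, the numeric inputs, and the choice to negate the reward (so that healthy transitions score low) are all assumptions made for illustration only.

```python
import math


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """Standard AIRL discriminator D = exp(f) / (exp(f) + pi).

    Algebraically this equals sigmoid(f - log_pi); the branch below
    evaluates the sigmoid without overflowing for large |z|.
    """
    z = f_value - log_pi
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def airl_reward(f_value: float, log_pi: float) -> float:
    """Recovered reward log D - log(1 - D), which simplifies to f - log_pi."""
    return f_value - log_pi


def anomaly_score(f_value: float, log_pi: float) -> float:
    """Assumed convention (not confirmed by the abstract): negate the reward
    so that transitions resembling healthy operation receive low scores."""
    return -airl_reward(f_value, log_pi)


# Hypothetical outputs of a trained reward network f and policy log-probability.
healthy_score = anomaly_score(f_value=2.0, log_pi=-1.0)
faulty_score = anomaly_score(f_value=-4.0, log_pi=-1.0)
```

Under this assumed convention, a transition that the discriminator considers expert-like gets a lower score than one it considers unlike healthy operation, which matches the qualitative behaviour reported above; thresholding the score over time would then flag the onset of a fault.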