A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address common challenges in rolling bearing remaining useful life (RUL) prediction—including poor generalization, weak robustness, high data dependency, and limited interpretability—this paper proposes a multimodal deep learning framework integrating vibration-derived grayscale images and time-frequency representations. Methodologically, Bresenham’s line algorithm is employed to convert raw vibration signals into grayscale images, while continuous wavelet transform generates complementary time-frequency representations. A dual-branch CNN-LSTM architecture with multi-head self-attention is designed to jointly extract spatial degradation features and model temporal evolution. Furthermore, Multimodal Layer-wise Relevance Propagation (Multimodal-LRP) is introduced to enable cross-modal, interpretable feature attribution. Evaluated on the XJTU-SY and PRONOSTIA datasets, the framework achieves state-of-the-art RUL prediction accuracy, reduces required training samples by 28% and 48%, respectively, and demonstrates significantly enhanced noise robustness. It thus delivers high accuracy, low data dependency, strong generalization, and intrinsic interpretability—rendering it suitable for industrial deployment.
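The summary describes converting raw vibration signals into grayscale images via Bresenham's line algorithm. The paper's exact mapping (image size, normalization, channel handling) is not given here; the sketch below is a minimal, assumed version: amplitude is normalized to pixel rows, sample index to columns, and consecutive samples are connected with Bresenham-rasterized line segments.

```python
import numpy as np

def bresenham_line(x0, y0, x1, y1):
    """Integer line rasterization (Bresenham); yields pixel coordinates."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def signal_to_image(signal, size=64):
    """Rasterize a 1-D vibration segment into a size x size grayscale image."""
    sig = np.asarray(signal, dtype=float)
    # Normalize amplitude to pixel rows, sample index to pixel columns.
    lo, hi = sig.min(), sig.max()
    rows = np.round((sig - lo) / (hi - lo + 1e-12) * (size - 1)).astype(int)
    cols = np.round(np.linspace(0, size - 1, len(sig))).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    for i in range(len(sig) - 1):
        for c, r in bresenham_line(cols[i], rows[i], cols[i + 1], rows[i + 1]):
            img[size - 1 - r, c] = 255  # flip so larger amplitude sits higher
    return img

# Example: a synthetic sinusoidal "vibration" segment.
img = signal_to_image(np.sin(np.linspace(0, 8 * np.pi, 512)), size=64)
```

The resulting binary waveform image can then be fed to the image-branch CNN like any other single-channel input.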

📝 Abstract
Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism subsequently emphasizes salient features, followed by linear layers for final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that significantly enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28% less training data on XJTU-SY and ~48% less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations confirm the interpretability and trustworthiness of predictions, making the framework highly suitable for real-world industrial deployment.
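The multimodal-LRP extension itself is not detailed in this summary, but it builds on standard layer-wise relevance propagation. As a hedged illustration, the sketch below implements the classic epsilon rule on a toy two-layer ReLU regressor (weights, sizes, and the scalar "RUL-style" output are all illustrative assumptions, not the paper's network): the output score is redistributed backward so that input relevances sum to approximately the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU regressor with random weights (purely illustrative).
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def stabilize(z, eps=1e-9):
    """Epsilon stabilizer: push each pre-activation away from zero."""
    return z + eps * np.where(z >= 0, 1.0, -1.0)

def lrp_epsilon(x):
    # Forward pass, keeping pre-activations for the backward relevance pass.
    z1 = x @ W1 + b1
    a1 = np.maximum(z1, 0.0)   # ReLU hidden layer
    z2 = a1 @ W2 + b2          # scalar output score
    # Backward pass (epsilon rule): R_j = a_j * sum_k w_jk * R_k / stabilize(z_k),
    # starting with the output's own value as its relevance.
    s2 = z2 / stabilize(z2)
    R1 = a1 * (W2 @ s2)
    s1 = R1 / stabilize(z1)
    R0 = x * (W1 @ s1)
    return float(z2[0]), R0

x = rng.normal(size=8)
score, relevance = lrp_epsilon(x)
# Conservation: relevance.sum() approximately equals the output score.
```

With zero biases, the relevance assigned to the inputs conserves the output score up to the epsilon terms, which is what makes the resulting attribution maps quantitatively interpretable.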
Problem

Research questions and friction points this paper is trying to address.

Estimates remaining useful life of mechanical systems like bearings
Addresses poor generalization, robustness, and interpretability in existing methods
Uses multimodal data and explainable AI for reliable industrial deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal framework uses image and time-frequency representations
Architecture combines dilated CNNs, LSTM, and attention mechanisms
Introduces multimodal LRP for enhanced model interpretability
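The time-frequency representation feeding the TFR branch comes from a continuous wavelet transform. As a minimal sketch (a naive direct-convolution Morlet CWT in NumPy; the mother-wavelet parameter `w0`, the scale range, and the L1 normalization are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Naive continuous wavelet transform with a Morlet mother wavelet."""
    sig = np.asarray(signal, dtype=float)
    tfr = np.empty((len(scales), len(sig)))
    for i, s in enumerate(scales):
        # Discretized, L1-normalized Morlet wavelet at scale s.
        t = np.arange(-4 * s, 4 * s + 1)
        psi = np.exp(1j * w0 * t / s) * np.exp(-(t / s) ** 2 / 2) / s
        # Magnitude of the scale-s filtered signal forms one row of the TFR.
        tfr[i] = np.abs(np.convolve(sig, np.conj(psi), mode="same"))
    return tfr

# Example: a 50 Hz tone sampled at 1 kHz; its energy should concentrate
# near scale s = w0 * fs / (2*pi*f) ~ 19 for w0 = 6.
fs = 1000
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 50 * t)
scales = np.arange(2, 40)
tfr = morlet_cwt(sig, scales)
ridge = scales[np.argmax(tfr.mean(axis=1))]  # scale carrying the most energy
```

In practice a library implementation (e.g. PyWavelets) would be used instead of this direct convolution, and the resulting scalogram is treated as a 2-D input image for the TFR branch.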
Waleed Razzaq
School of Automation, University of Science and Technology of China, Hefei, Anhui
Yun-Bo Zhao
University of Science and Technology of China
Human-Machine Systems · Smart Manufacturing · Networked Control Systems