🤖 AI Summary
Addressing two key challenges in multimodal physiological signal (ECG/EEG) classification—difficult cross-modal modeling and severe class imbalance—this paper proposes a lightweight unified deep architecture. The method employs time-domain concatenation-based data augmentation to enhance signal diversity; integrates a ResNet backbone with dual channel-temporal attention mechanisms to enable cross-modal feature disentanglement and adaptive weighting; and incorporates wavelet denoising, baseline correction, and Focal Loss to improve training stability. Evaluated on three standard benchmarks, the model achieves accuracies of 99.96%, 99.78%, and 100%, respectively. With a model size of only ~130 MB and an inference latency of roughly 10 ms per sample, it significantly outperforms existing approaches and is suitable for edge deployment on wearable devices.
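The time-domain concatenation augmentation described above can be sketched as follows. This is a minimal illustration of the idea—joining a signal with several augmented variants along the time axis to form a longer, richer input—where the specific transforms (additive noise, amplitude scaling, circular shift) and all function names are illustrative assumptions, not the paper's exact augmentation set.

```python
import numpy as np

def augment_variants(signal, rng):
    """Generate simple augmented variants of a 1-D signal.
    Noise level, scale range, and shift size are assumed values."""
    noisy = signal + rng.normal(0.0, 0.01 * signal.std(), signal.shape)
    scaled = signal * rng.uniform(0.9, 1.1)
    shifted = np.roll(signal, rng.integers(-10, 10))
    return [noisy, scaled, shifted]

def concat_augment(signal, rng=None):
    """Time-domain concatenation: join the original signal with its
    augmented variants along the time axis."""
    rng = rng or np.random.default_rng(0)
    return np.concatenate([signal] + augment_variants(signal, rng))

sig = np.sin(np.linspace(0, 2 * np.pi, 360))  # toy stand-in for one ECG beat
aug = concat_augment(sig)                     # 4x the original length
```

The concatenated sample preserves the original waveform while exposing the network to perturbed copies of it in a single input window.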
📝 Abstract
Accurate and unified analysis of diverse biological signals, such as ECG and EEG, is paramount for comprehensive patient assessment, especially in synchronous monitoring. Despite advances in multi-sensor fusion, a critical gap remains in developing unified architectures that effectively process and extract features from fundamentally different physiological signals. Another challenge is the inherent class imbalance in many biomedical datasets, which often biases the performance of traditional methods. This study addresses these issues by proposing a novel, unified deep learning framework that achieves state-of-the-art performance across different signal types. Our method integrates a ResNet-based CNN with an attention mechanism, enhanced by a novel data augmentation strategy: time-domain concatenation of multiple augmented variants of each signal to generate richer representations. Unlike prior work, we deliberately increase signal complexity in a principled way, which yielded the best predictions relative to the state of the art. Preprocessing steps included wavelet denoising, baseline removal, and standardization. Class imbalance was effectively managed through the combined use of this advanced data augmentation and the Focal Loss function. Regularization techniques were applied during training to ensure generalization. We rigorously evaluated the proposed architecture on three benchmark datasets: UCI Seizure EEG, MIT-BIH Arrhythmia, and PTB Diagnostic ECG. It achieved accuracies of 99.96%, 99.78%, and 100%, respectively, demonstrating robustness across diverse signal types and clinical contexts. Finally, the architecture requires ~130 MB of memory and processes each sample in ~10 ms, suggesting suitability for deployment on low-end or wearable devices.
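To make the imbalance-handling step concrete, the Focal Loss used here can be sketched in its standard binary form, FL(p_t) = -α (1 - p_t)^γ log(p_t), where γ down-weights easy, well-classified examples and α rebalances classes. The default values γ = 2 and α = 0.25 follow the original focal-loss formulation and are an assumption, not necessarily this study's settings.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha * (1 - p_t)**gamma * log(p_t).
    probs   -- predicted probability of class 1 for each sample
    targets -- ground-truth labels in {0, 1}
    gamma=2, alpha=0.25 are the commonly used defaults (assumed here)."""
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

probs = np.array([0.9, 0.6, 0.1])
targets = np.array([1, 1, 0])
losses = focal_loss(probs, targets)
# The confident example (p_t = 0.9) contributes far less loss than the
# harder one (p_t = 0.6), so training focuses on difficult minority cases.
```

In practice the per-sample losses would be averaged over a batch and combined with the augmentation above so that minority-class errors dominate the gradient signal.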