🤖 AI Summary
This study addresses the limitations of existing tremor detection methods, which rely heavily on expert assessment or handcrafted frequency-domain features and lack interpretable, data-driven modeling directly in the time domain. To address this, the authors propose a two-stage hierarchical framework that processes raw 3D motion-marker time series entirely in the time domain. First, a CNN-LSTM module extracts local features from short, non-overlapping time segments. Then a vision transformer models the long-range temporal dynamics of these segment features for trial-level classification. The authors present this as a proof of concept for data-driven, time-domain tremor detection across different body parts. It reduces dependence on spectral features and offers post hoc interpretability through attention weights and Grad-CAM visualizations along both the anatomical and temporal dimensions. Evaluated across nine body parts, the method achieves an average F1 score of 0.765 (range: 0.594–0.947). This is below the best frequency-domain performance (0.909), but the method requires only minimal preprocessing.
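The first stage operates on short, non-overlapping windows of the raw marker trajectories. The following sketch shows only that windowing step in plain Python. It assumes a single marker sampled as (x, y, z) frames. The segment length, the choice to drop a trailing partial window, and the helper names `segment_trial` and `channels_first` are hypothetical illustrations, not the authors' preprocessing.

```python
from typing import List, Sequence, Tuple

# One frame: (x, y, z) position of a single motion-capture marker (assumed layout).
Frame = Tuple[float, float, float]


def segment_trial(frames: Sequence[Frame], segment_len: int) -> List[List[Frame]]:
    """Split a trial into consecutive, non-overlapping segments of segment_len frames.

    Hypothetical helper: trailing frames that do not fill a whole segment are
    dropped (an assumption; padding would be an alternative).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_full = len(frames) // segment_len
    return [list(frames[i * segment_len:(i + 1) * segment_len]) for i in range(n_full)]


def channels_first(segment: Sequence[Frame]) -> List[List[float]]:
    """Transpose a segment from time-major frames to three channels (x, y, z),
    the layout a 1D convolution over time typically expects."""
    return [list(axis) for axis in zip(*segment)]


if __name__ == "__main__":
    # Hypothetical trial: 10 frames of one marker moving along x.
    trial = [(float(t), 0.0, 1.0) for t in range(10)]
    segments = segment_trial(trial, segment_len=4)
    print(len(segments))                   # 2 (the last 2 frames are dropped)
    print(channels_first(segments[0])[0])  # [0.0, 1.0, 2.0, 3.0]
```

Each resulting segment would then be passed to the segment-level (CNN-LSTM) encoder. Multiple markers would add channels rather than change the windowing logic.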
📝 Abstract
Tremor is a common movement disorder associated with conditions such as Parkinson's disease and essential tremor. It is traditionally diagnosed through expert clinical assessment. Current automated detection methods rely on frequency-domain features informed by clinical expertise. In this work, we present an explainable, two-stage hierarchical framework for tremor detection in the time domain that learns tremor patterns directly from 3D kinematic marker time-series data across entire tremor-provoking trials. Our framework combines a deep convolutional neural network and a long short-term memory (LSTM) network to learn tremor representations from short, discrete, non-overlapping time segments of each trial's kinematic time series. A vision transformer then models the long-term temporal dynamics of these segment features for trial-level (session-level) classification. Evaluated across nine body parts, the framework achieved F1-scores of 0.594–0.947 depending on the body part (average: 0.765). This falls short of the frequency-domain state-of-the-art performance (0.909) but requires minimal preprocessing. Attention weights and gradient-weighted class activation maps (Grad-CAM) identified time-domain features of tremor across body parts. This proof of concept demonstrated the feasibility of data-driven time-domain modeling for tremor detection across anatomically diverse body parts. It also reduces reliance on expert-engineered spectral features and provides post hoc interpretability of the temporal and anatomical patterns of tremor.
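The second stage aggregates per-segment features into a single trial-level decision. Its attention weights show which segments the model relied on. The following sketch is a deliberately simplified, single-query version of scaled dot-product attention. It assumes a class-token-style query, as in standard vision transformer designs, and omits learned query/key/value projections, multiple heads, and layer stacking. The function names, the logistic readout, and all numbers are hypothetical and do not reproduce the authors' model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query: Vector, segment_features: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over per-segment feature vectors.

    Simplification (assumption): keys and values are the raw segment features,
    with no learned projections. Returns (pooled trial-level vector, attention
    weight per segment).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(d) for feat in segment_features]
    weights = softmax(scores)
    # Weighted sum of segment features, one output dimension at a time.
    pooled = [sum(w * feat[j] for w, feat in zip(weights, segment_features)) for j in range(d)]
    return pooled, weights


def trial_probability(pooled: Vector, w: Vector, b: float) -> float:
    """Hypothetical logistic readout mapping the pooled vector to P(tremor)."""
    z = sum(x * wi for x, wi in zip(pooled, w)) + b
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical 2-D features for three segments; the middle one aligns with the query.
    feats = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
    pooled, weights = attention_pool([1.0, 0.0], feats)
    print(weights)  # the middle segment receives the largest weight
    print(trial_probability(pooled, [1.0, 0.5], -0.5))
```

Inspecting `weights` per trial is analogous in spirit to reading transformer attention maps for temporal interpretability. Grad-CAM, by contrast, uses gradients of the class score and is not sketched here.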