HRTR: A Single-stage Transformer for Fine-grained Sub-second Action Segmentation in Stroke Rehabilitation

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of fine-grained, sub-second (<1 s) action detection in stroke rehabilitation, this paper proposes the High-Resolution Temporal Transformer (HRTR), a single-stage, end-to-end framework for temporal action localization and classification. HRTR models millisecond-level temporal dynamics via self-attention, incorporates high-density temporal step embeddings, and jointly optimizes frame-wise classification and boundary regression, eliminating conventional multi-stage pipelines and post-processing. On the StrokeRehab Video, StrokeRehab IMU, and 50Salads datasets, HRTR achieves Edit Scores of 70.1, 69.4, and 88.4, respectively, outperforming the state-of-the-art methods it is compared against. Its core contribution is the direct, single-stage modeling of sub-second action boundaries, which improves both temporal precision and inference efficiency.
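To make "sub-second action" concrete: an action-segmentation model emits one label per frame (or per IMU sample), and each contiguous run of identical labels forms a segment. The sketch below is a hypothetical helper, not HRTR's code. It collapses frame-wise labels into segments and flags those shorter than one second at an assumed sampling rate. The 30 fps rate and the label names are illustrative assumptions.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, start_frame, end_frame_exclusive)


def frames_to_segments(labels: List[str]) -> List[Segment]:
    """Collapse frame-wise labels into contiguous (label, start, end) segments."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


def sub_second_segments(labels: List[str], fps: float) -> List[Segment]:
    """Return only the segments whose duration is under one second at `fps`."""
    return [(lab, s, e) for lab, s, e in frames_to_segments(labels) if (e - s) / fps < 1.0]


if __name__ == "__main__":
    fps = 30  # hypothetical sampling rate, chosen only for illustration
    frames = ["reach"] * 12 + ["grasp"] * 8 + ["transport"] * 40  # illustrative labels
    for label, s, e in frames_to_segments(frames):
        print(label, s, e, f"{(e - s) / fps:.2f}s")
    print(sub_second_segments(frames, fps))
```

At 30 fps, the 12-frame and 8-frame segments last 0.40 s and 0.27 s. A model must place both boundaries of each within a handful of frames, which is the temporal precision the summary refers to.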

📝 Abstract
Stroke rehabilitation often demands precise tracking of patient movements to monitor progress, and the complexity of rehabilitation exercises presents two critical challenges: fine-grained and sub-second (under one second) action detection. In this work, we propose the High Resolution Temporal Transformer (HRTR) to time-localize and classify high-resolution (fine-grained), sub-second actions in a single-stage transformer, eliminating the need for multi-stage methods and post-processing. Without any refinements, HRTR outperforms state-of-the-art systems on both stroke-related and general datasets, achieving an Edit Score (ES) of 70.1 on StrokeRehab Video, 69.4 on StrokeRehab IMU, and 88.4 on 50Salads.
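The results above are reported as Edit Score (ES). In action-segmentation work, ES is commonly defined as 100 × (1 − normalized Levenshtein distance) between the ordered segment labels of the prediction and the ground truth. This metric rewards correct segment order and penalizes over-segmentation, independent of exact boundary timing. The following is a minimal sketch of that common definition. It assumes HRTR's evaluation follows it, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Sequence


def segment_labels(frames: Sequence[str]) -> list:
    """Collapse consecutive identical frame labels into one label per segment."""
    return [label for label, _ in groupby(frames)]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Dynamic-programming edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute (free if labels match)
            )
        prev = curr
    return prev[-1]


def edit_score(pred_frames: Sequence[str], gt_frames: Sequence[str]) -> float:
    """Segmental Edit Score in [0, 100]; 100 means identical segment sequences."""
    p, g = segment_labels(pred_frames), segment_labels(gt_frames)
    if not p and not g:
        return 100.0
    return (1.0 - levenshtein(p, g) / max(len(p), len(g))) * 100.0


if __name__ == "__main__":
    gt = ["reach"] * 10 + ["transport"] * 20 + ["idle"] * 5
    # The prediction splits "transport" with a spurious 2-frame "idle" segment.
    pred = ["reach"] * 10 + ["transport"] * 8 + ["idle"] * 2 + ["transport"] * 10 + ["idle"] * 5
    print(round(edit_score(pred, gt), 1))
```

In this example, a spurious 2-frame segment alters only a small fraction of frames, yet it lowers ES to 60.0. That sensitivity to over-segmentation is why ES is a natural headline metric for sub-second action segmentation.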
Problem

Research questions and friction points this paper is trying to address.

Fine-grained sub-second action segmentation in stroke rehabilitation
Single-stage transformer for precise movement tracking
Eliminating the need for multi-stage methods and post-processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-stage transformer for action segmentation
High-resolution sub-second action detection
Outperforms state-of-the-art systems without any refinements