🤖 AI Summary
To address the insufficient class separability of millimeter-wave FMCW radar micro-Doppler spectrograms (MDS) under target occlusion and overlap, this paper proposes the Temporal MDS-Vision Transformer (T-MDS-ViT). T-MDS-ViT models the stacked range-velocity-angle (RVA) tensor as a spatiotemporal sequence and introduces cross-axis attention together with motion-aware constraints to explicitly capture multidimensional motion coupling. Through patch embedding and interpretable attention visualization, it focuses on discriminative, high-energy dynamic regions. Compared with CNN-based baselines, T-MDS-ViT achieves higher classification accuracy on public radar datasets while offering better data efficiency and real-time inference, making it well suited to low-power, latency-sensitive embedded radar perception systems.
📝 Abstract
In this paper, we propose a new Temporal MDS-Vision Transformer (T-MDS-ViT) for multiclass target classification using millimeter-wave FMCW radar micro-Doppler spectrograms (MDS). Specifically, we design a transformer-based architecture that processes stacked range-velocity-angle (RVA) spatiotemporal tensors via patch embeddings and cross-axis attention mechanisms to explicitly model the sequential nature of MDS data across multiple frames. The T-MDS-ViT exploits motion-aware constraints in its attention layers to maintain separability under target overlaps and partial occlusions. We further apply an explainability mechanism to examine how the attention layers focus on characteristic high-energy regions of the MDS representations and how this affects class-specific kinematic features. Finally, we demonstrate that our proposed framework outperforms existing CNN-based methods in classification accuracy while achieving better data efficiency and real-time deployability.
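To make the data flow concrete, the sketch below shows how a stacked RVA tensor can be cut into patches, linearly embedded, and passed through attention. This is a minimal illustration, not the paper's implementation: the tensor shape `(T, R, V, A)`, patch size, embedding width, and the use of plain single-head self-attention (standing in for the paper's cross-axis attention and motion-aware constraints) are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative dimensions: T frames, R range bins, V velocity bins,
# A angle bins. These are NOT the paper's actual configuration.
T, R, V, A = 4, 16, 16, 8
P = 4    # patch size along the range and velocity axes (assumed)
D = 32   # token embedding dimension (assumed)

x = rng.standard_normal((T, R, V, A))   # stand-in for a stacked RVA tensor

# Patch embedding: tile each frame's range-velocity plane into PxP patches,
# flatten each patch together with its angle bins, then project to D dims.
patches = x.reshape(T, R // P, P, V // P, P, A).transpose(0, 1, 3, 2, 4, 5)
patches = patches.reshape(T * (R // P) * (V // P), P * P * A)
W_embed = rng.standard_normal((P * P * A, D)) / np.sqrt(P * P * A)
tokens = patches @ W_embed              # (num_tokens, D)

def attention(q, k, v):
    """Single-head scaled dot-product attention with a stable softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out, attn = attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)

print(tokens.shape, out.shape, attn.shape)
```

Because the tokens span all frames, the attention matrix `attn` relates every spatiotemporal patch to every other one; the paper's interpretability analysis inspects such attention maps to see which high-energy MDS regions drive each class decision.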