TrAction: Action Recognition with Sparse Trajectories

📅 2026-06-02

🤖 AI Summary
This work addresses the tendency of existing action recognition models to rely on appearance and background shortcuts by proposing sparse point trajectories as an input modality that is largely free of such biases by construction. It shows that trajectory representations are complementary to state-of-the-art appearance features. To exploit this modality, the authors introduce a 2.5D trajectory Transformer together with a masked-trajectory pretraining strategy. The method reaches 45% top-1 accuracy on Something-Something V2 and 54% on EPIC-Kitchens-100. Fusing it with DINOv2 or V-JEPA 2 features raises top-1 accuracy on Something-Something V2 by 8.7 and 1.6 points, respectively. The approach is also more sensitive to temporal reversal than V-JEPA, suggesting that it captures the direction of motion rather than static appearance.
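The masked-trajectory pretraining mentioned above can be illustrated with a small sketch. It is not the authors' implementation (the linked repository holds that); it assumes that a "2.5D" trajectory is a sequence of per-frame (x, y, depth) points, that whole trajectories are masked at random, and that the pretext task is to reconstruct the masked coordinates under a mean-squared-error loss. The names `mask_trajectories` and `reconstruction_loss` and the 0.75 mask ratio are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical layout: one trajectory = list of (x, y, depth) points, one per frame.
Point = Tuple[float, float, float]
Trajectory = List[Point]


def mask_trajectories(
    trajs: List[Trajectory], mask_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split trajectory indices into visible and masked sets.

    Assumption: whole trajectories are masked; the paper may instead mask
    individual points or time steps.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(len(trajs)))
    rng.shuffle(indices)
    n_masked = int(round(mask_ratio * len(trajs)))
    masked = sorted(indices[:n_masked])
    visible = sorted(indices[n_masked:])
    return visible, masked


def reconstruction_loss(pred: List[Trajectory], target: List[Trajectory]) -> float:
    """Mean squared error over every coordinate of paired trajectories."""
    total, count = 0.0, 0
    for p_traj, t_traj in zip(pred, target):
        for p_pt, t_pt in zip(p_traj, t_traj):
            for p, t in zip(p_pt, t_pt):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Example: 8 toy trajectories of 4 frames each, 75% of them masked.
toy = [[(float(i), float(t), 1.0) for t in range(4)] for i in range(8)]
vis, msk = mask_trajectories(toy, mask_ratio=0.75, seed=1)
print(len(vis), len(msk))  # 2 visible, 6 masked
```

In a full pipeline, an encoder would embed only the visible trajectories and a decoder would predict the masked ones, with `reconstruction_loss` comparing those predictions to the held-out coordinates; a high mask ratio keeps the encoder cheap and makes the pretext task non-trivial.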
📝 Abstract
Modern action recognition models operate on memory- and compute-intensive dense RGB video volumes and frequently exploit appearance and background shortcuts, for example, predicting actions from objects or scenes instead of characteristic motion. We investigate an efficient alternative input modality that is largely free of such biases by construction: sparse point trajectories. To this end, we develop a simple transformer architecture for 2.5D trajectory-based recognition together with masked-trajectory pretraining, which we show substantially improves downstream action recognition accuracy. Despite using only a fraction of the dense RGB input, our method reaches 45% top-1 on Something-Something V2 and 54% on EPIC-Kitchens-100, and surpasses V-JEPA on time-reversal sensitivity. More importantly, we find trajectory features to be complementary to state-of-the-art appearance-based features. Fusing our pretrained model with DINOv2 and V-JEPA 2 improves top-1 accuracy on Something-Something V2 by 8.7 and 1.6 points, respectively. Code: https://github.com/ecker-lab/TrAction
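The abstract does not state how the trajectory model is fused with DINOv2 or V-JEPA 2. As a purely illustrative sketch, the snippet below shows one common option, late fusion, which averages the class probabilities of two independently trained classifiers; feature-level fusion (concatenating embeddings before a shared classifier head) would be an equally plausible alternative. The function names, the weight `w`, and the logits are hypothetical, chosen so that the appearance model alone cannot separate two classes.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def late_fuse(traj_logits: List[float], app_logits: List[float], w: float = 0.5) -> List[float]:
    """Weighted average of the two models' class probabilities.

    w is a hypothetical weight on the trajectory model; 1 - w goes to the
    appearance model.
    """
    if len(traj_logits) != len(app_logits):
        raise ValueError("both models must score the same classes")
    p_traj = softmax(traj_logits)
    p_app = softmax(app_logits)
    return [w * a + (1.0 - w) * b for a, b in zip(p_traj, p_app)]


def argmax(xs: List[float]) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(xs)), key=xs.__getitem__)


# Toy example: the appearance model ties classes 0 and 1 (e.g. two motion
# directions involving the same objects); the trajectory model does not.
appearance = [1.0, 1.0, 0.0]
trajectory = [0.0, 2.0, 0.0]
fused = late_fuse(trajectory, appearance, w=0.5)
print(argmax(softmax(appearance)), argmax(fused))  # 0 1
```

Late fusion leaves both backbones untouched, so a frozen appearance model can be combined with the trajectory model by tuning a single weight on validation data.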
Problem

Research questions and friction points this paper is trying to address.

action recognition
sparse trajectories
appearance bias
motion modeling
video understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse trajectories
action recognition
masked pretraining
transformer architecture
motion bias
Jan F. Meier
Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
Felix B. Müller
Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany
Alexander Ecker
Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
Timo Lüddecke
University of Göttingen and Campus Institute Data Science (CIDAS)