Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges

📅 2025-05-06
🤖 AI Summary
This survey covers three core tasks in sports video event detection: Temporal Action Localization (TAL), Action Spotting (AS), and Precise Event Spotting (PES). It compares the three tasks, their applications, and the evolution of the methods used for each, and it reviews and categorizes sports-specific datasets and evaluation metrics, noting the strengths and limitations of each. The techniques analyzed include CNN- and Transformer-based models, multi-modal approaches that combine audio and visual information, self-supervised learning, knowledge distillation, and approaches that generalize across multiple sports. The survey closes with open challenges and research directions toward more generalized, efficient, and robust event detection frameworks.

📝 Abstract
Video event detection has become an essential component of sports analytics, enabling automated identification of key moments and enhancing performance analysis, viewer engagement, and broadcast efficiency. Recent advancements in deep learning, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly improved accuracy and efficiency in Temporal Action Localization (TAL), Action Spotting (AS), and Precise Event Spotting (PES). This survey provides a comprehensive overview of these three key tasks, emphasizing their differences, applications, and the evolution of methodological approaches. We thoroughly review and categorize existing datasets and evaluation metrics specifically tailored for sports contexts, highlighting the strengths and limitations of each. Furthermore, we analyze state-of-the-art techniques, including multi-modal approaches that integrate audio and visual information, methods utilizing self-supervised learning and knowledge distillation, and approaches aimed at generalizing across multiple sports. Finally, we discuss critical open challenges and outline promising research directions toward developing more generalized, efficient, and robust event detection frameworks applicable to diverse sports. This survey serves as a foundation for future research on efficient, generalizable, and multi-modal sports event detection.
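As a rough illustration of why spotting tasks differ mainly in temporal precision, the sketch below matches predicted event timestamps to ground-truth timestamps within a tolerance window. It is a simplified, assumed formulation, not the evaluation protocol of any dataset reviewed in the survey: the function name `match_spots`, the greedy nearest-event matching, and the example timestamps are all hypothetical, and real benchmarks typically rank predictions by confidence and report mean average precision.

```python
from typing import List, Tuple


def match_spots(pred: List[float], truth: List[float], tol: float) -> Tuple[int, int, int]:
    """Greedily match predicted timestamps to ground-truth timestamps.

    A prediction counts as a true positive if the nearest still-unmatched
    ground-truth event lies within +/- tol seconds. Returns (tp, fp, fn).
    This is an illustrative simplification, not a benchmark metric.
    """
    unmatched = sorted(truth)
    tp = 0
    for p in sorted(pred):
        # Nearest ground-truth event that has not been matched yet.
        best = min(unmatched, key=lambda t: abs(t - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            tp += 1
    fp = len(pred) - tp      # predictions with no event inside the window
    fn = len(unmatched)      # events that no prediction landed near
    return tp, fp, fn


# Hypothetical timestamps in seconds.
truth = [10.0, 42.0, 90.0]
pred = [10.4, 45.0, 89.98, 120.0]

print(match_spots(pred, truth, tol=5.0))  # loose window  -> (3, 1, 0)
print(match_spots(pred, truth, tol=0.5))  # tight window  -> (2, 2, 1)
```

The same predictions score differently under the two windows: the prediction at 45.0 s counts as a hit with a loose tolerance and as a miss with a tight one, which is the kind of gap that separates coarse spotting from precise spotting.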
Problem

Research questions and friction points this paper is trying to address.

Automated identification of key moments in sports videos
Improving accuracy in temporal action and event spotting
Developing generalized multi-modal event detection frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveys CNN- and Transformer-based methods for action detection
Reviews multi-modal methods that integrate audio and visual information
Covers methods based on self-supervised learning and knowledge distillation