🤖 AI Summary
Problem: Existing online early classification methods for real-time video analysis lack rigorous theoretical foundations.
Method: This paper proposes the first probabilistic online adaptation framework for early stopping on sequential data. It establishes the first formal mathematical theory of sequence-level early stopping and introduces a lightweight, plug-and-play online adaptation mechanism for pre-trained offline models. The mechanism combines Bayesian updating with confidence-based gating, so that the approach is both theoretically sound and practical to implement.
Results: On multiple video action recognition and anomaly detection benchmarks, the framework maintains the accuracy of state-of-the-art (SOTA) offline models while significantly reducing average decision latency, indicating effective early prediction that generalizes across domains.
📝 Abstract
Video processing generally falls into two main categories: processing of the entire video, which typically yields the best classification results, and real-time processing, where the objective is to make a decision as promptly as possible. The latter is often driven by the need to rapidly identify potentially critical or dangerous situations, such as machine failures, traffic accidents, heart problems, or dangerous behavior. Models for processing entire videos are typically well defined and clearly presented in the literature. This is not the case for online processing, where a plethora of hand-devised methods exist. To address this, we present a novel, unified, and theoretically grounded adaptation framework for the online classification of video data. First, we establish a rigorous mathematical foundation for classifying sequential data with the option of deciding at an early stage. This allows us to construct a natural function that encourages the model to return an outcome much sooner. Second, we demonstrate a straightforward and readily implementable method for adapting offline models to online, recurrent operation. Finally, by comparing the proposed approach with the non-online state-of-the-art baseline, we show that the proposed framework encourages the network to make earlier classification decisions without compromising accuracy.
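To make the idea of early decisions on sequential data concrete, here is a minimal Python sketch of a generic confidence-gated early-stopping loop. It is not the paper's method, since the abstract gives no formulas. The function name `early_classify`, the per-frame likelihood input, and the threshold value are all assumptions chosen for illustration. The sketch keeps a class posterior, updates it with Bayes' rule after each frame, and stops as soon as one class is confident enough.

```python
from typing import Iterable, List, Sequence, Tuple


def early_classify(
    frame_likelihoods: Iterable[Sequence[float]],
    prior: Sequence[float],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Hypothetical helper: return (label, frames_used, confidence).

    frame_likelihoods yields, for each frame, one likelihood per class
    (assumed to come from some pre-trained per-frame model).
    """
    posterior: List[float] = list(prior)
    frames_used = 0
    for frames_used, likelihood in enumerate(frame_likelihoods, start=1):
        # Recursive Bayesian update: posterior is proportional to
        # previous posterior times the current frame's likelihood.
        unnormalized = [p * l for p, l in zip(posterior, likelihood)]
        total = sum(unnormalized)
        if total == 0.0:
            # Frame carries no usable evidence; keep the previous posterior.
            continue
        posterior = [u / total for u in unnormalized]
        best = max(range(len(posterior)), key=posterior.__getitem__)
        # Confidence gate: stop as soon as one class is confident enough.
        if posterior[best] >= threshold:
            return best, frames_used, posterior[best]
    # Stream ended without passing the gate: fall back to the best guess.
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, frames_used, posterior[best]
```

With a uniform prior over two classes and per-frame likelihoods `[0.6, 0.4]`, `[0.7, 0.3]`, `[0.8, 0.2]`, `[0.9, 0.1]`, this sketch stops after the third frame at the default threshold. Raising the threshold trades latency for confidence, which is the tension the abstract describes.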