Online Generic Event Boundary Detection

📅 2025-10-08
🤖 AI Summary
This paper addresses a limitation of generic event boundary detection (GEBD): existing methods need the complete video and therefore cannot operate in real time. It proposes online GEBD (On-GEBD), a new task that detects taxonomy-free, subtle event boundaries in streaming video using only past frames. To discriminate boundaries without future-frame context, the authors introduce the Estimator framework, inspired by Event Segmentation Theory. It combines a Consistent Event Anticipator (CEA), which predicts the next frame from prior frames, with an Online Boundary Discriminator (OBD), which measures the prediction error and adapts its threshold through statistical tests on past errors. On the Kinetics-GEBD and TAPOS benchmarks, Estimator outperforms all baselines adapted from recent online video understanding models and achieves performance comparable to prior offline GEBD methods.

πŸ“ Abstract
Generic Event Boundary Detection (GEBD) aims to interpret long-form videos through the lens of human perception. However, current GEBD methods must process the complete video to make predictions, unlike humans, who process data online and in real time. To bridge this gap, we introduce a new task, Online Generic Event Boundary Detection (On-GEBD), which aims to detect boundaries of generic events immediately in streaming videos. This task poses the unique challenge of identifying subtle, taxonomy-free event changes in real time, without access to future frames. To tackle these challenges, we propose a novel On-GEBD framework, Estimator, inspired by Event Segmentation Theory (EST), which explains how humans segment ongoing activity into events by leveraging the discrepancies between predicted and actual information. Our framework consists of two key components: the Consistent Event Anticipator (CEA) and the Online Boundary Discriminator (OBD). Specifically, the CEA generates a prediction of the future frame that reflects current event dynamics, based solely on prior frames. Then, the OBD measures the prediction error and adaptively adjusts the threshold using statistical tests on past errors to capture diverse, subtle event transitions. Experimental results demonstrate that Estimator outperforms all baselines adapted from recent online video understanding models and achieves performance comparable to prior offline GEBD methods on the Kinetics-GEBD and TAPOS datasets.
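
The predict-then-test loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the anticipator architecture or the statistical test, so the code below assumes a trivial stand-in predictor (the next frame's features equal the previous frame's) and a one-sided z-score test over a sliding window of past errors. The names `AdaptiveBoundaryDetector`, `detect_boundaries`, `window`, `z_threshold`, and `min_history` are all hypothetical.

```python
import math
from collections import deque


class AdaptiveBoundaryDetector:
    """Flags a boundary when the current prediction error is an outlier
    relative to recent errors. The z-score test is an assumed stand-in
    for the paper's statistical test, which the abstract does not detail."""

    def __init__(self, window=30, z_threshold=3.0, min_history=5, eps=1e-6):
        self.errors = deque(maxlen=window)  # sliding window of past errors
        self.z_threshold = z_threshold
        self.min_history = min_history
        self.eps = eps  # floor on the std so a constant history still works

    def update(self, error):
        """Consume one prediction error; return True if it marks a boundary."""
        is_boundary = False
        if len(self.errors) >= self.min_history:
            n = len(self.errors)
            mean = sum(self.errors) / n
            std = math.sqrt(sum((e - mean) ** 2 for e in self.errors) / n)
            # The threshold adapts to the statistics of the current event.
            threshold = mean + self.z_threshold * max(std, self.eps)
            is_boundary = error > threshold
        if is_boundary:
            self.errors.clear()  # a new event starts: restart the statistics
        else:
            self.errors.append(error)
        return is_boundary


def detect_boundaries(features, detector):
    """Scan per-frame feature vectors in order, using only past frames."""
    boundaries = []
    prev = None
    for t, feat in enumerate(features):
        if prev is not None:
            # Assumed stand-in anticipator: predict that nothing changes.
            error = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat, prev)))
            if detector.update(error):
                boundaries.append(t)
        prev = feat
    return boundaries
```

Because the threshold is recomputed from the errors seen within the current event, a low-motion event gets a tight threshold and a high-motion event a looser one, which is the intuition behind adaptive thresholding for subtle transitions. A learned anticipator would replace the copy-last-frame predictor without changing the surrounding loop.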
Problem

Research questions and friction points this paper is trying to address.

Detecting generic event boundaries in real-time streaming videos
Identifying subtle event changes without future frame access
Bridging the gap between offline processing and human online perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Boundary Discriminator adaptively adjusts detection thresholds
Consistent Event Anticipator predicts future frames from prior information
Framework processes streaming videos without future frame access