Learning from Streaming Video with Orthogonal Gradients

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation that self-supervised learning suffers on continuous video streams, where the independent and identically distributed (IID) assumption is violated, this paper proposes an Orthogonal Gradient Optimization (OGO) mechanism. OGO decorrelates gradients across consecutive batches at the optimizer level, without modifying the model architecture or data preprocessing. Grounded in a geometric argument, it augments SGD and AdamW with an orthogonal projection of the gradient update, combining theoretical grounding with implementation simplicity. The authors evaluate OGO on three representative scenarios: the single-video method DoRA, standard VideoMAE, and future frame prediction. Results show that the performance drop under streaming training is substantially mitigated, with average downstream improvements of 2.1–4.7 percentage points over standard AdamW. OGO offers an efficient, general-purpose, plug-and-play optimization paradigm for learning temporally coherent video representations.

📝 Abstract
We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner. This differs from the standard approaches to video learning where videos are chopped and shuffled during training in order to create a non-redundant batch that satisfies the independent and identically distributed (IID) sample assumption expected by conventional training paradigms. When videos are only available as a continuous stream of input, the IID assumption is evidently broken, leading to poor performance. We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks: the one-video representation learning method DoRA, standard VideoMAE on multi-video datasets, and the task of future video prediction. To address this drop, we propose a geometric modification to standard optimizers, to decorrelate batches by utilising orthogonal gradients during training. The proposed modification can be applied to any optimizer -- we demonstrate it with Stochastic Gradient Descent (SGD) and AdamW. Our proposed orthogonal optimizer allows models trained from streaming videos to alleviate the drop in representation learning performance, as evaluated on downstream tasks. On three scenarios (DoRA, VideoMAE, future prediction), we show that our orthogonal optimizer outperforms the strong AdamW baseline.
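The abstract describes the method as a geometric modification that decorrelates consecutive batches via orthogonal gradients. A minimal sketch of how such a projection could wrap an SGD step follows; the function names, and the choice to project against only the immediately preceding batch's gradient, are assumptions for illustration, not the paper's exact formulation:

```python
def dot(u, v):
    """Inner product of two equal-length vectors (plain lists)."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(grad, prev_grad, eps=1e-12):
    """Remove from `grad` its component along `prev_grad`.

    This is the geometric idea applied inside the optimizer: the
    update for the current batch is made orthogonal to the previous
    batch's gradient, decorrelating consecutive (non-IID) updates.
    The paper's exact projection rule may differ; this is a sketch.
    """
    denom = dot(prev_grad, prev_grad)
    if denom < eps:  # no previous direction to project out
        return list(grad)
    coeff = dot(grad, prev_grad) / denom
    return [g - coeff * p for g, p in zip(grad, prev_grad)]

def orthogonal_sgd_step(params, grad, prev_grad, lr=0.01):
    """One SGD step using the orthogonalized gradient.

    Returns the updated parameters and the raw gradient, which the
    caller stores as `prev_grad` for the next step.
    """
    g = orthogonalize(grad, prev_grad)
    new_params = [p - lr * gi for p, gi in zip(params, g)]
    return new_params, list(grad)
```

With AdamW, the same projection would presumably be applied to the gradient before the moment estimates are updated, leaving the rest of the optimizer untouched, which is what makes the modification plug-and-play.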
Problem

Research questions and friction points this paper is trying to address.

Learning representations from continuous video streams without IID data
Addressing performance drop in sequential vs shuffled video learning
Decorrelating batches using orthogonal gradients for better optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal gradients decorrelate video batches
Geometric modification to standard optimizers
SGD and AdamW enhanced for streaming video