🤖 AI Summary
Video diffusion models (VDMs) often suffer from motion inconsistency due to insufficient temporal modeling. To address this, we propose FlowLoss, an explicit optical-flow matching loss that directly compares RAFT-estimated flow fields between generated and ground-truth videos, departing from conventional warping-based implicit flow guidance. We further introduce a noise-aware weighting mechanism that modulates the strength of the flow supervision according to the noise level at each denoising step, since flow estimates on heavily noised frames are unreliable. Our method requires no auxiliary networks or post-processing modules. Evaluated on robotic video datasets, it improves temporal motion consistency and accelerates convergence in early training. The core contribution is the integration of explicit optical-flow matching with noise-dependent weighting into a unified loss function, offering a practical route to motion-based supervision in VDMs.
📝 Abstract
Video Diffusion Models (VDMs) can generate high-quality videos, but often struggle to produce temporally coherent motion. Optical flow supervision is a promising approach to address this, with prior works commonly employing warping-based strategies that avoid explicit flow matching. In this work, we explore an alternative formulation, FlowLoss, which directly compares flow fields extracted from generated and ground-truth videos. To account for the unreliability of flow estimation at high noise levels in the diffusion process, we propose a noise-aware weighting scheme that modulates the flow loss across denoising steps. Experiments on robotic video datasets suggest that FlowLoss improves motion stability and accelerates convergence in early training stages. Our findings offer practical insights for incorporating motion-based supervision into noise-conditioned generative models.
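The two ingredients described above (an explicit flow-matching term and a noise-dependent weight on it) can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's implementation: the flow fields are taken as given inputs (in the paper they would come from running RAFT on the generated and ground-truth videos), the endpoint-error loss and the linear weight schedule `noise_aware_weight` are plausible placeholder choices, and all function names are hypothetical.

```python
import numpy as np

def flow_matching_loss(flow_gen, flow_gt):
    """Explicit flow matching: mean endpoint error between two flow fields.

    flow_gen, flow_gt: arrays of shape (T-1, H, W, 2), one 2-D motion
    vector per pixel for each consecutive frame pair. In the paper these
    would be RAFT estimates on generated vs. ground-truth videos.
    """
    return float(np.mean(np.linalg.norm(flow_gen - flow_gt, axis=-1)))

def noise_aware_weight(t, T, w_max=1.0):
    """Hypothetical weight schedule: attenuate flow supervision at high noise.

    t: current diffusion noise step (0 = clean, T = pure noise). A linear
    ramp is only one plausible choice; the paper's exact schedule is not
    reproduced here.
    """
    return w_max * (1.0 - t / T)

def flowloss(flow_gen, flow_gt, t, T):
    # Noise-weighted flow term, to be added to the diffusion training loss.
    return noise_aware_weight(t, T) * flow_matching_loss(flow_gen, flow_gt)
```

With this shape, identical flow fields yield zero loss, and at the highest noise step (`t == T`) the weight vanishes, so steps where flow estimation is unreliable contribute nothing to the gradient.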