FRAG: Frequency Adapting Group for Diffusion Video Editing

📅 2024-06-10

🏛️ International Conference on Machine Learning

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

High-frequency information leakage in diffusion-based video editing causes blurring and flickering, severely degrading temporal consistency and visual fidelity. To address this, we propose FRAG—a plug-and-play, training-free module that introduces a novel frequency-adaptive grouping mechanism, explicitly modeling high-frequency fidelity as an independent, modular, parallel branch. Guided by frequency-domain analysis, FRAG performs spatial feature fusion during denoising to explicitly enhance high-frequency details. Compatible with any UNet architecture, FRAG incurs zero training overhead and is model-agnostic. On the TGVE and DAVIS benchmarks, FRAG achieves gains of +2.1 dB in PSNR and +0.032 in SSIM over prior methods, attains state-of-the-art temporal consistency, and significantly suppresses flickering and blurring.

Technology Category

Application Category

📝 Abstract

In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).

Problem

Research questions and friction points this paper is trying to address.

Overcoming challenges in high-quality video editing consistency

Addressing blurring and flickering in diffusion video editing

Preserving high-frequency components during denoising for fidelity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency Adapting Group enhances video quality

Novel receptive field branch preserves high-frequency

Model-agnostic approach without additional training

🔎 Similar Papers

No similar papers found.

Authors to Follow