CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos

📅 2025-12-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing VR/VSR methods struggle to rectify the severe structural distortions (e.g., facial/hand warping, background deformation) and motion incoherence prevalent in AI-generated and low-quality real-world videos, while diffusion-based restoration models lack a controllable trade-off between fidelity preservation and structural correction. To address this, the paper proposes a diffusion-prior-guided video restoration framework with three contributions: (1) a temporally consistent degradation module that, for the first time, explicitly models realistic structural failures; (2) a deep-adapter-based architecture with a single "precision knob" enabling continuous control over input fidelity versus structural/motion restoration strength; and (3) AIGC54, the first benchmark dedicated to AIGC video restoration. Extensive evaluation across fidelity (FIQA), semantic-consistency, and perceptual-quality metrics demonstrates state-of-the-art performance on severely corrupted videos, competitive results on standard VR/VSR benchmarks, and inference at roughly 13 FPS for 720p videos on a single A100 GPU.

๐Ÿ“ Abstract
Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorted faces and hands, warped backgrounds, and temporally inconsistent motion. Such severe structural artifacts also appear in very low-quality real-world videos. Classical video restoration and super-resolution (VR/VSR) methods, in contrast, are tuned for synthetic degradations such as blur and downsampling and tend to stabilize these artifacts rather than repair them, while diffusion-prior restorers are usually trained on photometric noise and offer little control over the trade-off between perceptual quality and fidelity. We introduce CreativeVR, a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input, smoothly trading off between precise restoration on standard degradations and stronger structure- and motion-corrective behavior on challenging content. Our key novelty is a temporally coherent degradation module used during training, which applies carefully designed transformations that produce realistic structural failures. To evaluate AIGC-artifact restoration, we propose the AIGC54 benchmark with FIQA, semantic and perceptual metrics, and multi-aspect scoring. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks, while running at practical throughput (about 13 FPS at 720p on a single 80-GB A100). Project page: https://daveishan.github.io/creativevr-webpage/.
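The abstract's central control mechanism is a single scalar "precision knob" that governs how strongly the restorer follows its input versus the corrective diffusion prior. The paper does not publish the exact conditioning mechanism here, so the following is only a minimal illustrative sketch of the idea as a convex blend between an input-faithful branch and a prior-corrected branch; `diffusion_prior` is a hypothetical stand-in callable, not the paper's actual model.

```python
import numpy as np

def restore_frame(frame, diffusion_prior, precision=0.5):
    """Illustrative sketch of a scalar fidelity/correction trade-off.

    `diffusion_prior` is a hypothetical callable standing in for the
    structure-corrective diffusion branch; the real CreativeVR mechanism
    (a deep adapter) is not reproduced here.
    """
    corrected = diffusion_prior(frame)  # structure/motion-corrective output
    # precision -> 1: follow the input closely (precise restoration);
    # precision -> 0: trust the prior (stronger structural correction).
    return precision * frame + (1.0 - precision) * corrected
```

At `precision=1` the output is the input itself; at `precision=0` it is entirely the prior's correction, with a smooth trade-off in between, mirroring the behavior the abstract describes.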
Problem

Research questions and friction points this paper is trying to address.

Corrects distorted faces, hands, and backgrounds in AI-generated videos
Repairs severe structural and motion artifacts in real low-quality videos
Balances perceptual quality and fidelity in video restoration tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-prior-guided restoration for structural and motion artifacts
Deep-adapter-based method with a single precision control knob
Temporally coherent degradation module for realistic training transformations
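The temporally coherent degradation idea above can be illustrated in miniature: sample one random corruption per clip and apply it to every frame, so the synthetic structural failure is consistent over time rather than flickering frame-to-frame. This is a hypothetical sketch under that one assumption; the paper's actual degradation module uses more carefully designed transformations than the random warp below.

```python
import numpy as np

def degrade_clip(frames, seed=None, strength=4.0):
    """Apply ONE random displacement field to every frame of a clip.

    Hypothetical sketch of temporally coherent degradation: because the
    same warp is shared across all frames, the injected structural
    failure is consistent over time. `strength` scales the displacement
    in pixels.
    """
    rng = np.random.default_rng(seed)
    h, w = frames[0].shape[:2]
    # Sample a single per-pixel displacement field for the whole clip.
    dy = rng.normal(0.0, strength, (h, w))
    dx = rng.normal(0.0, strength, (h, w))
    ys = np.clip(np.arange(h)[:, None] + dy, 0, h - 1).astype(int)
    xs = np.clip(np.arange(w)[None, :] + dx, 0, w - 1).astype(int)
    # Warp each frame with the shared field (works for gray or RGB frames).
    return [f[ys, xs] for f in frames]
```

Training pairs would then be (degraded clip, clean clip), teaching the restorer to undo structural failures rather than only photometric noise.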