📝 Abstract
Although the diffusion prior is emerging as a powerful solution for blind face restoration (BFR), the inherent gap between the vanilla diffusion model and the BFR setting hinders its seamless adaptation. The gap mainly stems from the discrepancies between 1) high-quality (HQ) and low-quality (LQ) images and 2) synthesized and real-world images. The vanilla diffusion model is trained on images with little or no degradation, whereas BFR handles moderately to severely degraded images. Additionally, the LQ images used for training are synthesized by a naive degradation model with limited degradation patterns, which fails to simulate the complex and unknown degradations of real-world scenarios. In this work, we propose FLIPNET, a unified network that switches between two modes to resolve these specific gaps. In Restoration mode, the model gradually integrates BFR-oriented features and face embeddings from LQ images to achieve authentic and faithful face restoration. In Degradation mode, the model synthesizes real-world-like degraded images based on knowledge learned from real-world degradation datasets. Extensive evaluations on benchmark datasets show that our model 1) outperforms previous diffusion-prior-based BFR methods in terms of authenticity and fidelity, and 2) outperforms the naive degradation model in modeling real-world degradations.
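The dual-mode idea can be illustrated with a minimal sketch: a shared backbone with two task-specific heads, selected by a mode flag at call time. This is a hypothetical toy illustration, not the paper's architecture; all class and layer names (`FlipNetSketch`, `restore_head`, `degrade_head`) are invented for this example, and the real model would build on a diffusion prior rather than a small convolutional network.

```python
import torch
import torch.nn as nn


class FlipNetSketch(nn.Module):
    """Toy dual-mode network: one shared backbone, two switchable heads.

    'restore' maps an LQ image toward an HQ estimate; 'degrade' maps an
    HQ image toward a synthesized real-world-like LQ image. Layer sizes
    are arbitrary stand-ins for illustration only.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Shared feature extractor (stand-in for the diffusion-prior backbone)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Mode-specific output heads
        self.restore_head = nn.Conv2d(channels, 3, 3, padding=1)
        self.degrade_head = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor, mode: str = "restore") -> torch.Tensor:
        feats = self.backbone(x)
        if mode == "restore":
            return self.restore_head(feats)
        if mode == "degrade":
            return self.degrade_head(feats)
        raise ValueError(f"unknown mode: {mode!r}")


# One network, two directions: LQ -> HQ and HQ -> LQ
net = FlipNetSketch()
x = torch.randn(1, 3, 64, 64)
hq = net(x, mode="restore")
lq = net(x, mode="degrade")
```

Sharing the backbone while switching heads is one plausible reading of "a unified network that switches between two modes"; the degradation mode would additionally be trained on real-world degradation data so that its synthesized LQ images better match real degradation statistics.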