CFRNet: Cycle-Consistent Fixed-Point Training for Real-Time Blind Face Restoration on Consumer Embedded NPUs

📅 2026-06-04

🤖 AI Summary

Existing blind face restoration methods struggle to deliver high image quality, fast inference, and low memory use at the same time on consumer-grade embedded NPUs. This work proposes CFRNet, a lightweight ResNet-style restorer for 256×256 face crops, trained with a new Cycle-Consistent Fixed-Point (CCFP) strategy. CCFP combines progressive multi-cycle supervision, an idempotence loss, and a re-degradation cycle loss so that the network behaves as a fixed-point operator: repeated passes leave an already restored face stable, and the number of passes becomes a quality setting that needs no retraining. These losses apply only during training, so they add no cost to an inference pass. With INT8 quantization and a lightweight convolutional design, one pass takes about 23 ms on the HiSilicon Hi3402 NPU. With three passes, LPIPS falls to 0.250, which is 31% lower than the single-pass result, and the model has been deployed in real time in an in-vehicle driver monitoring system.
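The multi-pass inference mentioned in the summary can be sketched as a plain loop over the same network. This is a minimal illustration and not the authors' code: `run_cycles`, `estimated_latency_ms`, and `toy_restore` are hypothetical names, images are represented as flat lists of floats for simplicity, and the latency estimate assumes that passes run back to back with no extra overhead.

```python
from typing import Callable, List

Image = List[float]  # hypothetical flat image, pixel values in [0, 1]

# Approximate single-pass INT8 latency reported for the Hi3402 NPU.
PER_CYCLE_MS = 23.0


def run_cycles(restore: Callable[[Image], Image], degraded: Image, k: int) -> Image:
    """Apply the same restorer k times, feeding each output back as input."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = degraded
    for _ in range(k):
        out = restore(out)
    return out


def estimated_latency_ms(k: int, per_cycle_ms: float = PER_CYCLE_MS) -> float:
    """Rough latency budget, assuming cost grows linearly with the cycle count."""
    return k * per_cycle_ms


def toy_restore(img: Image) -> Image:
    """Stand-in for the network: moves each pixel halfway toward 0.5."""
    return [0.5 * (p + 0.5) for p in img]


if __name__ == "__main__":
    restored = run_cycles(toy_restore, [0.0, 1.0], k=3)
    print(restored, estimated_latency_ms(3))
```

Under this linear assumption, three cycles would cost roughly 3 × 23 = 69 ms per face, so the cycle count trades latency for perceptual quality without retraining.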

📝 Abstract

Blind face restoration on consumer devices has to balance image quality against speed and memory. Strong methods such as GFPGAN and CodeFormer give good perceptual quality, but they rely on large pretrained generative priors and on operators such as attention, codebook lookup, and style modulation that are hard to compile and quantize on the small neural processing units (NPUs) used in consumer hardware. Small convolutional restorers run fast enough, but they tend to over-smooth and to leave artifacts around the eyes, nose, and mouth. We present CFRNet, a 2.0 M-parameter ResNet-style restorer for on-device use at $256\times256$, the common face-crop size on consumer NPUs. The main idea is Cycle-Consistent Fixed-Point Training (CCFP). Instead of training the network for one pass and then running it several times by hand, we train it to act as a fixed-point operator, so that applying it again to a restored face does not change the face. CCFP uses three training losses, namely progressive multi-cycle supervision, an idempotence loss, and a re-degradation cycle loss, and it adds no cost at inference. To compare fairly under our deployment limits, we retrain all baselines from scratch at the same $256\times256$ resolution. On a 300-image test set, CFRNet reaches the best perceptual score (LPIPS 0.250 at three cycles, which is 31% lower than one cycle) and also the best PSNR and SSIM at two cycles. It runs in about 23 ms per cycle in INT8 on a HiSilicon Hi3402 NPU, while the same baselines cannot be compiled to that chip. The cycle count $k$ acts as a simple quality knob that needs no retraining: PSNR is best at $k=2$ and LPIPS keeps improving up to $k=3$. We further show that the same idea works with a plain CNN that is even easier to deploy, and we run the model in real time on an in-car driver-monitoring board.
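The three CCFP training losses named in the abstract could be combined roughly as follows. The abstract does not give the formulas, so this is only one plausible reading: the L1 distance, the per-cycle weights, the `lambda_idem` and `lambda_cycle` coefficients, and the `restore` and `degrade` callables are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Image = List[float]  # hypothetical flat image, pixel values in [0, 1]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def ccfp_loss(
    restore: Callable[[Image], Image],
    degrade: Callable[[Image], Image],
    lq: Image,
    hq: Image,
    cycle_weights: Sequence[float] = (1.0, 0.5, 0.25),  # assumed weights
    lambda_idem: float = 0.1,   # assumed coefficient
    lambda_cycle: float = 0.1,  # assumed coefficient
) -> float:
    """Sketch of a combined CCFP objective for one low/high-quality pair."""
    # Multi-cycle supervision: every cycle's output is compared to the target.
    outputs = []
    current = lq
    for _ in cycle_weights:
        current = restore(current)
        outputs.append(current)
    multi_cycle = sum(w * l1(o, hq) for w, o in zip(cycle_weights, outputs))

    # Idempotence: restoring an already restored face should change nothing.
    first = outputs[0]
    idempotence = l1(restore(first), first)

    # Re-degradation cycle: degrading the restoration should recover the input.
    re_degradation = l1(degrade(first), lq)

    return multi_cycle + lambda_idem * idempotence + lambda_cycle * re_degradation
```

In a real training loop these terms would be computed on tensors in an autodiff framework; the pure-Python version only shows how the three terms relate to the fixed-point goal.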

Problem

Research questions and friction points this paper is trying to address.

blind face restoration

consumer embedded NPUs

real-time inference

model quantization

image quality-speed trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cycle-Consistent Fixed-Point Training

Blind Face Restoration

On-Device NPU Deployment