🤖 AI Summary
This work proposes Anon-NET, a unified framework designed to preserve essential attributes—such as facial expression, head pose, age, gender, and ethnicity—for downstream visual tasks while effectively anonymizing identity in face videos. Anon-NET is the first approach to integrate diffusion generative models with video-driven facial animation, leveraging high-level attribute recognition and motion-aware expression transfer to achieve strong identity obfuscation without compromising expression consistency, visual realism, or temporal coherence. Extensive experiments on VoxCeleb2, CelebV-HQ, and HDTF datasets demonstrate that Anon-NET significantly outperforms existing methods in both visual quality and temporal stability while successfully concealing identity information.
📝 Abstract
Face video anonymization aims to preserve privacy while still allowing videos to be analyzed in downstream computer vision tasks such as expression recognition, people tracking, and action recognition. We propose a novel unified framework, Anon-NET, that de-identifies facial videos while preserving the age, gender, race, pose, and expression of the original video. Specifically, we inpaint faces with a diffusion-based generative model guided by high-level attribute recognition and motion-aware expression transfer. We then animate the de-identified faces with video-driven animation, which takes the de-identified face and the original video as input. Extensive experiments on the VoxCeleb2, CelebV-HQ, and HDTF datasets, which include diverse facial dynamics, demonstrate the effectiveness of Anon-NET in obfuscating identity while retaining visual realism and temporal consistency. The code of Anon-NET will be publicly released.