🤖 AI Summary
This work addresses the risk of identity impersonation through AI-generated synthetic avatars by proposing methods to trace whether two avatar videos originate from the same real person. To this end, we introduce AVAPrintDB, the first high-fidelity synthetic avatar database built with three state-of-the-art generators (GAGAvatar, LivePortrait, and HunyuanPortrait), and establish a standardized benchmark for avatar fingerprinting that covers both within-generator and cross-generator scenarios. Leveraging foundation models such as DINOv2 and CLIP, we design novel fingerprinting approaches and show that identity-related motion cues persist in synthesized avatars. Our analysis further reveals that existing methods are highly sensitive to variations in generation pipelines and data domains.
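To make the fingerprinting setup concrete, here is a minimal sketch of one plausible foundation-model pipeline: per-frame DINOv2 embeddings are pooled into a clip-level descriptor, and two videos are compared by cosine similarity. This is an illustration only; the paper's exact architecture, pooling, and scoring are not specified in this summary, and the preprocessing shown (224×224 frames, ImageNet normalization) is an assumption.

```python
# Illustrative sketch only: a plausible foundation-model fingerprinting
# pipeline, not the paper's released implementation.
import torch
import torch.nn.functional as F

# DINOv2 ViT-S/14 backbone via the official torch hub entry point.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def video_fingerprint(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, 224, 224), assumed normalized with ImageNet statistics.
    Returns one L2-normalized descriptor for the whole clip."""
    feats = model(frames)                          # (T, 384) CLS embeddings
    feats = F.normalize(feats, dim=-1)             # unit norm per frame
    return F.normalize(feats.mean(dim=0), dim=0)   # temporal average pooling

def same_operator_score(video_a: torch.Tensor, video_b: torch.Tensor) -> float:
    """Cosine similarity between clip descriptors; higher means more likely
    the same driving identity. A decision threshold would be tuned on a
    validation split of the database."""
    return float(video_fingerprint(video_a) @ video_fingerprint(video_b))
```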
📝 Abstract
Recent advances in photorealistic avatar generation have enabled highly realistic talking-head avatars, raising security concerns about identity impersonation in AI-mediated communication. To address this challenging problem, the task of avatar fingerprinting aims to determine whether two avatar videos are driven by the same human operator. However, the few public databases available in the literature rely solely on outdated talking-head avatar generators and therefore do not represent realistic scenarios for avatar fingerprinting. To overcome this limitation, this article introduces AVAPrintDB, a new publicly available multi-generator talking-head avatar database for avatar fingerprinting. AVAPrintDB is constructed from two audiovisual corpora and three state-of-the-art avatar generators (GAGAvatar, LivePortrait, HunyuanPortrait) representing different synthesis paradigms, and includes both self- and cross-reenactments to simulate legitimate usage and impersonation scenarios.
Building on this database, we define a standardized and reproducible benchmark for avatar fingerprinting, evaluating public state-of-the-art avatar fingerprinting systems and exploring novel methods based on foundation models (DINOv2 and CLIP). In addition, we conduct a comprehensive analysis under generator and dataset shift. Our results show that, while identity-related motion cues persist across synthetic avatars, current avatar fingerprinting systems remain highly sensitive to changes in the synthesis pipeline and source domain. AVAPrintDB, the benchmark protocols, and the avatar fingerprinting systems are publicly available to facilitate reproducible research.
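As a usage illustration of verification-style protocols like the one described above, the sketch below computes an Equal Error Rate (EER) from similarity scores of same-operator (genuine) and different-operator (impostor) pairs. EER is a standard verification metric, but the benchmark's actual metrics, pair lists, and split logic are assumptions here, not the released protocol.

```python
# Hedged sketch: EER computation over genuine/impostor similarity scores.
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """EER: the operating point where the false accept rate (FAR) equals
    the false reject rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)

# Hypothetical protocol split (variable names are placeholders):
# within-generator scores pairs from a single generator (e.g., LivePortrait);
# cross-generator scores pairs whose probe was synthesized by an unseen generator.
# eer_within = equal_error_rate(genuine_same_gen, impostor_same_gen)
# eer_cross  = equal_error_rate(genuine_cross_gen, impostor_cross_gen)
```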