PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of disentangling facial expressions from head pose in portrait video editing, a limitation that hinders fine-grained expression control. To this end, we propose PerformRecast, the first method to explicitly leverage the parameter disentanglement property of 3D Morphable Face Models (3DMMs) for video editing. By refining the keypoint transformation formulation to better align with 3DMM geometry and introducing a region-wise disentanglement mechanism that supervises facial and non-facial regions separately, our approach enables precise expression transfer from a driving video while preserving the original head pose. Integrated within a teacher-student pre-training framework, PerformRecast outperforms existing methods in generation fidelity, controllability, and efficiency, enabling high-quality, fine-grained facial expression editing.

📝 Abstract
This paper investigates expression-only portrait video performance editing based on a driving video, a task that plays a crucial role in the animation and film industries. Most existing research focuses on portrait animation, which animates a static portrait image according to the facial motion of a driving video. As a consequence, such methods struggle to disentangle facial expression from head pose rotation and thus cannot edit facial expression independently. In this paper, we propose PerformRecast, a versatile expression-only video editing method dedicated to recasting performances in existing films and animations. The key insight of our method comes from a property of the 3D Morphable Face Model (3DMM), which represents the identity, facial expression, and head pose of a 3D face mesh with separate parameters. We therefore improve the keypoint transformation formula used in previous methods to make it more consistent with the 3DMM, achieving better disentanglement and providing users with much finer-grained control. Furthermore, to avoid misalignment around the face boundary in generated results, we decouple the facial and non-facial regions of input portrait images and pre-train a teacher model to provide separate supervision for each. Extensive experiments show that our method produces high-quality results that are more faithful to the driving video, outperforming existing methods in both controllability and efficiency. Our code, data and trained models are available at https://youku-aigc.github.io/PerformRecast.
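The key insight above, that a 3DMM factors identity, expression, and head pose into separate parameters, so replacing only the expression coefficients edits the expression while leaving the pose untouched, can be sketched as follows. This is a minimal illustration with synthetic random bases; the dimensions, basis construction, and function names are hypothetical and not the paper's actual model or keypoint formula.

```python
import numpy as np

# Hypothetical dimensions; real 3DMMs (e.g. BFM, FLAME) differ.
N_KP, ID_DIM, EXP_DIM = 68, 80, 64
rng = np.random.default_rng(0)

# Fixed model components: mean shape plus identity/expression blendshape bases.
mean_shape = rng.normal(size=(N_KP, 3))
B_id = rng.normal(size=(ID_DIM, N_KP, 3)) * 0.01
B_exp = rng.normal(size=(EXP_DIM, N_KP, 3)) * 0.01

def transform_keypoints(alpha, beta, R, t):
    """3DMM-style keypoints: the rigid pose (R, t) is applied after the shape
    deformation, so the expression coefficients beta never affect head rotation."""
    shape = (mean_shape
             + np.tensordot(alpha, B_id, axes=1)    # identity offset
             + np.tensordot(beta, B_exp, axes=1))   # expression offset
    return shape @ R.T + t

# Source frame: its own identity, expression, and pose.
alpha_src = rng.normal(size=ID_DIM)
beta_src = rng.normal(size=EXP_DIM)
R_src, t_src = np.eye(3), np.zeros(3)

# Driving frame contributes only a new expression.
beta_drv = rng.normal(size=EXP_DIM)

# Expression-only recast: swap beta, keep identity and pose from the source.
kp_src = transform_keypoints(alpha_src, beta_src, R_src, t_src)
kp_edit = transform_keypoints(alpha_src, beta_drv, R_src, t_src)
```

Because identity and pose parameters are reused from the source frame, the difference `kp_edit - kp_src` consists purely of the (pose-rotated) expression offset, which is the disentanglement property the paper's improved keypoint formula is designed to preserve.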
Problem

Research questions and friction points this paper is trying to address.

expression disentanglement
head pose
portrait video editing
facial animation
performance editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

expression disentanglement
head pose
3DMM
portrait video editing
teacher-student supervision