Synthetic Human Action Video Data Generation with Pose Transfer

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-fidelity synthetic human motion data in video understanding induces the “uncanny valley” effect, severely limiting model generalization. To address this, we propose a controllable 3D Gaussian avatar-based method for motion video synthesis. We pioneer the integration of drivable 3D Gaussian splatting into pose transfer frameworks, enabling high-fidelity and temporally coherent human motion generation. Furthermore, we introduce cross-domain background fusion and few-shot augmentation to enhance background diversity and improve coverage of long-tail action categories. We release the RANDOM People dataset—a large-scale, identity-pose disentangled benchmark supporting few-shot extension. Extensive experiments on Toyota Smarthome and NTU RGB+D demonstrate significant improvements in action recognition accuracy, alongside enhanced model robustness and generalization capability across unseen domains and rare classes.

📝 Abstract
In video understanding tasks, particularly those involving human motion, synthetic data generation often suffers from uncanny features, diminishing its effectiveness for training. Tasks such as sign language translation, gesture recognition, and human motion understanding in autonomous driving have thus been unable to exploit the full potential of synthetic data. This paper proposes a method for generating synthetic human action video data using pose transfer (specifically, controllable 3D Gaussian avatar models). We evaluate this method on the Toyota Smarthome and NTU RGB+D datasets and show that it improves performance on action recognition tasks. Moreover, we demonstrate that the method can effectively scale few-shot datasets, compensating for groups underrepresented in the real training data and adding diverse backgrounds. We open-source the method along with RANDOM People, a dataset of videos and avatars of novel human identities for pose transfer, crowd-sourced from the internet.
Problem

Research questions and friction points this paper is trying to address.

Synthetic human action videos often have unrealistic features
Limited use of synthetic data in motion-related tasks
Need for scalable few-shot datasets with diverse backgrounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates drivable 3D Gaussian splatting into pose transfer frameworks
Improves action recognition accuracy on Toyota Smarthome and NTU RGB+D
Scales few-shot datasets, covering underrepresented groups and adding diverse backgrounds
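The pipeline the summary describes, pairing disentangled avatar identities with pose sequences, compositing varied backgrounds, and oversampling long-tail action classes, can be sketched as a toy data-generation loop. This is a minimal illustration of the idea, not the paper's released code; all names (SyntheticClip, generate_clips, avatar_A, and so on) are hypothetical, and the real system renders avatars via drivable 3D Gaussian splatting rather than returning labels.

```python
import itertools
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SyntheticClip:
    identity: str    # avatar identity (rendered via 3D Gaussian splatting in the paper)
    action: str      # pose sequence driving the avatar
    background: str  # composited cross-domain background

def generate_clips(identities, actions, backgrounds,
                   rare_actions=(), boost=3, seed=0):
    """Pair every identity with every pose sequence (identity-pose
    disentanglement), composite a randomly chosen background, and
    oversample rare (long-tail) action classes by `boost`."""
    rng = random.Random(seed)
    clips = []
    for identity, action in itertools.product(identities, actions):
        repeats = boost if action in rare_actions else 1
        for _ in range(repeats):
            clips.append(SyntheticClip(identity, action, rng.choice(backgrounds)))
    return clips

clips = generate_clips(
    identities=["avatar_A", "avatar_B"],
    actions=["walk", "fall"],
    backgrounds=["kitchen", "office"],
    rare_actions={"fall"},
    boost=3,
)
print(len(clips))  # 2 identities * (1 "walk" + 3 "fall") = 8 clips
```

Because identity, pose, and background vary independently, each axis can be scaled on its own, which is what lets a few-shot action class be expanded without collecting new real footage.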