One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

๐Ÿ“… 2025-11-28
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing diffusion-based pose-driven animation methods require spatially aligned reference–pose pairs with identical skeletal structures, limiting their applicability to misaligned or cross-structural inputs. To address this, we propose an alignment-agnostic, high-fidelity framework for character animation and image-based pose transfer. Our method introduces a self-supervised outpainting training paradigm with a unified masked input format, enabling arbitrary reference layouts. We further design identity-aware feature extraction and hybrid fusion attention to explicitly decouple appearance from skeletal structure. Additionally, identity-robust pose control and token replacement strategies enhance temporal coherence in long videos. The framework natively supports dynamic sequence lengths and multi-resolution inputs. Extensive experiments demonstrate significant improvements over state-of-the-art methods on cross-structural and layout-varying benchmarks, achieving high-quality, temporally consistent long-sequence animation generation.

๐Ÿ“ Abstract
Recent advances in diffusion models have greatly improved pose-driven character animation. However, existing methods are limited to spatially aligned reference–pose pairs with matched skeletal structures; handling reference–pose misalignment remains unsolved. To address this, we present One-to-All Animation, a unified framework for high-fidelity character animation and image pose transfer for references with arbitrary layouts. First, to handle spatially misaligned references, we reformulate training as a self-supervised outpainting task that transforms diverse-layout references into a unified occluded-input format. Second, to process partially visible references, we design a reference extractor for comprehensive identity feature extraction. Further, we integrate hybrid reference fusion attention to handle varying resolutions and dynamic sequence lengths. Finally, from the perspective of generation quality, we introduce identity-robust pose control that decouples appearance from skeletal structure to mitigate pose overfitting, and a token replace strategy for coherent long-video generation. Extensive experiments show that our method outperforms existing approaches. The code and model will be available at https://github.com/ssj9596/One-to-All-Animation.
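The abstract mentions a token replace strategy for coherent long-video generation but does not detail it. A common pattern behind such strategies is chunked generation with overlap: the leading latent frames (tokens) of each new chunk are replaced by the trailing frames of the previous chunk, anchoring the sampler to what was already generated. The sketch below illustrates this idea only; `denoise_chunk` is a hypothetical stand-in for the diffusion sampler, not the paper's implementation.

```python
import numpy as np

def denoise_chunk(init_frames, chunk_len, frame_shape, seed):
    """Placeholder sampler: returns `chunk_len` frames, keeping any frames
    passed in `init_frames` fixed (the replaced tokens)."""
    rng = np.random.default_rng(seed)
    chunk = rng.standard_normal((chunk_len, *frame_shape))
    if init_frames is not None:
        chunk[: len(init_frames)] = init_frames  # token replacement
    return chunk

def generate_long_video(num_chunks, chunk_len=8, overlap=2, frame_shape=(4, 4)):
    """Generate a long sequence chunk by chunk, replacing each chunk's
    leading `overlap` frames with the previous chunk's trailing frames."""
    video = []
    prev_tail = None
    for i in range(num_chunks):
        chunk = denoise_chunk(prev_tail, chunk_len, frame_shape, seed=i)
        # Skip the overlapping frames already emitted by the previous chunk.
        start = 0 if prev_tail is None else overlap
        video.extend(chunk[start:])
        prev_tail = chunk[-overlap:]
    return np.stack(video)

video = generate_long_video(num_chunks=3)
print(video.shape)  # (8 + 6 + 6, 4, 4) = (20, 4, 4)
```

Because each chunk is conditioned on frames it shares verbatim with its predecessor, the boundary between chunks carries no hard cut, which is the property the token replace strategy targets.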
Problem

Research questions and friction points this paper is trying to address.

Addresses character animation with misaligned reference-pose pairs
Extracts identity features from partially visible character references
Decouples appearance from skeletal structure to prevent pose overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised outpainting transforms diverse layouts into unified format
Hybrid reference fusion attention handles varying resolutions and sequence lengths
Identity-robust pose control decouples appearance from skeletal structure
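The first innovation above reduces references with arbitrary layouts to one occluded-input training format for the self-supervised outpainting task. A minimal sketch of that reduction, assuming a simple rectangular visibility box and a multiplicative mask (both illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def make_occluded_input(frame, box):
    """Return (masked_frame, mask) where only `box` = (y0, y1, x0, x1)
    stays visible. Training the model to reconstruct the hidden region
    turns layout-varying references into a single outpainting objective."""
    y0, y1, x0, x1 = box
    mask = np.zeros(frame.shape[:2], dtype=frame.dtype)
    mask[y0:y1, x0:x1] = 1.0
    masked = frame * mask[..., None]  # zero out everything outside the box
    return masked, mask

# A full-body reference, a half-body crop, or an off-center character all
# map to the same (masked_frame, mask) input format, only the box differs.
frame = np.ones((8, 8, 3))
masked, mask = make_occluded_input(frame, box=(2, 6, 2, 6))
print(int(mask.sum()))      # 16 visible pixels
print(masked[0, 0, 0])      # 0.0 (occluded corner)
```

The unified format is what lets a single model handle misaligned and partially visible references: at inference, the reference occupies the visible region and the model outpaints the rest under pose guidance.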
๐Ÿ”Ž Similar Papers
No similar papers found.
Shijun Shi — Jiangnan University
Jing Xu — University of Science and Technology of China
Zhihang Li — Kwai Inc
Chunli Peng — Beijing University of Posts and Telecommunications
Xiaoda Yang — Zhejiang University
Lijing Lu — Chinese Academy of Sciences
Kai Hu — Jiangnan University
Jiangning Zhang — Zhejiang University

Tags: Computer Vision · Generative model · video/image generation · LLM