🤖 AI Summary
Color inconsistency in the automatic coloring of long animation sequences is a longstanding challenge: existing methods are constrained to short clips and neglect global temporal coherence. This paper proposes a Dynamic Global-Local Memory (DGLM) mechanism, the first to adaptively fuse compressed global historical features with current-frame local features, augmented by a Color Consistency Reward and a dedicated color-fusion strategy at inference. Leveraging SketchDiT to extract structure-guided hybrid reference features, the DGLM module enables efficient dynamic modeling of historical context. Evaluated on both short-term (14-frame) and long-term (average 500-frame) animation sequences, the approach significantly outperforms state-of-the-art methods, markedly improving visual coherence and chromatic stability in open-domain animation coloring. The framework establishes a scalable, industrially viable paradigm for automatic coloring of long-form animations.
📝 Abstract
Animation colorization is a crucial part of production in the real animation industry, and colorizing long animations is labor-intensive. Automating long animation colorization with video generation models therefore has significant research value. Existing studies are limited to short-term colorization: they adopt a local paradigm, fusing overlapping features to achieve smooth transitions between local segments. However, the local paradigm neglects global information and fails to maintain long-term color consistency. In this study, we argue that ideal long-term color consistency can be achieved through a dynamic global-local paradigm, i.e., dynamically extracting global color-consistent features relevant to the current generation. Specifically, we propose LongAnimation, a novel framework comprising a SketchDiT, a Dynamic Global-Local Memory (DGLM), and a Color Consistency Reward. The SketchDiT captures hybrid reference features to support the DGLM module. The DGLM module employs a long video understanding model to dynamically compress global historical features and adaptively fuse them with the current generation features. To refine color consistency, we introduce a Color Consistency Reward. During inference, we propose a color consistency fusion to smooth transitions between video segments. Extensive experiments on both short-term (14 frames) and long-term (average 500 frames) animations show the effectiveness of LongAnimation in maintaining short-term and long-term color consistency for the open-domain animation colorization task. The code can be found at https://cn-makers.github.io/long_animation_web/.
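To make the global-local paradigm concrete, the sketch below illustrates the general idea in numpy: compress a long history of per-frame features into a few global tokens, then adaptively fuse them with the current segment's features via a learned gate. This is a minimal, hypothetical illustration under our own assumptions (mean-pool compression, a sigmoid gate `W_g`), not the paper's actual DGLM, which uses a long video understanding model for compression and operates inside a diffusion transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress_history(history, k=4):
    """Compress T historical frame features (T, D) into k global tokens
    by average-pooling over equal-length temporal chunks.
    (Stand-in for the long-video-understanding compressor in DGLM.)"""
    chunks = np.array_split(history, k)
    return np.stack([c.mean(axis=0) for c in chunks])  # shape (k, D)

def gated_fusion(current, global_tokens, W_g):
    """Adaptive fusion: a per-channel sigmoid gate computed from the
    current feature decides how much compressed global context to mix in.
    (`W_g` is a hypothetical learned projection.)"""
    context = global_tokens.mean(axis=0)            # (D,) pooled global context
    gate = 1.0 / (1.0 + np.exp(-(current @ W_g)))   # (D,) values in (0, 1)
    return gate * context + (1.0 - gate) * current  # (D,) fused feature

D = 8
history = rng.normal(size=(500, D))  # features from ~500 previously colored frames
current = rng.normal(size=(D,))      # feature of the frame being generated
W_g = rng.normal(size=(D, D)) * 0.1

tokens = compress_history(history)
fused = gated_fusion(current, tokens, W_g)
assert fused.shape == (D,)
```

In a purely local paradigm, `history` would hold only the few overlapping frames of the previous segment; here the gate can pull color-consistent context from the entire compressed history, which is the property the DGLM design targets.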