AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitation of existing painting process generation methods, which rely heavily on task-specific datasets and labor-intensive manual annotations, thereby hindering generalization to arbitrary input images. To overcome this, we propose a fully self-supervised framework that requires no ground-truth drawing sequences. Methodologically, we introduce a novel self-supervised video dataset construction pipeline leveraging depth estimation and differentiable stroke rendering. We design a dedicated fusion layer that explicitly models two fundamental human drawing behaviors, refinement and layering, and integrate it into a video diffusion architecture to enable temporally coherent reverse stroke generation. Experiments demonstrate that our approach produces high-fidelity, temporally consistent, human-like painting videos across diverse image categories, significantly outperforming supervised baselines. To our knowledge, this is the first method capable of universal painting process generation without any manually annotated drawing data.

📝 Abstract
Humans can intuitively decompose an image into a sequence of strokes to create a painting, yet existing methods for generating drawing processes are limited to specific data types and often rely on expensive human-annotated datasets. We propose a novel self-supervised framework for generating drawing processes from any type of image, treating the task as a video generation problem. Our approach reverses the drawing process by progressively removing strokes from a reference image, simulating a human-like creation sequence. Crucially, our method does not require costly datasets of real human drawing processes; instead, we leverage depth estimation and stroke rendering to construct a self-supervised dataset. We model human drawings as "refinement" and "layering" processes and introduce depth fusion layers to enable video generation models to learn and replicate human drawing behavior. Extensive experiments validate the effectiveness of our approach, demonstrating its ability to generate realistic drawings without the need for real drawing process data.
Problem

Research questions and friction points this paper is trying to address.

Reconstructs painting process without human-annotated datasets
Generates drawing sequences from any image type
Simulates human-like stroke removal for self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised framework for drawing process generation
Reverses drawing process via stroke removal simulation
Uses depth fusion layers to model human drawing behavior
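The reverse stroke-removal idea described above can be sketched as a toy data-construction loop: order strokes back-to-front by depth, then record frames while peeling off the most recently painted stroke, so that reversing the frames yields a blank-to-finished drawing sequence. Everything here is an illustrative assumption, not the paper's implementation (which uses depth estimation on real images and differentiable stroke rendering); the stroke masks and the `composite`/`drawing_sequence` helpers are hypothetical.

```python
import numpy as np

def composite(strokes, shape):
    # Paint strokes in order onto a white canvas; later strokes occlude earlier ones.
    canvas = np.ones(shape, dtype=float)
    for mask, color in strokes:
        canvas[mask] = color
    return canvas

def drawing_sequence(strokes, depths, shape=(8, 8, 3)):
    # Sort strokes back-to-front: larger depth = farther away = painted earlier.
    order = sorted(range(len(strokes)), key=lambda i: -depths[i])
    ordered = [strokes[i] for i in order]
    # Build the "reverse" video: start from the full painting and remove
    # the nearest (last-painted) stroke at each step, ending with a blank canvas.
    frames = [composite(ordered[:k], shape) for k in range(len(ordered), -1, -1)]
    # Reversing the removal video gives a human-like drawing sequence.
    return frames[::-1]

# Hypothetical example: a far "background wash" and a near "foreground detail".
m1 = np.zeros((8, 8), dtype=bool); m1[1:7, 1:7] = True
m2 = np.zeros((8, 8), dtype=bool); m2[3:5, 3:5] = True
frames = drawing_sequence([(m1, 0.2), (m2, 0.9)], depths=[5.0, 1.0])
# frames[0] is blank, frames[1] has only the background wash,
# frames[2] is the completed painting with the detail on top.
```

In the paper this sequence of frames serves as a free supervision signal for the video diffusion model, replacing manually recorded human drawing videos.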
Junjie Hu
Fudan University, Shanghai, China
Shuyong Gao
Fudan University
Human Visual Attention · Generative Model · Weakly Supervised Learning
Qianyu Guo
Shanghai Jiao Tong University School of Medicine
AI for Bioscience · AI Drug Design · Machine Vision
Yan Wang
Fudan University, Shanghai, China
Qishan Wang
Fudan University
Anomaly Detection
Yuang Feng
Fudan University, Shanghai, China
Wenqiang Zhang
Fudan University, Shanghai, China