🤖 AI Summary
To address the lack of editable layered representations in professional image synthesis, this paper proposes a "post-generation decomposition" paradigm: instead of training layered generative models from scratch, it uses a pre-trained diffusion model to synthesize a full composite image and then applies a generative-prior-driven decomposition method to split it into foreground and background layers. Key contributions include: (1) adapting a large-scale generative prior as a constraint on layer decomposition to improve semantic consistency; and (2) a high-frequency feature alignment module that noticeably improves edge fidelity and fine-detail accuracy. The method produces high-quality layered outputs across varied object scales and content types, enabling downstream applications such as relighting, occlusion repair, and local editing. Crucially, it balances generation quality and controllability without requiring annotated data or additional large-scale model training.
📝 Abstract
Layers have become indispensable tools for professional artists, allowing them to build a hierarchical structure that enables independent control over individual visual elements. In this paper, we propose LayeringDiff, a novel pipeline for synthesizing layered images, which begins by generating a composite image with an off-the-shelf image generative model and then disassembles it into its constituent foreground and background layers. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training to develop generative capabilities for individual layers. Furthermore, by utilizing a pretrained off-the-shelf generative model, our method can produce layers with diverse content and object scales. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate the foreground and background layers. We also propose high-frequency alignment modules to refine the fine details of the estimated layers. Our comprehensive experiments demonstrate that our approach effectively synthesizes layered images and supports various practical applications.
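The two-stage pipeline described above (generate a composite, then decompose it into layers) can be sketched as follows. This is a minimal, hypothetical illustration only: the function names, the toy disc-shaped alpha matte, and the NumPy stand-ins are invented here, and the real method uses a pretrained diffusion model and a generative-prior-based decomposition network with high-frequency alignment, none of which is reproduced below.

```python
import numpy as np

# Hypothetical sketch of the LayeringDiff flow (all names invented for illustration).
# Stage 1: an off-the-shelf generative model produces a composite image.
# Stage 2: a decomposition network estimates foreground, alpha, and background.
# Both stages are stubbed with toy NumPy operations so the data flow is runnable.

def generate_composite(h, w, seed=0):
    """Stand-in for a pretrained image generative model."""
    rng = np.random.default_rng(seed)
    return rng.random((h, w, 3))

def decompose_layers(composite):
    """Stand-in for generative-prior-driven layer decomposition.

    Returns (foreground, alpha, background), with the layers premultiplied
    by the alpha matte so that simple addition recomposes the input.
    """
    h, w, _ = composite.shape
    # Toy alpha matte: a centered disc marks the "foreground" object.
    yy, xx = np.mgrid[0:h, 0:w]
    alpha = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2 < (min(h, w) / 4) ** 2)
    alpha = alpha.astype(float)[..., None]
    foreground = composite * alpha        # foreground layer carries the object pixels
    background = composite * (1 - alpha)  # background layer carries the rest
    return foreground, alpha, background

composite = generate_composite(64, 64)
fg, alpha, bg = decompose_layers(composite)

# Sanity check: compositing the estimated layers reproduces the input image,
# which is the basic consistency requirement any decomposition must satisfy.
recomposed = fg + bg
assert np.allclose(recomposed, composite)
```

The recomposition check at the end reflects the core constraint of post-generation decomposition: the layers are only useful if compositing them reproduces the generated image, leaving the generative prior to resolve which pixels belong to which layer.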