Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

📅 2025-01-27
🤖 AI Summary
This work addresses the limited object-centric understanding and compositional generation capability of diffusion models in text-free settings. We propose SlotAdapt, a lightweight, plug-and-play adapter that integrates slot-based attention into pretrained diffusion models (e.g., Stable Diffusion). Presented as the first slot-guided adaptation mechanism, it conditions the model on slots instead of text, sidestepping the model's inherent text-centric conditioning bias. To keep object perception and generation spatially consistent without manual annotations, we introduce a self-supervised cross-attention alignment loss. Experiments on multiple benchmark datasets show that SlotAdapt outperforms state-of-the-art methods in object discovery and image generation. Notably, it performs well at prompt-free object discovery and compositional generation on complex, real-world images, enabling robust, interpretable, and controllable generation without textual guidance.
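The summary describes slot-based conditioning only at a high level. One way to picture it is an adapter cross-attention layer in which image tokens attend to slot vectors instead of text embeddings. This sketch is not the SlotAdapt implementation. The function name `slot_cross_attention`, the single-head projections `w_q`/`w_k`/`w_v`, and the zero-initialised scalar `gate` on the residual are illustrative assumptions. It is written in plain Python so that it runs without a deep-learning framework.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(w, v):
    # w is a list of rows; returns the matrix-vector product w @ v.
    return [dot(row, v) for row in w]


def slot_cross_attention(features, slots, w_q, w_k, w_v, gate):
    """Hypothetical slot-conditioned adapter (illustration only).

    features: N image tokens, each of dimension D (e.g. from a frozen U-Net block)
    slots:    K slot vectors of dimension D (e.g. from slot attention)
    w_q, w_k, w_v: D x D projection matrices (the trainable adapter weights)
    gate:     scalar on the residual; assumed to start at 0 so that the
              frozen model's behaviour is unchanged at initialisation
    Returns (updated features, N x K attention maps over slots).
    """
    d = len(w_q)
    keys = [matvec(w_k, s) for s in slots]
    values = [matvec(w_v, s) for s in slots]
    out, attn_maps = [], []
    for f in features:
        q = matvec(w_q, f)
        # Scaled dot-product scores; the softmax runs over the K slots,
        # so each token's attention weights sum to 1.
        attn = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        attn_maps.append(attn)
        # Attention-weighted mixture of slot values.
        mixed = [sum(a * v[j] for a, v in zip(attn, values))
                 for j in range(len(values[0]))]
        # Gated residual connection around the adapter.
        out.append([x + gate * m for x, m in zip(f, mixed)])
    return out, attn_maps
```

For example, with identity projections, tokens `[[1.0, 0.0], [0.0, 1.0]]` and slots `[[5.0, 0.0], [0.0, 5.0]]`, each token puts most of its attention on the slot it points toward. With `gate=0.0`, the features pass through unchanged. The returned attention maps are the per-token slot distributions that an alignment loss could compare against slot attention.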


📝 Abstract
We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models, while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to other slot-based generative methods in the literature. The project page can be found at https://kaanakan.github.io/SlotAdapt/.
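The abstract does not give the exact form of the guidance loss. This is a minimal sketch of one plausible choice. It assumes that the adapter's per-token attention over slots and the slot-attention masks are both available as N x K arrays whose rows sum to 1. Each token's slot-attention assignment is treated as a fixed (stop-gradient) target, and the adapter attention is penalised with a cross-entropy term. The name `alignment_loss`, the argmax target and the `eps` constant are assumptions, not the paper's definition.

```python
import math


def alignment_loss(adapter_attn, slot_masks, eps=1e-8):
    """Hypothetical guidance loss aligning adapter cross-attention with slot attention.

    adapter_attn: N x K; per-token attention over the K slots from an adapter layer
    slot_masks:   N x K; per-token slot-attention weights, used only as fixed targets
    Each token's target is the slot that slot attention assigns it to (argmax).
    The loss is the mean cross-entropy of the adapter attention against that slot.
    """
    total = 0.0
    for attn_row, mask_row in zip(adapter_attn, slot_masks):
        # Hard assignment from slot attention; no gradient would flow through it.
        target = max(range(len(mask_row)), key=lambda k: mask_row[k])
        # Penalise low adapter attention on the assigned slot; eps avoids log(0).
        total += -math.log(attn_row[target] + eps)
    return total / len(adapter_attn)
```

Adapter attention that concentrates on the same slots as slot attention gives a small loss. Attention on the wrong slots gives a large one. A cross-entropy to a hard assignment is only one option: a KL divergence to the soft slot masks would be an equally reasonable sketch.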
Problem

Research questions and friction points this paper is trying to address.

Image Understanding
Object Recognition
Image Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

SlotAdapt
Object Understanding
Complex Image Synthesis
Adil Kaan Akan
Koc University
Computer Vision, Deep Learning
Y. Yemez
Koc University, KUIS AI Center