CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation

πŸ“… 2025-08-31
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Fine-grained, multi-attribute (e.g., age, expression) co-editing in text-to-image (T2I) generation remains challenging due to attribute entanglement; existing slider-based methods suffer from cross-attribute interference and lack independent, precise control. Method: We propose CompSliderβ€”a framework that enables decoupled attribute editing without fine-tuning the base diffusion model. It introduces a latent-space conditional prior modeling scheme coupled with a slider-adaptation mechanism, jointly optimized via a disentanglement loss and a structure-consistency loss to achieve attribute disentanglement while preserving geometric integrity. Contribution/Results: CompSlider is compatible with both T2I and text-to-video (T2V) generation, supporting high-fidelity, efficient multi-attribute composite editing. Extensive experiments validate its effectiveness across diverse attributes and demonstrate successful generalization to video generation, establishing new state-of-the-art performance in controllable generative editing.
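The control flow described above — a single prior network mapping all slider values jointly into a latent condition for a frozen T2I model — can be sketched as follows. This is a purely illustrative stand-in: `slider_prior`, `frozen_t2i`, the tanh mapping, and all shapes are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def slider_prior(slider_values, W, b):
    """Stand-in for the conditional-prior network: maps a vector of slider
    positions (e.g. age, smile, each in [-1, 1]) to one latent condition.
    All attributes pass through the SAME network, unlike per-attribute
    adapters trained separately. (Mapping here is an assumed placeholder.)"""
    return np.tanh(W @ slider_values + b)

def frozen_t2i(prompt_embedding, condition):
    """Stand-in for the frozen foundation model: it only consumes the
    slider-derived condition alongside the text embedding; no weights of
    the base model are updated. (Additive combination is a placeholder.)"""
    return prompt_embedding + condition

sliders = np.array([0.8, -0.3])                 # e.g. older, slightly less smile
W, b = rng.normal(size=(16, 2)), np.zeros(16)   # toy prior-network parameters
cond = slider_prior(sliders, W, b)              # joint latent condition
image_latent = frozen_t2i(rng.normal(size=16), cond)
```

Because all sliders share one prior, changing one slider value re-computes the whole condition jointly, which is where the disentanglement objective (below in the abstract) comes in.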

πŸ“ Abstract
In text-to-image (T2I) generation, achieving fine-grained control over attributes, such as age or smile, remains challenging even with detailed text prompts. Slider-based methods offer a solution for precise control of image attributes. Existing approaches typically train an individual adapter for each attribute separately, overlooking the entanglement among multiple attributes. As a result, interference arises among different attributes, preventing precise control over several attributes at once. To address this challenge, we aim to disentangle multiple attributes in slider-based generation to enable more reliable and independent attribute manipulation. Our approach, CompSlider, generates a conditional prior for the T2I foundation model to control multiple attributes simultaneously. Furthermore, we introduce novel disentanglement and structure losses to compose multiple attribute changes while maintaining structural consistency within the image. Since CompSlider operates in the latent space of the conditional prior and does not require retraining the foundation model, it reduces the computational burden for both training and inference. We evaluate our approach on a variety of image attributes and highlight its generality by extending it to video generation.
Problem

Research questions and friction points this paper is trying to address.

Disentangle multiple attributes in slider-based image generation
Enable reliable independent manipulation of image attributes
Reduce computational burden for training and inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates conditional prior for multi-attribute control
Introduces disentanglement and structure loss functions
Operates in latent space without retraining foundation model
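The two loss functions named above can be illustrated with a minimal sketch. The paper's actual formulations are not given in this summary, so the squared-error forms, the attribute-masking scheme, and the weighting `lam` below are all assumptions chosen only to convey the intent: moving one slider should not perturb the other attributes, and edits should not alter image structure.

```python
import numpy as np

def disentanglement_loss(attrs_edited, attrs_base, target_idx):
    """Penalize drift in every attribute EXCEPT the one being edited,
    so moving one slider leaves the others fixed (assumed MSE form)."""
    mask = np.ones(attrs_base.shape[0], dtype=bool)
    mask[target_idx] = False  # the target attribute is free to change
    return float(np.mean((attrs_edited[mask] - attrs_base[mask]) ** 2))

def structure_loss(struct_edited, struct_base):
    """Penalize changes in structural features (e.g. pose, layout) of the
    edited result relative to the original (assumed MSE form)."""
    return float(np.mean((struct_edited - struct_base) ** 2))

def compslider_objective(attrs_edited, attrs_base, target_idx,
                         struct_edited, struct_base, lam=1.0):
    """Hypothetical combined objective: disentanglement term plus a
    weighted structure-consistency term."""
    return (disentanglement_loss(attrs_edited, attrs_base, target_idx)
            + lam * structure_loss(struct_edited, struct_base))
```

For example, with base attribute predictions `[0.2, 0.5, 0.1]`, an edit that changes only attribute 1 (say to `[0.2, 0.9, 0.1]`) incurs zero disentanglement loss, while any leakage into attributes 0 or 2 is penalized.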
πŸ”Ž Similar Papers
No similar papers found.