🤖 AI Summary
Existing instruction-based image editing models apply every instruction in a multi-instruction prompt at a fixed strength, so users cannot continuously adjust the intensity of an individual edit. To address this, we propose SliderEdit, a framework that disentangles the instructions in a prompt and exposes each one as a globally trained slider, enabling composable, continuously adjustable control in instruction-based editing for the first time, to the best of our knowledge. Each slider independently modulates the strength of its instruction, and the method learns only a single set of low-rank adaptation (LoRA) matrices that generalizes across diverse edit types and attributes without attribute-specific training. We apply the approach to state-of-the-art models including FLUX-Kontext and Qwen-Image-Edit and observe substantial improvements in edit controllability, visual consistency, and user steerability, while preserving spatial locality and global semantic consistency. The framework is broadly applicable and paves the way for interactive, user-centric, instruction-driven image editing.
📝 Abstract
Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply each instruction in the prompt with a fixed strength, limiting the user's ability to precisely and continuously control the intensity of individual edits. We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider, allowing smooth adjustment of its strength. Unlike prior works that introduced slider-based attribute controls in text-to-image generation, typically requiring separate training or fine-tuning for each attribute or concept, our method learns a single set of low-rank adaptation matrices that generalize across diverse edits, attributes, and compositional instructions. This enables continuous interpolation along individual edit dimensions while preserving both spatial locality and global semantic consistency. We apply SliderEdit to state-of-the-art image editing models, including FLUX-Kontext and Qwen-Image-Edit, and observe substantial improvements in edit controllability, visual consistency, and user steerability. To the best of our knowledge, we are the first to explore and propose a framework for continuous, fine-grained instruction control in instruction-based image editing models. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
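To make the idea of a continuously adjustable low-rank update concrete, the following is a minimal sketch of a generic LoRA-style layer whose update is scaled by a scalar strength. It is not the SliderEdit implementation: the abstract does not specify how sliders are attached to individual instructions, so the function names (`matvec`, `lora_forward`), the `strength` parameter, and all numeric values are assumptions chosen purely for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, strength: float) -> Vector:
    """Return W x + strength * B (A x).

    Hypothetical illustration only: `strength` plays the role of a slider.
    strength = 0 reproduces the frozen base layer, strength = 1 applies the
    full low-rank update, and intermediate values interpolate linearly.
    """
    base = matvec(w, x)
    # Rank-r update: A has shape r x d_in, B has shape d_out x r.
    update = matvec(b, matvec(a, x))
    return [y + strength * u for y, u in zip(base, update)]


# Toy 2x2 layer with a rank-1 update (made-up values, not from the paper).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # 1 x 2
B = [[0.5], [-0.5]]   # 2 x 1
x = [2.0, 4.0]

for s in (0.0, 0.5, 1.0):
    print(s, lora_forward(W, A, B, x, s))
```

In this toy setting the output moves along a straight line between the base output and the fully adapted output as `strength` varies, which is the simplest form of the continuous interpolation the abstract describes. How SliderEdit restricts such a scaling to a single instruction within a multi-instruction prompt is not stated here and is left out of the sketch.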