🤖 AI Summary
This work addresses controllable layout generation under incomplete conditions (e.g., partial component types or positions), where conventional methods suffer from poor layout plausibility. We propose a retrieval-augmented generation framework: first, a cross-layout retrieval module identifies semantically similar layout templates; second, a condition-modulated attention mechanism dynamically integrates implicit layout priors from the retrieved templates; finally, reference-guided denoising and structural transfer are realized within diffusion or flow-matching frameworks. Crucially, our approach explicitly couples condition-driven retrieval with the generative process, enabling high-fidelity, constraint-satisfying, structurally sound, and diverse layouts without relying on content information. To our knowledge, this is the first method to achieve such explicit coupling. Extensive experiments demonstrate significant improvements over state-of-the-art methods across multiple benchmarks, validating both effectiveness and generalization capability.
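To make the retrieval step concrete, below is a minimal, hypothetical sketch of condition-driven template retrieval, not the paper's actual implementation. It assumes layouts are flattened into attribute vectors and a binary mask marks which attributes the condition specifies; function and parameter names (`retrieve_templates`, `known_mask`, `library`) are illustrative.

```python
def retrieve_templates(condition, library, known_mask, k=2):
    """Illustrative sketch: rank stored layouts by agreement with a
    partially specified condition and return the top-k as references.

    condition:  flattened condition vector (unknown slots arbitrary)
    library:    list of stored layout vectors, same length as condition
    known_mask: 1.0 where the condition specifies a value, else 0.0
    """
    def dist(layout):
        # compare only the attributes the condition actually specifies
        return sum(m * (c - v) ** 2
                   for c, v, m in zip(condition, layout, known_mask))
    return sorted(library, key=dist)[:k]
```

In a real system the library would be indexed (e.g., with approximate nearest-neighbor search) rather than scanned exhaustively, but the ranking principle is the same: similarity is measured only over the constrained attributes, so templates that satisfy the given conditions surface first.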
📝 Abstract
Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design under optional constraints, such as the type or position of a specific component. While recent diffusion and flow-matching models have achieved considerable advances in diverse conditional generation tasks, there remains substantial room for improvement in producing optimal arrangements under given conditions. In this work, we propose to perform layout generation through condition-based retrieval and reference-guided generation. Specifically, we retrieve layout templates that match the given conditions and use them as references to guide the denoising or flow-based transport process. By retrieving layouts compatible with the given conditions, we can uncover information not explicitly provided by those conditions. This offers the model more effective guidance during generation than previous approaches, which feed the condition to the model and let it infer the unprovided layout attributes directly. In addition, we design a condition-modulated attention mechanism that selectively absorbs retrieval knowledge, adapting to the differences between retrieved templates and given conditions. Extensive experimental results show that our method produces high-quality layouts that satisfy the given conditions and outperforms existing state-of-the-art models. Code will be released upon acceptance.
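The idea of selectively absorbing retrieval knowledge can be sketched as follows. This is a hypothetical, simplified illustration, not the paper's architecture: attention weights over retrieved templates are down-modulated where a template disagrees with the known condition entries, so the model borrows structure from compatible templates while keeping the specified attributes fixed. All names (`condition_modulated_attention`, `known_mask`) are assumptions for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def condition_modulated_attention(query, templates, known_mask):
    """Illustrative sketch: fuse retrieved layout templates into a
    partial condition vector, modulating attention by how well each
    template agrees with the known (specified) condition entries.

    query:      partial condition vector (unknown slots = 0.0)
    templates:  retrieved template vectors, same length as query
    known_mask: 1.0 where the condition specifies a value, else 0.0
    """
    scores = []
    for t in templates:
        # similarity only over the attributes the condition fixes
        sim = sum(q * v * m for q, v, m in zip(query, t, known_mask))
        # penalize disagreement on known slots (the "modulation")
        penalty = sum(m * (q - v) ** 2 for q, v, m in zip(query, t, known_mask))
        scores.append(sim - penalty)
    weights = softmax(scores)
    # weighted blend fills in the unspecified attributes from templates
    fused = [sum(w * t[i] for w, t in zip(weights, templates))
             for i in range(len(query))]
    # keep known condition entries fixed; take the rest from retrieval
    return [q if m else f for q, f, m in zip(query, fused, known_mask)]
```

For example, with a condition that fixes only the first attribute to 1.0, a template matching that value dominates the blend, and the unspecified second attribute is filled in mostly from that template. In the actual model this gating would act on learned token embeddings inside the denoising network rather than on raw attribute vectors.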