Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing layout-to-image (L2I) generation methods suffer from representation fragmentation in few-shot, atypical scenarios. The result is fragmented, distorted images with lost detail. This work proposes a representation-driven framework that explicitly disentangles semantic identity from visual primitives. Semantic Anchoring aggregates category-level semantics into anchors to stabilize object identity. Primitive Imbuing models recomposable local primitives to improve fine-grained detail. Conceptual Steering adds a saliency-aware optimization objective that keeps foreground semantics consistent. Under a strict 5-shot setting, the method consistently outperforms state-of-the-art L2I approaches across multiple atypical domains, in both visual fidelity and layout alignment.
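
The summary does not say how Semantic Anchoring builds its category-level anchors. One common way to aggregate few-shot category semantics is a prototype: the re-normalized mean of the unit-length features of every labelled instance of a category. The sketch below takes that approach. It is an assumption made for illustration, not the authors' implementation. The function names `l2_normalize` and `build_semantic_anchors` are hypothetical, and the feature vectors stand in for whatever per-object embeddings the real framework uses.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_semantic_anchors(
    features: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Hypothetical anchor builder: average each category's normalized
    instance features (e.g. the 5 shots of one class) into a single anchor."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for feat, label in zip(features, labels):
        unit = l2_normalize(feat)  # equalize the contribution of each shot
        if label not in sums:
            sums[label] = [0.0] * len(unit)
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    # Re-normalize the mean so every anchor lies on the unit sphere.
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }
```

For example, `build_semantic_anchors([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]], ["c", "c", "b"])` returns one unit-length anchor per category. Averaging pools the shared identity signal across shots, so a single atypical shot cannot dominate the anchor.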

📝 Abstract

The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations under few-shot atypical settings. We term this failure representation fragmentation, arising from a granularity mismatch that entangles semantic identity with visual details. To address this issue, we propose a representation-driven framework that disentangles semantics from primitives for robust few-shot adaptation. Specifically, Semantic Anchoring aggregates categorical semantics into anchors for stable identity, while Primitive Imbuing models recomposable primitives for robust local detail modeling. Conceptual Steering further regulates optimization with a saliency-aware objective to preserve foreground semantic consistency. Extensive experiments demonstrate consistent improvements in the 5-shot regime over state-of-the-art L2I methods in both visual fidelity and alignment across diverse atypical domains. The source code is publicly available at https://github.com/iCVTEAM/DSP.
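
The abstract does not give the form of Conceptual Steering's saliency-aware objective. One generic way to make a training loss favour the foreground is to weight each position's error by 1 + fg_weight × saliency. The sketch below does that, assuming a per-position prediction error such as a denoising loss. Everything in it is an assumption. The names `box_saliency` and `saliency_weighted_mse` and the parameter `fg_weight` are hypothetical, and the saliency map is simply the union of the layout boxes, standing in for whatever saliency signal the paper actually uses.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def box_saliency(
    height: int, width: int, boxes: Sequence[Tuple[int, int, int, int]]
) -> Grid:
    """Stand-in saliency map: 1.0 inside any layout box (x0, y0, x1, y1),
    with half-open bounds, and 0.0 elsewhere."""
    sal = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                sal[y][x] = 1.0
    return sal


def saliency_weighted_mse(
    pred: Grid, target: Grid, saliency: Grid, fg_weight: float = 2.0
) -> float:
    """Hypothetical saliency-aware loss: weighted mean of squared errors,
    where each position is weighted by 1 + fg_weight * saliency."""
    total, weight_sum = 0.0, 0.0
    for p_row, t_row, s_row in zip(pred, target, saliency):
        for p, t, s in zip(p_row, t_row, s_row):
            w = 1.0 + fg_weight * s  # foreground positions count more
            total += w * (p - t) ** 2
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

Dividing by the summed weights keeps the loss on the same scale as plain MSE. Setting `fg_weight=0.0` recovers an unweighted MSE. With the default, an error inside a layout box costs three times as much as the same error in the background.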

Problem

Research questions and friction points this paper is trying to address.

layout-to-image

few-shot

atypical

representation fragmentation

semantic disentanglement

Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled representation

few-shot layout-to-image generation

semantic anchoring