Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing layout-to-image (L2I) generation methods produce fragmented, distorted images with lost detail in few-shot, atypical scenarios. This work proposes a representation-driven framework that explicitly disentangles semantic identity from visual primitives. Semantic Anchoring aggregates category-level semantics into anchors to stabilize object identity, while Primitive Imbuing models recomposable local primitives to strengthen fine-grained detail. Conceptual Steering adds a saliency-aware objective to the optimization to preserve foreground semantic consistency. Evaluated in the 5-shot setting, the method consistently outperforms state-of-the-art L2I approaches across multiple atypical domains in both visual fidelity and layout alignment.
📝 Abstract
The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations in few-shot atypical settings. We term this failure representation fragmentation; it arises from a granularity mismatch that entangles semantic identity with visual details. To address this issue, we propose a representation-driven framework that disentangles semantics from primitives for robust few-shot adaptation. Specifically, Semantic Anchoring aggregates categorical semantics into anchors for stable identity, while Primitive Imbuing models recomposable primitives for robust local detail modeling. Conceptual Steering further regulates optimization with a saliency-aware objective to preserve foreground semantic consistency. Extensive experiments demonstrate consistent improvements in the 5-shot regime over state-of-the-art L2I methods in both visual fidelity and alignment across diverse atypical domains. The source code is publicly available at https://github.com/iCVTEAM/DSP.
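To make the inputs and the idea of a saliency-aware objective concrete, here is a minimal sketch in plain Python. It is not the authors' Conceptual Steering and is not taken from the DSP repository. It assumes a layout is a list of category labels with normalized boxes, uses the rasterized boxes as a stand-in saliency map, and weights a mean squared error more heavily inside them. The names LayoutObject, foreground_mask, weighted_mse and fg_weight are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LayoutObject:
    """One layout entry: a category label and a normalized box (hypothetical type)."""
    category: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), each in [0, 1]


def foreground_mask(layout: List[LayoutObject], height: int, width: int) -> List[List[float]]:
    """Rasterize the layout boxes into an H x W mask (1.0 inside any box, else 0.0)."""
    mask = [[0.0] * width for _ in range(height)]
    for obj in layout:
        x0, y0, x1, y1 = obj.box
        c0, c1 = int(x0 * width), int(x1 * width)
        r0, r1 = int(y0 * height), int(y1 * height)
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = 1.0
    return mask


def weighted_mse(pred, target, mask, fg_weight: float = 2.0) -> float:
    """Weighted mean squared error; foreground pixels count fg_weight times.

    Assumption: the box mask approximates saliency. The paper's actual
    saliency-aware objective may be defined differently.
    """
    num = 0.0
    den = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = 1.0 + (fg_weight - 1.0) * m  # 1.0 on background, fg_weight on foreground
            num += w * (p - t) ** 2
            den += w
    return num / den
```

With fg_weight greater than 1, an error of the same size costs more inside an object box than in the background, which is the general intuition behind favoring foreground consistency during few-shot adaptation.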
Problem

Research questions and friction points this paper is trying to address.

layout-to-image
few-shot
atypical
representation fragmentation
semantic disentanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled representation
few-shot layout-to-image generation
semantic anchoring
primitive imbuing
conceptual steering
Nan Bao
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering and Qingdao Research Institute, Beihang University, China
Yifan Zhao
School of Computer Science and Engineering, Beihang University
Computer Vision, Computer Graphics, VR/AR
Wenzhuang Wang
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering and Qingdao Research Institute, Beihang University, China
Jia Li
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering and Qingdao Research Institute, Beihang University, China