🤖 AI Summary
This paper targets semantic misalignment among pose, clothing, and identity attributes when portraits are generated across views or motions. It proposes a semantically decoupled, dual-stream conditional diffusion model that, according to the authors, is the first to embed fine-grained attribute constraints explicitly into the denoising process. A pose-aware graph convolutional encoder captures structural priors, a semantic attention module aligns attributes with layout, and an attribute-aware reweighting loss further enforces structural-semantic consistency. On DeepFashion and Market-1501, the method reduces FID by 37% and improves human-rated semantic fidelity by 2.1× over prior work. The result is high-fidelity, fine-grained controllable portrait editing with precise attribute manipulation.
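The summary does not give the form of the attribute-aware reweighting loss. As a rough illustration only, one common way to reweight a diffusion objective is to scale the per-element noise-prediction error by a weight tied to the semantic region that element belongs to. The function name `attribute_reweighted_mse`, the region labels, and the weight values below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def attribute_reweighted_mse(
    pred_noise: Sequence[float],
    true_noise: Sequence[float],
    region_labels: Sequence[str],
    region_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Weighted mean squared error between predicted and true noise.

    Hypothetical sketch: each element carries a region label (for example
    "face" or "clothing"), and its squared error is scaled by that region's
    weight. Labels missing from region_weights fall back to default_weight.
    """
    if not (len(pred_noise) == len(true_noise) == len(region_labels)):
        raise ValueError("all inputs must have the same length")
    if len(pred_noise) == 0:
        raise ValueError("inputs must be non-empty")

    weighted_sum = 0.0
    weight_total = 0.0
    for p, t, label in zip(pred_noise, true_noise, region_labels):
        w = region_weights.get(label, default_weight)
        weighted_sum += w * (p - t) ** 2
        weight_total += w

    if weight_total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the total weight so the loss scale stays comparable
    # to a plain MSE regardless of how the weights are chosen.
    return weighted_sum / weight_total


# Assumed example: an error in the "face" region counts three times as much
# as the same error in an unlisted region such as "background".
loss = attribute_reweighted_mse(
    pred_noise=[1.0, 0.0],
    true_noise=[0.0, 0.0],
    region_labels=["face", "background"],
    region_weights={"face": 3.0},
)
```

With all weights equal, this reduces to an ordinary mean squared error, so the weights only shift emphasis toward the regions where attribute fidelity matters most.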