🤖 AI Summary
This work addresses human pose transfer—generating a high-fidelity image of a source person in a specified target pose, given a source image and the target pose. Key challenges include keypoint misalignment and texture distortion caused by inter-person variation in body shape, scale, and occlusion. To tackle these, we propose a multi-scale attention-guided pose disentanglement framework: (i) we introduce cross-scale channel attention into both pose representation and reconstruction; (ii) we design a spatial-channel joint attention module and a deformable keypoint feature alignment layer to enhance local detail fidelity and global structural consistency; and (iii) we employ a U-Net generator jointly optimized with a multi-scale adversarial discriminator. Evaluated on DeepFashion and Market-1501, the method reports a KP-SSIM of 92.3% and an FID of 86.7, outperforming state-of-the-art approaches including Prior-SPADE and PATN.
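To make the channel-attention idea concrete, here is a minimal NumPy sketch of squeeze-and-excitation-style channel gating of the kind the summary describes; the weight shapes, reduction ratio, and function names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative sketch).

    feat: (C, H, W) feature map.
    w1: (C // r, C) and w2: (C, C // r) are hypothetical weights of a
    two-layer gating MLP with reduction ratio r.
    Returns the feature map rescaled by per-channel attention weights.
    """
    # Squeeze: global average pool over spatial dims -> (C,)
    pooled = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1)
    hidden = np.maximum(w1 @ pooled, 0.0)
    gate = sigmoid(w2 @ hidden)                  # (C,)
    # Rescale each channel by its scalar attention weight
    return feat * gate[:, None, None]

# Usage: an 8-channel 4x4 feature map with reduction ratio r = 2
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
out = channel_attention(feat, w1, w2)
```

Each channel is multiplied by a single learned scalar in (0, 1), so informative channels can be emphasized and uninformative ones suppressed before the generator's reconstruction stage.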