🤖 AI Summary
Text-to-image diffusion models trained on planar images are poorly matched to the wrap-around topology and polar regions of 360° panoramas, which often leads to seam artifacts and limits stylized generation. This work proposes SHERPA, a lightweight adaptation framework with four components: a frequency-selective Circular RoPE that replaces only the seam-sensitive high-frequency horizontal RoPE band with integer-periodic harmonics while preserving the pretrained lower-frequency spectrum; Circular Latent Encoding/Decoding; image-side FFN adapters; and a Dual-Path Training Scheme in which a Paired Panorama Path supervises geometry and an Unpaired Style Path relies on self-supervised yaw consistency. Without requiring stylized target panoramas, SHERPA generates 360° panoramas for both photorealistic panorama domains and open-domain stylized prompts.
📝 Abstract
Panoramic imagery is increasingly used in world generation, games, and simulation, where users may need not only photorealistic scenes but also stylized and non-photorealistic environments. Large-scale text-to-image diffusion and flow models provide broad style and semantic priors for this goal, but planar image training misaligns them with the wrap-around topology and polar regions of $360^\circ$ panoramas represented in equirectangular projection (ERP). We present SHERPA, a lightweight adaptation framework that combines frequency-selective Circular RoPE, Circular Latent Encoding/Decoding, image-side FFN adapters, and a Dual-Path Training Scheme. Circular RoPE replaces only the seam-sensitive high-frequency horizontal RoPE band with integer-periodic harmonics while preserving the pretrained lower-frequency spectrum. The Paired Panorama Path supervises geometry, while the Unpaired Style Path uses self-supervised yaw consistency for target-free stylized prompts. As a result, SHERPA generates $360^\circ$ panoramas across both photorealistic panorama domains and open-domain stylized prompts.
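The frequency-selective idea behind Circular RoPE can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the high-frequency band is selected or how the integer harmonics are assigned, so the threshold `min_cycles`, the nearest-integer snapping rule, the RoPE base of 10000, and all function names below are assumptions made for illustration. The sketch snaps every horizontal frequency that completes at least one cycle across the panorama width to the nearest integer harmonic $2\pi k / W$, so the rotary angle at column $W$ coincides with the angle at column $0$, and leaves the lower frequencies untouched.

```python
import numpy as np


def rope_frequencies(dim, base=10000.0):
    """Standard RoPE angular frequencies for one axis (dim must be even)."""
    return base ** (-np.arange(0, dim, 2) / dim)


def circular_rope_frequencies(dim, width, base=10000.0, min_cycles=1.0):
    """Return (frequencies, high_band_mask) for a horizontally periodic axis.

    Assumption: a frequency belongs to the "high" band when it completes at
    least `min_cycles` full cycles across `width` token columns. Those
    frequencies are snapped to integer harmonics of the width; the rest keep
    their pretrained values.
    """
    freqs = rope_frequencies(dim, base)
    cycles = freqs * width / (2.0 * np.pi)  # cycles across the full width
    high = cycles >= min_cycles
    # Nearest integer number of cycles, never fewer than one.
    harmonics = np.maximum(np.round(cycles), 1.0)
    snapped = 2.0 * np.pi * harmonics / width
    return np.where(high, snapped, freqs), high


def rope_angles(freqs, positions):
    """Rotary angles, shape (len(positions), len(freqs))."""
    return np.outer(positions, freqs)


if __name__ == "__main__":
    width, dim = 64, 32  # hypothetical latent width and per-axis head dim
    freqs, high = circular_rope_frequencies(dim, width)
    angles = rope_angles(freqs, np.array([0, width]))
    # For the snapped band, column `width` wraps onto column 0.
    print("high-band size:", int(high.sum()), "of", freqs.size)
    print("max |sin| at the seam:", np.abs(np.sin(angles[1, high])).max())
```

Under these assumptions the snapped band is exactly periodic in the panorama width, which is the property that removes the positional discontinuity at the left/right seam, while the unmodified low band stays compatible with what the pretrained model learned on planar images.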