GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching

๐Ÿ“… 2025-03-17
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
High-quality stereoscopic image generation faces the fundamental challenge of simultaneously achieving visual realism and geometric consistencyโ€”a trade-off exacerbated by reliance on precise hardware calibration in existing methods. To address this, we propose the first unsupervised diffusion framework that jointly optimizes perceptual quality and disparity alignment accuracy. Our approach introduces disparity-aware coordinate embeddings as diffusion conditioning signals, an adaptive fusion mechanism integrating generated images with differentiable warped counterparts, and a multi-dataset joint training strategy supervised by an unsupervised stereo matching loss. Evaluated across 11 heterogeneous stereoscopic datasets, our method achieves state-of-the-art performance, reducing FID by 23.6% and end-point error (EPE) by 31.4% compared to prior works. The framework demonstrates strong practical applicability for XR rendering and autonomous driving systems, requiring no ground-truth disparity annotations or camera calibration.

Technology Category

Application Category

๐Ÿ“ Abstract
Stereo images are fundamental to numerous applications, including extended reality (XR) devices, autonomous driving, and robotics. Unfortunately, acquiring high-quality stereo images remains challenging due to the precise calibration requirements of dual-camera setups and the complexity of obtaining accurate, dense disparity maps. Existing stereo image generation methods typically focus on either visual quality for viewing or geometric accuracy for matching, but not both. We introduce GenStereo, a diffusion-based approach, to bridge this gap. The method includes two primary innovations (1) conditioning the diffusion process on a disparity-aware coordinate embedding and a warped input image, allowing for more precise stereo alignment than previous methods, and (2) an adaptive fusion mechanism that intelligently combines the diffusion-generated image with a warped image, improving both realism and disparity consistency. Through extensive training on 11 diverse stereo datasets, GenStereo demonstrates strong generalization ability. GenStereo achieves state-of-the-art performance in both stereo image generation and unsupervised stereo matching tasks. Our framework eliminates the need for complex hardware setups while enabling high-quality stereo image generation, making it valuable for both real-world applications and unsupervised learning scenarios. Project page is available at https://qjizhi.github.io/genstereo
Problem

Research questions and friction points this paper is trying to address.

Bridges visual quality and geometric accuracy in stereo image generation.
Eliminates need for complex hardware setups for high-quality stereo images.
Improves realism and disparity consistency in unsupervised stereo matching.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based stereo image generation method
Disparity-aware coordinate embedding for precise alignment
Adaptive fusion mechanism for realism and consistency
๐Ÿ”Ž Similar Papers
No similar papers found.