StereoGen: High-quality Stereo Image Generation from a Single Image

📅 2025-01-15

🤖 AI Summary

Problem: Supervised stereo matching methods generalize poorly to real-world scenarios because annotated real-world stereo pairs are scarce. Method: This paper proposes StereoGen, a pipeline that synthesizes stereo pairs from single images, so no ground-truth stereo pairs are needed to produce training data. A monocular depth estimation model provides a pseudo-disparity map for an arbitrary left image. The left image is warped with this pseudo-disparity to form the right view, and the occluded regions that warping leaves empty are filled by a fine-tuned diffusion inpainting model. The paper also introduces Training-free Confidence Generation, which suppresses the effect of harmful pseudo ground truth during stereo training, and Adaptive Disparity Selection, which widens the disparity distribution of the generated data. Results: Stereo models trained with this pipeline achieve state-of-the-art zero-shot generalization among published methods. The synthesized right images show finer texture detail and intact semantic structure compared with earlier methods that fill occlusions with random backgrounds or convolutions.
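The warping step described above can be illustrated with a minimal NumPy sketch. It assumes the monocular model outputs relative inverse depth, which is rescaled linearly to a chosen maximum disparity, and it forward-warps the left image so that a pixel at column x lands at column x - d in the right view; where several pixels collide, the one with the larger disparity (the nearer surface) wins. The helper names `depth_to_disparity` and `forward_warp_left_to_right` and the `max_disp` parameter are hypothetical, and the paper's exact scaling and warping may differ. The returned hole mask marks the occluded pixels that would then be handed to an inpainting model; StereoGen's fine-tuned diffusion inpainter is not shown.

```python
import numpy as np


def depth_to_disparity(inv_depth: np.ndarray, max_disp: float) -> np.ndarray:
    """Linearly rescale relative inverse depth to pixel disparity in [0, max_disp].

    Assumption (hypothetical helper): larger inverse depth means a nearer
    surface, and the farthest pixel is mapped to zero disparity.
    """
    d = inv_depth.astype(np.float64) - inv_depth.min()
    span = d.max()
    if span == 0:
        return np.zeros_like(d)
    return d / span * max_disp


def forward_warp_left_to_right(left: np.ndarray, disp: np.ndarray):
    """Forward-warp a left image (H x W or H x W x C) into the right view.

    A left pixel at (y, x) lands at (y, round(x - disp)). On collisions the
    larger disparity (nearer surface) wins, like a z-buffer. Returns the
    warped image and a boolean mask that is True at holes (never filled).
    """
    h, w = disp.shape
    warped = np.zeros_like(left)
    zbuf = np.full((h, w), -np.inf)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp[y, x]))
            if 0 <= xr < w and disp[y, x] > zbuf[y, xr]:
                zbuf[y, xr] = disp[y, x]
                warped[y, xr] = left[y, x]
                filled[y, xr] = True
    return warped, ~filled


if __name__ == "__main__":
    left = np.arange(5, dtype=float).reshape(1, 5)  # one-row grayscale image
    disp = np.array([[0.0, 0.0, 2.0, 2.0, 0.0]])     # two-pixel "foreground"
    warped, holes = forward_warp_left_to_right(left, disp)
    print(warped)  # [[2. 3. 0. 0. 4.]]
    print(holes)   # [[False False  True  True False]]
```

In the toy example the foreground pixels shift two columns to the left and cover the background there, leaving two disoccluded holes behind them. Those holes are exactly the regions the abstract says earlier methods filled with random backgrounds or nearby pixels. The explicit loops keep the collision rule obvious; a production version would vectorize or run on the GPU.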

📝 Abstract

State-of-the-art supervised stereo matching methods have achieved impressive results on various benchmarks. However, these data-driven methods generalize poorly to real-world scenarios because annotated real-world data is lacking. In this paper, we propose StereoGen, a novel pipeline for high-quality stereo image generation. The pipeline takes arbitrary single images as left images and uses pseudo disparities generated by a monocular depth estimation model to synthesize high-quality corresponding right images. Unlike previous methods, which fill the occluded areas in warped right images with random backgrounds or with convolutions that selectively sample nearby pixels, we fine-tune a diffusion inpainting model to recover the background. Images generated by our model have finer details and intact semantic structures. In addition, we propose Training-free Confidence Generation and Adaptive Disparity Selection. The former suppresses the negative effect of harmful pseudo ground truth during stereo training, while the latter produces a wider disparity distribution and better synthetic images. Experiments show that models trained with our pipeline achieve state-of-the-art zero-shot generalization among all published methods. The code will be made available upon publication of the paper.
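The abstract describes Training-free Confidence Generation only by its effect: it suppresses harmful pseudo ground truth during stereo training. One common way a per-pixel confidence map can play that role is as a weight on the disparity loss, sketched below in NumPy. How StereoGen actually computes its confidence map is not specified here, so `confidence` is simply an input in [0, 1], and the function name `confidence_weighted_l1` is hypothetical.

```python
import numpy as np


def confidence_weighted_l1(pred_disp, pseudo_disp, confidence, eps=1e-6):
    """Mean absolute disparity error, weighted per pixel by a confidence map.

    Hypothetical helper: pixels with confidence 0 contribute nothing, so
    unreliable pseudo labels do not pull the stereo network toward wrong
    disparities. Dividing by the total weight keeps the loss scale
    independent of how many pixels are trusted.
    """
    pred = np.asarray(pred_disp, dtype=np.float64)
    target = np.asarray(pseudo_disp, dtype=np.float64)
    w = np.clip(np.asarray(confidence, dtype=np.float64), 0.0, 1.0)
    err = np.abs(pred - target)
    return float((w * err).sum() / (w.sum() + eps))


if __name__ == "__main__":
    pred = np.array([1.0, 2.0, 3.0])
    pseudo = np.array([1.0, 2.0, 13.0])  # last pseudo label is badly wrong
    print(confidence_weighted_l1(pred, pseudo, np.ones(3)))                 # ~3.333
    print(confidence_weighted_l1(pred, pseudo, np.array([1.0, 1.0, 0.0])))  # 0.0
```

With uniform confidence the single wrong pseudo label dominates the loss; giving it zero confidence removes its influence entirely. In real training the same weighting would be written in an autodiff framework so that gradients flow through `pred_disp`; NumPy is used here only to show the arithmetic.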

Problem

Research questions and friction points this paper is trying to address.

Supervised Stereo Matching

Real-world Data Scarcity

High-quality Stereo Image Generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stereo Image Generation

Confidence Map

Adaptive Disparity Selection

🔎 Similar Papers

Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View