Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training

📅 2025-02-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses geometric and appearance inconsistency between reference and novel views in zero-shot novel view synthesis from a single image. The proposed solution requires neither retraining nor fine-tuning: it adds a parameter-free epipolar attention mechanism to a pretrained diffusion model, using epipolar geometry to locate overlapping regions in the reference view and retrieve their information while target views are generated. The mechanism extends to a multi-view setting in which each target view also attends to the other target views, so that several views are synthesized cooperatively. Experiments show improved cross-view consistency, both qualitatively and quantitatively, and better results in downstream 3D reconstruction. The code is publicly available.

📝 Abstract

Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. However, these models often struggle to maintain consistency between novel and reference views. A crucial factor behind this issue is the limited use of contextual information from reference views. Specifically, when the viewing frustums of two views overlap, the corresponding regions must remain consistent in both geometry and appearance. This observation leads to a simple yet effective approach: we propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views. Because the process has no learnable parameters, it requires no training or fine-tuning. Furthermore, to enhance the overall consistency of generated views, we extend epipolar attention to a multi-view setting, allowing retrieval of overlapping information from the input view and from other target views. Qualitative and quantitative experimental results demonstrate that our method significantly improves the consistency of synthesized views without any fine-tuning. Moreover, this enhancement also boosts the performance of downstream applications such as 3D reconstruction. The code is available at https://github.com/botaoye/ConsisSyn.
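The retrieval step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (`fundamental_matrix`, `epipolar_mask`, `epipolar_attention`), the hard pixel-distance threshold, and the choice to return zeros for target pixels whose epipolar line misses the reference image are all assumptions made for illustration. It also assumes both views share the intrinsics `K` and that the relative pose maps reference-camera points to target-camera points as `X_t = R @ X_r + t`.

```python
import numpy as np

def skew(t):
    # Cross-product matrix [t]_x, so that skew(t) @ x == np.cross(t, x).
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K, R, t):
    # F satisfies p_t^T F p_r = 0 for corresponding homogeneous pixels.
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def pixel_grid(h, w):
    # Homogeneous pixel coordinates (x, y, 1) in row-major order, shape (h*w, 3).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1).astype(float)

def epipolar_mask(K, R, t, h, w, thresh=1.0):
    # mask[i, j] is True when reference pixel j lies within `thresh` pixels
    # of the epipolar line induced by target pixel i.
    F = fundamental_matrix(K, R, t)
    pix = pixel_grid(h, w)
    lines = pix @ F                      # row i is (F^T p_t_i)^T = (a, b, c)
    num = np.abs(lines @ pix.T)          # |a*x + b*y + c| for every pixel pair
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    return (num / den) <= thresh

def epipolar_attention(q, k, v, mask):
    # Scaled dot-product attention restricted to the epipolar candidates.
    # No learnable parameters are involved; rows with no candidate return zeros.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    masked = np.where(mask, scores, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)   # exp(-inf) == 0 for masked-out entries
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-8)
    return weights @ v
```

In this sketch `q` would come from the target view's features and `k`, `v` from the reference view's features, one row per pixel of an `h` by `w` feature map. Under the same assumptions, the multi-view variant could concatenate the keys, values, and masks of the reference view and the other target views along the key axis before calling `epipolar_attention`.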

Problem

Research questions and friction points this paper is trying to address.

Maintain consistency in novel view synthesis

Utilize epipolar geometry for contextual information

Enhance multi-view consistency without re-training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes epipolar geometry for view synthesis

Employs multi-view epipolar attention mechanism

No training or fine-tuning required
