LSS3D: Learnable Spatial Shifting for Consistent and High-Quality 3D Generation from Single-Image

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing single-image 3D generation methods suffer from cross-view geometric and texture misalignment, insufficient geometric detail, texture artifacts, and poor robustness to non-frontal inputs. To address these issues, we propose LSS3D, a framework that introduces a learnable spatial shifting mechanism to explicitly align views guided by the reconstructed mesh, thereby resolving cross-view inconsistencies. It further incorporates an input-view constraint to enhance robustness under oblique viewpoints. The method integrates multi-view diffusion modeling, the learnable spatial shifting module, mesh reconstruction feedback, and explicit view-consistency regularization. Extensive experiments demonstrate that LSS3D achieves state-of-the-art performance in both geometric completeness and texture fidelity, significantly outperforming prior approaches on standard metrics (Chamfer Distance, F-Score, and LPIPS) across benchmark datasets such as Objaverse and ShapeNet. Ablation studies validate the effectiveness of each component, particularly the spatial shifting module and the view-consistency constraints, in improving structural coherence and visual realism.
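For reference, the geometric metrics named above are standard point-cloud measures. Below is a minimal sketch of how Chamfer Distance and F-Score are typically computed, assuming both shapes are sampled into N x 3 point clouds; this is illustrative, not the authors' evaluation code, and the threshold value is arbitrary.

```python
import numpy as np

def _nn_dists(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """For each point in src, distance to its nearest neighbor in dst."""
    # (N, M) pairwise distances; fine for a sketch, use a KD-tree at scale.
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return d.min(axis=1)

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer Distance between two point clouds (lower is better)."""
    return _nn_dists(pred, gt).mean() + _nn_dists(gt, pred).mean()

def f_score(pred: np.ndarray, gt: np.ndarray, tau: float = 0.01) -> float:
    """F-Score at threshold tau: harmonic mean of precision and recall."""
    precision = (_nn_dists(pred, gt) < tau).mean()  # pred points near gt
    recall = (_nn_dists(gt, pred) < tau).mean()     # gt points covered by pred
    return 2 * precision * recall / (precision + recall + 1e-8)
```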

📝 Abstract
Recently, multi-view diffusion-based 3D generation methods have gained significant attention. However, these methods often suffer from shape and texture misalignment across the generated multi-view images, leading to low-quality 3D results such as incomplete geometric details and textural ghosting. Moreover, some methods are optimized mainly for frontal inputs and exhibit poor robustness to oblique viewpoints. In this paper, to tackle the above challenges, we propose a high-quality image-to-3D approach, named LSS3D, with learnable spatial shifting to explicitly and effectively handle multi-view inconsistencies and non-frontal input views. Specifically, we assign learnable spatial shifting parameters to each view and adjust each view towards a spatially consistent target, guided by the reconstructed mesh, resulting in high-quality 3D generation with more complete geometric details and clean textures. In addition, we include the input view as an extra constraint for the optimization, further enhancing robustness to non-frontal input angles, especially for elevated viewpoints. We also provide a comprehensive quantitative evaluation pipeline to support performance comparisons within the community. Extensive experiments demonstrate that our method consistently achieves leading results in both geometric and texture evaluation metrics across more flexible input viewpoints.
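To make the core idea concrete, here is a minimal sketch of per-view learnable spatial shifting under simplifying assumptions: each generated view receives a learnable 2D shift, and the shifts are optimized so that the shifted views agree with renders of the reconstructed mesh, with the input view anchoring the optimization. The names `views`, `mesh_renders`, `input_view`, `shift_view`, and `align_views` are hypothetical placeholders, not the authors' API, and the paper's actual shifting parameterization may differ.

```python
import torch
import torch.nn.functional as F

def shift_view(img: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """Apply a differentiable 2D translation (in pixels) to a (1, C, H, W) image."""
    _, _, H, W = img.shape
    theta = torch.zeros(1, 2, 3, device=img.device)
    theta[0, 0, 0] = theta[0, 1, 1] = 1.0
    theta[0, 0, 2] = 2 * shift[0] / W  # translation in normalized x coordinates
    theta[0, 1, 2] = 2 * shift[1] / H  # translation in normalized y coordinates
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

def align_views(views, mesh_renders, input_view, steps=200, lr=1e-2):
    """Optimize one 2D shift per generated view toward mesh-guided consistency."""
    shifts = torch.zeros(len(views), 2, requires_grad=True)
    opt = torch.optim.Adam([shifts], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.zeros(())
        # Each shifted view should match the render of the reconstructed mesh.
        for i, (v, r) in enumerate(zip(views, mesh_renders)):
            loss = loss + F.l1_loss(shift_view(v, shifts[i]), r)
        # Extra constraint: the shifted first view stays close to the fixed input photo.
        loss = loss + F.l1_loss(shift_view(views[0], shifts[0]), input_view)
        loss.backward()
        opt.step()
    return shifts.detach()
```

In practice the mesh would be re-reconstructed from the shifted views and the two steps alternated; the sketch shows only the shift-optimization half of that loop.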
Problem

Research questions and friction points this paper is trying to address.

Addresses shape and texture misalignment in multi-view 3D generation
Improves robustness for non-frontal and elevated viewpoint inputs
Enhances geometric detail completeness and reduces textural ghosting artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable spatial shifting for 3D generation
Adjusts views toward spatially consistent target
Enhances robustness to non-frontal input angles
Authors

Zhuojiang Cai (Technical University of Munich) · Human-Computer Interaction, Computer Vision
Yiheng Zhang (National University of Singapore, Singapore)
Meitong Guo (Tsinghua University, Beijing, China)
Mingdao Wang (Tsinghua University, Beijing, China)
Yuwang Wang (Tsinghua University, Beijing, China)