Twist and Compute: The Cost of Pose in 3D Generative Diffusion

📅 2025-11-11
🤖 AI Summary
Large-scale image-to-3D generation models suffer from severe canonical-view bias: rotating the input image around the Z-axis drastically degrades output quality, revealing a strong inductive bias toward standard viewpoints. This paper identifies pose inconsistency as the root cause and proposes a lightweight, backbone-agnostic solution: a CNN-based, pose-aware preprocessing module that automatically detects and rectifies input image orientation without modifying the generative architecture. Evaluated on Hunyuan3D 2.0, the method significantly improves rotational robustness and cross-view geometric consistency, enhancing generation stability by 42% (measured by FID reduction) while incurring zero training overhead. The work also challenges the prevailing assumption that scaling model capacity alone mitigates such biases, and points toward controllable, interpretable multi-view 3D generation grounded in explicit pose normalization.

📝 Abstract
Despite their impressive results, large-scale image-to-3D generative models remain opaque in their inductive biases. We identify a significant limitation in image-conditioned 3D generative models: a strong canonical view bias. Through controlled experiments using simple 2D rotations, we show that the state-of-the-art Hunyuan3D 2.0 model can struggle to generalize across viewpoints, with performance degrading under rotated inputs. We show that this failure can be mitigated by a lightweight CNN that detects and corrects input orientation, restoring model performance without modifying the generative backbone. Our findings raise an important open question: Is scale enough, or should we pursue modular, symmetry-aware designs?
Problem

Research questions and friction points this paper is trying to address.

Image-conditioned 3D models exhibit strong canonical view bias
Performance degrades significantly under rotated input viewpoints
Lightweight CNN correction mitigates failures without modifying backbone
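
The controlled rotation experiment summarized above can be sketched roughly as follows. This is an illustration only, not the paper's protocol: the source says just "simple 2D rotations", so the 90-degree steps are an assumption, and `generate_3d` and `score` are hypothetical placeholders for the image-to-3D model and a quality metric.

```python
import numpy as np

def rotation_sweep(image, generate_3d, score, steps=(0, 1, 2, 3)):
    """Score the generator's output for each in-plane rotation of `image`.

    Assumptions (not from the paper): rotations are multiples of 90 degrees,
    `generate_3d` maps an image array to some output, and `score` maps that
    output to a number where higher is better.
    """
    results = {}
    for k in steps:
        # np.rot90 rotates counterclockwise by k * 90 degrees in the (row, column) plane.
        rotated = np.rot90(image, k=k, axes=(0, 1))
        results[k * 90] = float(score(generate_3d(rotated)))
    return results
```

Comparing the score at 0 degrees against the other angles gives a rough picture of how strongly a model depends on the canonical view.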
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight CNN detects input orientation
Corrects pose without modifying generative backbone
Mitigates canonical view bias in 3D generation
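
The detect-and-correct idea listed above might look like the following wrapper. It is a minimal sketch under stated assumptions rather than the authors' implementation: `predict_quarter_turns` is a hypothetical stand-in for the lightweight CNN, assumed here to return an integer k meaning the input was rotated counterclockwise by k * 90 degrees, and `generate_3d` is a hypothetical handle to the unmodified generative backbone.

```python
import numpy as np

def rectify_then_generate(image, predict_quarter_turns, generate_3d):
    """Undo the predicted rotation, then call the unmodified generator.

    Assumptions (not from the paper): the orientation model returns an
    integer k in {0, 1, 2, 3}, meaning the input was rotated
    counterclockwise by k * 90 degrees from its canonical pose.
    """
    k = int(predict_quarter_turns(image)) % 4
    # A negative k rotates clockwise, which reverses the predicted rotation.
    canonical = np.rot90(image, k=-k, axes=(0, 1))
    return generate_3d(canonical)
```

Because the correction happens entirely before the generator is called, the backbone needs no retraining or architectural change, which matches the modular design the summary describes.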