Twist and Compute: The Cost of Pose in 3D Generative Diffusion

📅 2025-11-11

🤖 AI Summary

Large-scale image-to-3D generative models show a strong canonical-view bias: rotating the input image in-plane (a simple 2D rotation) sharply degrades output quality, which reveals an inductive bias toward standard viewpoints. The paper traces this failure to input pose and proposes a lightweight, backbone-agnostic fix: a CNN-based preprocessing module that detects the orientation of the input image and rectifies it before generation, leaving the generative architecture unchanged. Evaluated on Hunyuan3D 2.0, the module improves rotational robustness and cross-view geometric consistency, improving generation stability by 42% (measured by FID reduction) without retraining or modifying the generative backbone. The work questions the prevailing assumption that scaling model capacity alone removes such biases and raises modular, symmetry-aware design, grounded in explicit pose normalization, as an alternative worth pursuing.

📝 Abstract

Despite their impressive results, large-scale image-to-3D generative models remain opaque in their inductive biases. We identify a significant limitation in image-conditioned 3D generative models: a strong canonical view bias. Through controlled experiments using simple 2D rotations, we show that the state-of-the-art Hunyuan3D 2.0 model can struggle to generalize across viewpoints, with performance degrading under rotated inputs. We show that this failure can be mitigated by a lightweight CNN that detects and corrects input orientation, restoring model performance without modifying the generative backbone. Our findings raise an important open question: Is scale enough, or should we pursue modular, symmetry-aware designs?
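The detect-then-correct idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes rotations are restricted to multiples of 90 degrees, represents an image as a nested list of pixel values, and replaces the CNN with a hypothetical stand-in, `toy_detector`, that simply assumes the brightest corner belongs at the top-left. The names `rotate90`, `rectify`, and `toy_detector` are all illustrative.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of pixel values


def rotate90(image: Image, k: int) -> Image:
    """Rotate the image counter-clockwise by k * 90 degrees."""
    image = [list(row) for row in image]  # work on a copy
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        image = [list(row) for row in zip(*image)][::-1]
    return image


def rectify(image: Image, predict_turns: Callable[[Image], int]) -> Image:
    """Undo the rotation reported by an orientation detector.

    predict_turns returns the number of counter-clockwise quarter turns
    it believes were applied to the canonical image.
    """
    k = predict_turns(image)
    return rotate90(image, -k)  # rotate back toward the canonical pose


def toy_detector(image: Image) -> int:
    """Hypothetical stand-in for the CNN (assumption, not the paper's model).

    Assumes the canonical pose has its brightest corner at the top-left.
    Corners are listed in the order the top-left pixel visits them under
    successive counter-clockwise quarter turns.
    """
    corners = [image[0][0], image[-1][0], image[-1][-1], image[0][-1]]
    return corners.index(max(corners))


canonical = [[9, 1], [2, 3]]
restored = [rectify(rotate90(canonical, k), toy_detector) for k in range(4)]
```

In a real pipeline the rectified image would then be passed to the unmodified generator; only the preprocessing step changes.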

Problem

Research questions and friction points this paper is trying to address.

Image-conditioned 3D models exhibit strong canonical view bias

Performance degrades significantly under rotated input viewpoints

Lightweight CNN correction mitigates failures without modifying backbone
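A controlled rotation experiment of the kind described above could be organized as a simple sweep. The sketch below is only an assumption about the protocol: it uses 90-degree steps, and `generate_and_score` is a hypothetical callable standing in for "run the 3D generator on this input and return a quality score"; the toy scorer at the end exists purely to make the example runnable.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale image as rows of pixel values


def rotate90(image: Image, k: int) -> Image:
    """Rotate the image counter-clockwise by k * 90 degrees."""
    image = [list(row) for row in image]  # work on a copy
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return image


def rotation_sweep(
    image: Image, generate_and_score: Callable[[Image], float]
) -> Dict[int, float]:
    """Score the same input at 0, 90, 180, and 270 degrees of rotation."""
    return {k * 90: generate_and_score(rotate90(image, k)) for k in range(4)}


def toy_score(image: Image) -> float:
    """Toy scorer (assumption): full score only when the brightest pixel is top-left."""
    brightest = max(max(row) for row in image)
    return 1.0 if image[0][0] == brightest else 0.5


scores = rotation_sweep([[9, 1], [2, 3]], toy_score)
```

A canonical-view bias would show up as a score that is high at 0 degrees and lower at the other angles.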

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight CNN detects input orientation

Corrects pose without modifying generative backbone

Mitigates canonical view bias in 3D generation

🔎 Similar Papers

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation