🤖 AI Summary
Existing volumetric indoor reconstruction methods rely heavily on multi-view geometric constraints, which causes severe performance degradation under sparse-view conditions and in occluded regions. To address this, we propose an image-plane geometric decoding framework that bypasses inter-view ray intersection and instead extracts spatial structure from single views. Our key contributions are: (1) a Pixel-level Confidence Encoder (PCE) that quantifies observation reliability; (2) an Affine Compensation Module (ACM) that corrects imaging distortions; and (3) an Image-Plane Spatial Decoder (IPSD) that incorporates physical imaging priors for high-fidelity 2D-to-3D mapping. Evaluated on ScanNetV2, the method maintains nearly identical reconstruction quality when the number of input views is reduced by 40%, with a coefficient of variation of only 0.24% and 99.7% performance retention. This significantly enhances view invariance and robustness in occluded and boundary regions.
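The three contributions above compose into a single-view decoding pipeline: a confidence map gates how much each pixel's decoded structure is trusted. The summary does not specify any module internals, so the sketch below is purely structural — every stage body is a placeholder we invented for illustration, not the paper's method:

```python
import numpy as np

def pixel_confidence(img: np.ndarray) -> np.ndarray:
    """PCE stand-in: a per-pixel reliability map in [0, 1).
    Placeholder heuristic: squashed local intensity-gradient magnitude."""
    gy, gx = np.gradient(img)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return mag / (1.0 + mag)

def affine_compensate(feat: np.ndarray, scale: float = 1.0, shift: float = 0.0) -> np.ndarray:
    """ACM stand-in: a per-image affine correction of imaging distortions."""
    return scale * feat + shift

def image_plane_decode(feat: np.ndarray, conf: np.ndarray) -> np.ndarray:
    """IPSD stand-in: confidence-weighted blend between the per-pixel signal
    and a global fallback, yielding a depth-like map from a single view."""
    return conf * feat + (1.0 - conf) * feat.mean()

# Toy single-view input standing in for an RGB frame's feature map.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
conf = pixel_confidence(img)
depth_like = image_plane_decode(affine_compensate(img), conf)
```

The point of the sketch is the dataflow (confidence gating a single-view 2D-to-3D decoding), not the placeholder math inside each stage.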
📝 Abstract
Volume-based indoor scene reconstruction methods are of significant research value due to their strong generalization capability and real-time deployment potential. However, existing methods rely on the intersection of multi-view pixel back-projection rays as a weak geometric constraint to determine spatial positions, so reconstruction quality depends heavily on input view density and degrades in overlapping regions and unobserved areas. To address these issues, the key is to reduce dependence on inter-view geometric constraints while exploiting the rich spatial information within individual views. We propose IPDRecon, an image-plane decoding framework comprising three core components: a Pixel-level Confidence Encoder (PCE), an Affine Compensation Module (ACM), and an Image-Plane Spatial Decoder (IPSD). These modules collaboratively decode the 3D structural information encoded in 2D images through the physical imaging process, effectively preserving spatial geometric features such as edges, hollow structures, and complex textures while significantly enhancing view-invariant reconstruction. Experiments on ScanNetV2 confirm that IPDRecon achieves superior reconstruction stability, maintaining nearly identical quality when the view count is reduced by 40%: a coefficient of variation of only 0.24%, a performance retention rate of 99.7%, and a maximum performance drop of merely 0.42%. This demonstrates that exploiting intra-view spatial information provides a robust solution for view-limited scenarios in practical applications.
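The three stability metrics quoted in the abstract follow from standard definitions (coefficient of variation = std/mean; retention and drop relative to the best score). The sketch below shows how such numbers are computed, using made-up per-view-count scores — these are illustrative values, not the paper's data:

```python
import numpy as np

# Hypothetical reconstruction scores (e.g. F-score) at 100%, 80%, and 60%
# of the input views -- invented for illustration only.
scores = np.array([0.712, 0.711, 0.709])

# Coefficient of variation across view budgets, in percent.
cv_pct = scores.std() / scores.mean() * 100.0

# Performance retention: worst score relative to the best, in percent.
retention_pct = scores.min() / scores.max() * 100.0

# Maximum performance drop relative to the best score, in percent.
max_drop_pct = (scores.max() - scores.min()) / scores.max() * 100.0
```

With scores this tightly clustered, the coefficient of variation stays well under 1% and retention above 99%, which is the kind of view-invariance the abstract reports.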