PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment

📅 2025-08-06
🤖 AI Summary
Existing room layout estimation methods are predominantly single-view, often assume panoramic input, and therefore cannot exploit multi-view geometric constraints. To address this limitation, we propose PixCuboid, an optimization-based multi-view framework that is trained end-to-end. The method estimates cuboid room layout parameters by aligning dense deep features across views. Training through the optimization yields feature maps with large convergence basins and smooth loss landscapes, so simple heuristics suffice for initialization. Although the network is trained on single cuboids, the optimization-based formulation extends naturally from single-room to multi-room scenes. We evaluate on two new benchmarks built from ScanNet++ and 2D-3D-Semantics, both with manually verified ground-truth 3D cuboids, where the approach significantly outperforms competing methods.

📝 Abstract
Coarse room layout estimation provides important geometric cues for many downstream tasks. Current state-of-the-art methods are predominantly based on single views and often assume panoramic images. We introduce PixCuboid, an optimization-based approach for cuboid-shaped room layout estimation, which is based on multi-view alignment of dense deep features. By training with the optimization end-to-end, we learn feature maps that yield large convergence basins and smooth loss landscapes in the alignment. This allows us to initialize the room layout using simple heuristics. For evaluation, we propose two new benchmarks based on ScanNet++ and 2D-3D-Semantics, with manually verified ground-truth 3D cuboids. In thorough experiments we validate our approach and significantly outperform the competition. Finally, while our network is trained with single cuboids, the flexibility of the optimization-based approach allows us to easily extend to multi-room estimation, e.g. larger apartments or offices. Code and model weights are available at https://github.com/ghanning/PixCuboid.
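To make the link between smooth features and large convergence basins concrete, here is a minimal toy sketch. It is not PixCuboid's implementation: the 1D Gaussian "feature map", the single shift parameter `t`, and the helper names `feature`, `feature_grad`, and `align` are all assumptions chosen for illustration. It runs Gauss-Newton on a featuremetric residual and compares a smooth feature profile with a sharp one from the same poor initialization.

```python
import numpy as np

def feature(x, width):
    # Hypothetical 1D "feature map": a Gaussian bump; larger width = smoother features.
    return np.exp(-0.5 * (x / width) ** 2)

def feature_grad(x, width):
    # Analytic derivative of `feature` with respect to x.
    return -(x / width**2) * feature(x, width)

def align(t_true, t_init, width, iters=50):
    """Gauss-Newton estimate of a 1D shift by minimising feature residuals."""
    xs = np.linspace(-10.0, 10.0, 401)        # sample positions ("pixels")
    target = feature(xs, width)               # features seen in the reference view
    t = float(t_init)
    for _ in range(iters):
        warped = xs + t - t_true              # samples warped into the second view
        r = feature(warped, width) - target   # featuremetric residual
        J = feature_grad(warped, width)       # derivative of the residual w.r.t. t
        t -= (J @ r) / (J @ J + 1e-12)        # Gauss-Newton step (tiny term avoids 0/0)
    return t

# Same true shift and same poor initial guess, different feature smoothness.
t_smooth = align(t_true=3.0, t_init=0.0, width=1.5)
t_sharp = align(t_true=3.0, t_init=0.0, width=0.2)
print(f"smooth features: t = {t_smooth:.4f}")
print(f"sharp features:  t = {t_sharp:.4f}")
```

In this toy setting the smooth features recover the true shift, while the sharp features leave the estimate stuck near its initial value because the two profiles barely overlap and the gradient vanishes. The abstract describes learning feature maps with this kind of well-behaved alignment loss; the sketch only fixes the smoothness by hand.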
Problem

Research questions and friction points this paper is trying to address.

Estimates cuboid-shaped room layouts from multi-view images
Improves alignment using dense deep feature optimization
Extends single-room to multi-room layout estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view dense deep feature alignment
End-to-end training through the optimization, yielding feature maps with large convergence basins and smooth loss landscapes
Heuristic-based cuboid layout initialization
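The abstract does not say which heuristics are used for initialization, so the following is only a hypothetical illustration of how simple one could be. It assumes camera centres are known in a gravity-aligned world frame and builds an axis-aligned box around them, padded by a margin. The function name `init_cuboid_from_cameras`, the `margin` parameter, and the centre/size parameterization are assumptions, not taken from the paper.

```python
import numpy as np

def init_cuboid_from_cameras(centers, margin=1.0):
    """Hypothetical heuristic: axis-aligned box around camera centres, padded by `margin` (metres)."""
    centers = np.asarray(centers, dtype=float)   # shape (N, 3), one row per camera
    lo = centers.min(axis=0) - margin            # padded minimum corner
    hi = centers.max(axis=0) + margin            # padded maximum corner
    return {"center": (lo + hi) / 2.0, "size": hi - lo}

# Three made-up camera positions inside a room.
cams = [[0.0, 0.0, 1.5], [2.0, 1.0, 1.5], [1.0, 3.0, 1.6]]
box = init_cuboid_from_cameras(cams, margin=1.0)
print(box["center"], box["size"])
```

A coarse guess like this is only useful if the alignment loss has a wide convergence basin, which is the property the learned feature maps are meant to provide.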