🤖 AI Summary
This work addresses the challenge that perspective and panoramic images have traditionally required separate models for indoor layout geometry estimation. To this end, we propose the first end-to-end unified framework. Our method projects both image types into a common equirectangular space and introduces a shared CNN backbone, complemented by latitude-adaptive feature allocation and 1D-convolutional domain conditioning, enabling field-of-view-invariant feature extraction and column-wise layout regression. Evaluated on real-world benchmarks including LSUN, Matterport3D, PanoContext, and Stanford 2D-3D, our approach achieves performance competitive with the state of the art while, for the first time, handling both domains with a single model. The source code is publicly available.
📝 Abstract
We present uLayout, a unified model for estimating room-layout geometry from both perspective and panoramic images, whereas traditional solutions require a different model design for each image type. The key idea of our solution is to unify both domains in the equirectangular projection, specifically by allocating perspective images to the most suitable latitude coordinates so that both domains can be exploited seamlessly. To address the Field-of-View (FoV) difference between the input domains, we design uLayout with a shared feature extractor plus an extra 1D-convolution layer that conditions each domain's input differently. This conditioning allows us to efficiently formulate a column-wise feature-regression problem regardless of the input FoV. This simple yet effective approach achieves performance competitive with current state-of-the-art solutions and shows, for the first time, a single end-to-end model for both domains. Extensive experiments on the real-world datasets LSUN, Matterport3D, PanoContext, and Stanford 2D-3D demonstrate the contribution of our approach. Code is available at https://github.com/JonathanLee112/uLayout.
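The column-wise conditioning idea can be illustrated with a minimal NumPy sketch. This is a hypothetical toy version (all names and shapes are illustrative assumptions, not taken from the released code): features from a shared backbone are pooled over the height axis into per-column vectors, and a domain-specific 1D convolution over the width axis conditions each domain before a common regressor would consume the result.

```python
import numpy as np

# Hypothetical sketch of the domain-conditioning idea (illustrative only):
# both panoramic and perspective inputs are assumed to yield equirectangular
# feature maps of shape (C, H, W) from a shared backbone. These are pooled
# over height into column features (C, W) and passed through a
# domain-specific 1D convolution.

def column_features(feat_map):
    """Collapse the height axis: (C, H, W) -> (C, W)."""
    return feat_map.mean(axis=1)

def conv1d(x, weight, bias):
    """Minimal 1D convolution over the width axis with 'same' padding.
    x: (C_in, W), weight: (C_out, C_in, K), bias: (C_out,)."""
    c_out, c_in, k = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    w_len = x.shape[1]
    out = np.empty((c_out, w_len))
    for o in range(c_out):
        for t in range(w_len):
            out[o, t] = np.sum(weight[o] * xp[:, t:t + k]) + bias[o]
    return out

rng = np.random.default_rng(0)
C, H, W, K = 4, 8, 16, 3

# Stand-ins for shared-backbone outputs from each domain.
pano_feat = rng.standard_normal((C, H, W))
persp_feat = rng.standard_normal((C, H, W))

# One 1D-conv filter bank per domain conditions the column features.
w_pano, b_pano = rng.standard_normal((C, C, K)), np.zeros(C)
w_persp, b_persp = rng.standard_normal((C, C, K)), np.zeros(C)

pano_cols = conv1d(column_features(pano_feat), w_pano, b_pano)
persp_cols = conv1d(column_features(persp_feat), w_persp, b_persp)

# Both domains now share the same (C, W) column representation, so a
# single column-wise layout regressor can consume either one.
print(pano_cols.shape, persp_cols.shape)  # (4, 16) (4, 16)
```

The point of the sketch is only that, after height pooling, perspective and panoramic inputs live in the same column-wise representation, which is what makes a single regression head feasible regardless of FoV.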