MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes

📅 2025-09-14
🏛️ International Conference on Information Photonics
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Single-view 3D reconstruction in complex real-world scenes is often hindered by noise, object diversity, and data scarcity, leading to insufficient geometric accuracy and detail recovery. To address these challenges, this work proposes the MGP-KAD framework, which first generates category-level geometric priors through clustering and dynamically fuses RGB images with multimodal geometric cues. Furthermore, it introduces a hybrid decoder based on Kolmogorov–Arnold Networks (KANs), overcoming the representational limitations of conventional linear decoders when handling complex multimodal inputs. This approach marks the first application of KANs to 3D reconstruction and achieves state-of-the-art performance on Pix3D, significantly enhancing geometric completeness, surface smoothness, and fine-detail preservation.
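
To make the contrast with a linear decoder concrete, here is a minimal sketch of a single KAN-style layer. It is not the paper's hybrid decoder, whose architecture is not described here. Instead of a fixed weight per edge, each input-output edge carries its own learnable univariate function, and each output sums those functions over the inputs. The class name `KANLayer`, the Gaussian basis, the grid range, and all parameter names are assumptions made for illustration; KANs as originally proposed use B-spline bases combined with a base activation.

```python
import math
import random


class KANLayer:
    """Toy KAN-style layer: y_j = sum_i phi_ij(x_i).

    Assumption: each phi_ij is a weighted sum of Gaussian bumps on a fixed
    grid. This stands in for the B-spline parameterization used by KANs.
    """

    def __init__(self, in_dim, out_dim, num_basis=5, x_min=-1.0, x_max=1.0, seed=0):
        rng = random.Random(seed)
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + i * step for i in range(num_basis)]
        self.width = step
        # coef[j][i][k]: weight of basis k in the edge function phi_ij.
        self.coef = [
            [[rng.gauss(0.0, 0.1) for _ in range(num_basis)] for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def basis(self, x):
        """Evaluate every Gaussian basis function at the scalar x."""
        return [math.exp(-(((x - c) / self.width) ** 2)) for c in self.centers]

    def forward(self, x):
        """Map a list of in_dim floats to a list of out_dim floats."""
        feats = [self.basis(xi) for xi in x]
        return [
            sum(w * b for coefs, f in zip(row, feats) for w, b in zip(coefs, f))
            for row in self.coef
        ]


layer = KANLayer(in_dim=3, out_dim=2)
print(layer.forward([0.1, -0.4, 0.9]))
```

A linear layer would compute each output as a weighted sum of the raw inputs. Here the nonlinearity lives on the edges themselves, which is the property the summary credits for handling complex multimodal inputs.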

๐Ÿ“ Abstract
Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB images with geometric priors to enhance reconstruction accuracy. The geometric priors are generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KANs) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
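
The "sampling and clustering" step can be pictured with a small sketch. The abstract does not specify the sampling scheme, the clustering algorithm, or the feature dimensionality, so everything below is an assumption: points are sampled uniformly from each ground-truth shape of one category, pooled, and clustered with plain k-means, and the resulting centroids stand in for a class-level prior. The function names `sample_points`, `kmeans`, and `category_prior` are hypothetical. The dynamic adjustment during training is not modeled.

```python
import random


def sample_points(points, n, rng):
    """Uniformly sample n points (with replacement) from a list of (x, y, z)."""
    return [rng.choice(points) for _ in range(n)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on 3D points; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids


def category_prior(shapes, n_samples=64, k=4, seed=0):
    """Pool sampled points from all shapes of one category and cluster them."""
    rng = random.Random(seed)
    pooled = []
    for pts in shapes:
        pooled.extend(sample_points(pts, n_samples, rng))
    return kmeans(pooled, k, seed=seed)


# Toy usage: two random "shapes" standing in for ground-truth objects.
rng = random.Random(1)
shape_a = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
shape_b = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
prior = category_prior([shape_a, shape_b], n_samples=32, k=4)
print(len(prior), "centroids of dimension", len(prior[0]))
```

In a full pipeline such centroids would presumably be encoded into features and fused with the RGB branch. That fusion step is outside this sketch.
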
Problem

Research questions and friction points this paper is trying to address.

single-view 3D reconstruction
complex scenes
geometric prior
multimodal fusion
object diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion
Geometric priors
Kolmogorov-Arnold Networks
Single-view 3D reconstruction
Dynamic feature adjustment
Luoxi Zhang
Doctoral Program in Empowerment Informatics, University of Tsukuba, Japan
Chun Xie
University of Tsukuba
Itaru Kitahara
University of Tsukuba
Computer Vision, Mixed Reality, Free Viewpoint Video