AI Summary
Single-view 3D reconstruction in complex real-world scenes is often hindered by noise, object diversity, and data scarcity, which limit geometric accuracy and detail recovery. To address these challenges, this work proposes the MGP-KAD framework, which first generates category-level geometric priors through clustering and then dynamically fuses RGB images with these multimodal geometric cues. It also introduces a hybrid decoder based on Kolmogorov-Arnold Networks (KANs) to overcome the representational limitations of conventional linear decoders when handling complex multimodal inputs. The approach marks the first application of KANs to 3D reconstruction and achieves state-of-the-art performance on Pix3D, significantly improving geometric completeness, surface smoothness, and fine-detail preservation.
Abstract
Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB images with a geometric prior to enhance reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KANs) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
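The abstract says only that the geometric prior comes from sampling and clustering ground-truth object data per class. It does not name the sampling scheme, the clustering algorithm, or the number of clusters. As a minimal sketch under those gaps, the Python below assumes each ground-truth object is a 3D point cloud, samples a fixed number of points per object, pools them per category, and runs plain k-means so that the centroids act as that category's prior. The names and defaults (`build_class_priors`, `n_per_shape`, `k`, `iters`) are hypothetical, and the training-time adjustment of the prior is not modeled.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumption: a ground-truth object is represented as a list of (x, y, z) points.
Point = Tuple[float, float, float]


def sample_points(shape: Sequence[Point], n: int, rng: random.Random) -> List[Point]:
    """Uniformly sample n points (with replacement) from one object's point cloud."""
    return [rng.choice(shape) for _ in range(n)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int, rng: random.Random) -> List[Point]:
    """Plain Lloyd's k-means (an assumed choice); returns k centroids.

    Requires k <= len(points), because initial centroids are drawn from the data.
    """
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        # Assignment step: attach every point to its nearest centroid.
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids


def build_class_priors(
    shapes_by_class: Dict[str, List[Sequence[Point]]],
    n_per_shape: int = 256,
    k: int = 32,
    iters: int = 10,
    seed: int = 0,
) -> Dict[str, List[Point]]:
    """Hypothetical helper: one set of k centroids per category, used as its prior."""
    rng = random.Random(seed)
    priors: Dict[str, List[Point]] = {}
    for cls, shapes in shapes_by_class.items():
        # Pool sampled points from every ground-truth object in this category.
        pooled = [p for s in shapes for p in sample_points(s, n_per_shape, rng)]
        priors[cls] = kmeans(pooled, k, iters, rng)
    return priors
```

For example, `build_class_priors({"chair": chair_clouds, "table": table_clouds}, k=32)` (with hypothetical `chair_clouds` and `table_clouds` inputs) returns 32 centroids per category. How such centroids are encoded and fused with RGB features is not described in the abstract.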
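A conventional linear decoder layer computes a weighted sum of its inputs and then applies a fixed activation. A Kolmogorov-Arnold Network layer instead places a learnable one-dimensional function on every input-output edge and sums those functions at each output node, so that `y_j = sum_i phi_ji(x_i)`. The abstract does not say how MGP-KAD's hybrid decoder mixes KAN and non-KAN layers, nor which basis or grid it uses, so the Python below is only a generic forward pass of one KAN layer under stated assumptions. Each `phi_ji` is a SiLU base term plus a weighted sum of Gaussian radial basis functions, a simplification also used by some KAN variants in place of the B-splines of the original KAN formulation. `KANLayer`, `num_basis`, `x_min`, and `x_max` are hypothetical names, and training is not shown.

```python
import math
import random
from typing import List


def silu(x: float) -> float:
    """Numerically stable SiLU, x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


class KANLayer:
    """Hypothetical single KAN layer: y_j = sum_i phi_ji(x_i).

    phi_ji(x) = base_w[j][i] * silu(x)
                + sum_g coef[j][i][g] * exp(-((x - center_g) / width) ** 2)
    Assumption: Gaussian RBF basis on a uniform grid (not B-splines).
    """

    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8,
                 x_min: float = -2.0, x_max: float = 2.0, seed: int = 0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + g * step for g in range(num_basis)]
        self.width = step
        # One base weight and one coefficient vector per (output, input) edge.
        self.base_w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)] for _ in range(out_dim)]

    def phi(self, j: int, i: int, x: float) -> float:
        """Learnable univariate function on the edge from input i to output j."""
        rbf = sum(c * math.exp(-((x - mu) / self.width) ** 2)
                  for c, mu in zip(self.coef[j][i], self.centers))
        return self.base_w[j][i] * silu(x) + rbf

    def forward(self, x: List[float]) -> List[float]:
        """Sum the edge functions at every output node."""
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(x)}")
        return [sum(self.phi(j, i, x[i]) for i in range(self.in_dim))
                for j in range(self.out_dim)]
```

Because the nonlinearity itself is learned separately for each input-output pair, a KAN layer can represent per-feature responses that a single fixed activation after a linear map cannot, which is the kind of flexibility the abstract appeals to when motivating a KAN-based decoder for multimodal inputs.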