🤖 AI Summary
This paper proposes PRENOM to address three obstacles to real-time category-level 3D object mapping: integrating category-level priors with object-level NeRFs, estimating canonical object poses, and reconstructing efficiently under constrained resources. PRENOM combines three components: (1) a category-level NeRF prior obtained by meta-learning on synthetic reconstruction tasks generated from open-source shape datasets; (2) a multi-objective genetic algorithm that searches for a NeRF architecture per category, balancing reconstruction quality against training time; and (3) prior-based probabilistic ray sampling that directs samples toward expected object regions to accelerate convergence. Evaluated on a low-end GPU, PRENOM achieves a 21% lower Chamfer distance than prior-free NeRF-based approaches on a synthetic dataset. On a noisy real-world dataset, it shows a 13% improvement averaged across all reconstruction metrics over other shape-prior approaches, with comparable pose and size estimation accuracy, while being trained for 5x less time.
📝 Abstract
In 3D object mapping, category-level priors enable efficient object reconstruction and canonical pose estimation, requiring only a single prior per semantic category (e.g., chair, book, laptop). Recently, DeepSDF has predominantly been used as a category-level shape prior, but it struggles to reconstruct sharp geometry and is computationally expensive. In contrast, NeRFs capture fine details but have yet to be effectively integrated with category-level priors in a real-time multi-object mapping framework. To bridge this gap, we introduce PRENOM, a Prior-based Efficient Neural Object Mapper that integrates category-level priors with object-level NeRFs to enhance reconstruction efficiency while enabling canonical object pose estimation. PRENOM gets to know objects on a first-name basis by meta-learning on synthetic reconstruction tasks generated from open-source shape datasets. To account for object category variations, it employs a multi-objective genetic algorithm to optimize the NeRF architecture for each category, balancing reconstruction quality and training time. Additionally, prior-based probabilistic ray sampling directs sampling toward expected object regions, accelerating convergence and improving reconstruction quality under constrained resources. Experimental results on a low-end GPU highlight the ability of PRENOM to achieve high-quality reconstructions while maintaining computational feasibility. Specifically, comparisons with prior-free NeRF-based approaches on a synthetic dataset show a 21% lower Chamfer distance, demonstrating better reconstruction quality. Furthermore, evaluations against other approaches using shape priors on a noisy real-world dataset indicate a 13% improvement averaged across all reconstruction metrics, and comparable pose and size estimation accuracy, while being trained for 5x less time.
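To make the idea of prior-based probabilistic ray sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes the prior has already been turned into a 2D map of non-negative per-pixel weights (for example, by projecting the expected object region into the image), which the abstract does not specify. The names `sampling_probabilities`, `sample_pixels`, and the `uniform_mix` parameter are hypothetical and introduced only for illustration.

```python
import random


def sampling_probabilities(prior_map, uniform_mix=0.1):
    """Convert a 2D map of non-negative prior weights into per-pixel probabilities.

    A small uniform component (assumed here, not taken from the paper) keeps
    every pixel reachable, so regions the prior misses can still be sampled.
    """
    flat = [w for row in prior_map for w in row]
    n = len(flat)
    total = sum(flat)
    if total <= 0:
        # No usable prior: fall back to uniform sampling.
        return [1.0 / n] * n
    return [(1.0 - uniform_mix) * w / total + uniform_mix / n for w in flat]


def sample_pixels(prior_map, num_rays, uniform_mix=0.1, seed=None):
    """Draw pixel coordinates (row, col) with probability given by the prior."""
    rng = random.Random(seed)
    width = len(prior_map[0])
    probs = sampling_probabilities(prior_map, uniform_mix)
    indices = rng.choices(range(len(probs)), weights=probs, k=num_rays)
    return [(i // width, i % width) for i in indices]


# Toy 4x4 prior: the object is expected in the central 2x2 block.
prior = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
pixels = sample_pixels(prior, num_rays=1000, seed=0)
inside = sum(1 for r, c in pixels if prior[r][c] > 0)
print(f"{inside / len(pixels):.2f} of rays fall in the expected object region")
```

In this toy setup most rays land inside the expected object region, while the uniform component still spends a small share of the budget elsewhere. Under a fixed ray budget, concentrating samples this way is one plausible reading of how directing sampling toward expected object regions could speed up convergence.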