🤖 AI Summary
To address the scarcity of high-quality annotated nighttime semantic segmentation data for autonomous driving perception, this paper proposes a novel paradigm for generating photorealistic nighttime segmentation data from a single daytime image, without requiring paired nighttime imagery. Methodologically, it integrates monocular depth estimation with semantic-guided mesh reconstruction to establish high-fidelity scene geometry, and combines material-aware physically based ray tracing with inverse rendering to model realistic light sources and perform low-light relighting. Its key contributions are: (i) the first end-to-end coupling of monocular geometric reconstruction, semantic-guided mesh generation, and physics-based nighttime relighting; and (ii) the generation of highly generalizable synthetic data that significantly improves both supervised and unsupervised segmentation models on nighttime benchmarks. Human perceptual evaluation further validates the photorealism of the synthesized nighttime scenes.
📝 Abstract
Semantic segmentation is an important task for autonomous driving. A robust autonomous driving system should be capable of handling images under all conditions, including nighttime. Generating accurate and diverse nighttime semantic segmentation datasets is crucial for improving the performance of computer vision algorithms in low-light conditions. In this thesis, we introduce a novel approach named NPSim, which simulates realistic nighttime images from real daytime counterparts using monocular inverse rendering and ray tracing. NPSim comprises two key components: mesh reconstruction and relighting. The mesh reconstruction component produces an accurate representation of the scene structure by combining geometric information extracted from the input RGB image with semantic information from its corresponding semantic labels. The relighting component integrates real-world nighttime light sources and material characteristics to simulate the complex interplay of light and object surfaces under low-light conditions. The scope of this thesis focuses mainly on the implementation and evaluation of the mesh reconstruction component. Through experiments, we demonstrate the effectiveness of the mesh reconstruction component in producing high-quality scene meshes and its generality across different autonomous driving datasets. We also propose a detailed experimental plan for evaluating the entire pipeline, combining quantitative metrics (training state-of-the-art supervised and unsupervised semantic segmentation approaches on the generated data) with human perceptual studies, to demonstrate that our approach generates realistic nighttime images and that the resulting dataset can steer future progress in the field.
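To make the mesh reconstruction idea concrete, the core geometric step can be sketched as back-projecting a monocular depth map into a 3D point cloud via pinhole camera intrinsics and triangulating the pixel grid into mesh faces. This is an illustrative outline only, not the thesis implementation: the intrinsics matrix `K`, the dense grid triangulation, and the function names are assumptions, and the semantic-guided refinement described in the abstract is omitted.

```python
import numpy as np

def backproject_depth(depth, K):
    """Lift a depth map (h, w) to camera-space 3D points (h, w, 3)
    using pinhole intrinsics K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def grid_mesh_faces(h, w):
    """Triangulate the pixel grid: two triangles per 2x2 pixel block,
    giving 2*(h-1)*(w-1) faces as vertex-index triples."""
    idx = np.arange(h * w).reshape(h, w)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    return np.concatenate([np.stack([tl, bl, tr], 1),
                           np.stack([tr, bl, br], 1)])

# Usage: a flat scene 2 m from the camera.
depth = np.full((4, 4), 2.0)
K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 2.0],
              [0.0, 0.0, 1.0]])
points = backproject_depth(depth, K)   # (4, 4, 3)
faces = grid_mesh_faces(4, 4)          # (18, 3)
```

In a full pipeline, semantic labels would then guide per-class mesh refinement (e.g. enforcing planarity on roads) before the relighting stage ray-traces the mesh under nighttime light sources.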