AI Summary
Existing radar-camera fusion depth estimation methods often sacrifice computational efficiency for accuracy, failing to meet the real-time, resource-constrained requirements of autonomous driving. This paper proposes LiRCDepth, a lightweight multimodal depth estimation model. Its core innovation is a triple knowledge distillation framework: (i) pixel-wise feature distillation, (ii) pair-wise feature relation distillation, and (iii) an uncertainty-weighted intermediate depth map distillation loss. Together, these losses enhance the representational capacity of the compact student network. The model fuses radar point clouds and image features within an efficient network architecture. Evaluated on the nuScenes dataset, it achieves a 6.6% reduction in mean absolute error (MAE) over a non-distilled lightweight baseline, improving the accuracy-efficiency trade-off. The source code is publicly available.
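To make the three distillation terms concrete, here is a minimal NumPy sketch of what each loss could look like. The function names, the cosine-similarity form of the pair-wise term, and the log-variance weighting of the depth term are illustrative assumptions, not the paper's exact formulation; consult the released code for the actual losses.

```python
import numpy as np

def pixel_wise_loss(f_t, f_s):
    """Pixel-wise feature distillation (assumed MSE form): penalize the
    squared difference between teacher and student feature maps (C, H, W)."""
    return float(np.mean((f_t - f_s) ** 2))

def pairwise_loss(f_t, f_s):
    """Pair-wise relation distillation (assumed form): match the cosine
    similarity matrices computed between all spatial positions, so the
    student inherits the teacher's feature relations, not just its values."""
    def sim_matrix(f):
        flat = f.reshape(f.shape[0], -1).T                     # (HW, C)
        flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
        return flat @ flat.T                                   # (HW, HW)
    return float(np.mean((sim_matrix(f_t) - sim_matrix(f_s)) ** 2))

def uncertainty_weighted_depth_loss(d_t, d_s, log_var):
    """Uncertainty-aware intermediate depth distillation (assumed form):
    down-weight pixels where predicted uncertainty (log-variance) is high,
    with a log-variance penalty so uncertainty cannot grow unboundedly."""
    return float(np.mean(np.exp(-log_var) * np.abs(d_t - d_s) + log_var))
```

In this sketch the uncertainty weighting follows the standard heteroscedastic loss pattern: confident pixels (low `log_var`) contribute their full teacher-student depth error, while uncertain pixels are attenuated at the cost of the `log_var` regularizer.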
Abstract
Recently, radar-camera fusion algorithms have gained significant attention, as radar sensors provide geometric information that complements the limitations of cameras. However, most existing radar-camera depth estimation algorithms focus solely on improving performance, often neglecting computational efficiency. To address this gap, we propose LiRCDepth, a lightweight radar-camera depth estimation model. We incorporate knowledge distillation to enhance the training process, transferring critical information from a complex teacher model to our lightweight student model in three key domains. First, low-level and high-level features are transferred by incorporating pixel-wise and pair-wise distillation. Additionally, we introduce an uncertainty-aware inter-depth distillation loss to refine intermediate depth maps during decoding. Leveraging our proposed knowledge distillation scheme, the lightweight model achieves a 6.6% improvement in MAE on the nuScenes dataset compared to the model trained without distillation. Code: https://github.com/harborsarah/LiRCDepth