LDRFusion: A LiDAR-Dominant multimodal refinement framework for 3D object detection

πŸ“… 2025-07-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the sparsity of LiDAR point clouds and noise introduced by pseudo-point clouds in multimodal fusion, this paper proposes a LiDAR-centric two-stage refined 3D detection framework. In the first stage, high-precision 3D proposals are generated solely from raw LiDAR data, thereby avoiding noise from vision-based or depth-completion-based pseudo-point cloud generation. In the second stage, depth-completion-derived pseudo-point clouds are selectively incorporated for hard examples; a hierarchical pseudo-point residual encoding module is introduced to explicitly model feature and positional residuals, enhancing local structural representation. Furthermore, an instance-level dual-stage result fusion mechanism is designed to achieve complementary modality advantages. Evaluated on the KITTI benchmark, the method achieves consistent and significant performance gains across all object classes and difficulty levels, demonstrating superior detection accuracy and robustness.

Technology Category

Application Category

πŸ“ Abstract
Existing LiDAR-Camera fusion methods have achieved strong results in 3D object detection. To address the sparsity of point clouds, previous approaches typically construct spatial pseudo point clouds via depth completion as auxiliary input and adopts a proposal-refinement framework to generate detection results. However, introducing pseudo points inevitably brings noise, potentially resulting in inaccurate predictions. Considering the differing roles and reliability levels of each modality, we propose LDRFusion, a novel Lidar-dominant two-stage refinement framework for multi-sensor fusion. The first stage soley relies on LiDAR to produce accurately localized proposals, followed by a second stage where pseudo point clouds are incorporated to detect challenging instances. The instance-level results from both stages are subsequently merged. To further enhance the representation of local structures in pseudo point clouds, we present a hierarchical pseudo point residual encoding module, which encodes neighborhood sets using both feature and positional residuals. Experiments on the KITTI dataset demonstrate that our framework consistently achieves strong performance across multiple categories and difficulty levels.
Problem

Research questions and friction points this paper is trying to address.

Improves 3D object detection by LiDAR-dominant multimodal fusion
Reduces noise from pseudo point clouds in refinement stages
Enhances local structure representation via hierarchical residual encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

LiDAR-dominant two-stage refinement framework
Hierarchical pseudo point residual encoding
Merge instance-level results from both stages
πŸ”Ž Similar Papers
No similar papers found.
J
Jijun Wang
School of Computer Science and Technology, Tongji University, Shanghai 201804, China
Y
Yan Wu
School of Computer Science and Technology, Tongji University, Shanghai 201804, China
Yujian Mo
Yujian Mo
Tongji University
εŽ‹εŠ›ε€§εˆ°η‘δΈη€οΌοΌοΌ
Junqiao Zhao
Junqiao Zhao
Department of Computer science and technology, Tongji University
SLAMLocalizationReinforcement LearningAutonomous Driving
J
Jun Yan
School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Y
Yinghao Hu
School of Computer Science and Technology, Tongji University, Shanghai 201804, China