Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weak reflectance discrimination and limited feature representation in LiDAR point clouds constrain 3D object detection performance. To address this, we introduce depth priors generated by the vision foundation model DepthAnything into LiDAR point feature enhancement for the first time. We propose a point-level depth prior fusion module that embeds predicted depth as an auxiliary attribute into raw point clouds. Furthermore, we design a voxel-point dual-path RoI feature extraction network with a bidirectional gated fusion mechanism to jointly model global semantics and local geometric structure. Evaluated on the KITTI dataset, our method achieves significant improvements in 3D detection accuracy, notably increasing the Car class AP₄₀ by 2.1%. This demonstrates the effectiveness and generalization potential of cross-modal geometric prior transfer for LiDAR-based perception.
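The point-level depth prior fusion described above amounts to projecting each LiDAR point into the image, sampling the predicted depth map, and appending that value as a fifth per-point attribute. A minimal sketch follows; the function name, the `calib_project` interface, and the zero-fill for out-of-image points are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def fuse_depth_prior(points, depth_map, calib_project):
    """Append a predicted monocular depth value to each LiDAR point.

    points:        (N, 4) array of [x, y, z, reflectance]
    depth_map:     (H, W) dense depth predicted by a model such as DepthAnything
    calib_project: function mapping (N, 3) LiDAR xyz -> (N, 2) pixel coords (u, v)
    All names here are hypothetical; the paper's interface may differ.
    """
    uv = np.round(calib_project(points[:, :3])).astype(int)
    h, w = depth_map.shape
    # Keep only points that project inside the image bounds.
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    depth = np.zeros(len(points), dtype=points.dtype)
    depth[valid] = depth_map[uv[valid, 1], uv[valid, 0]]
    # Each point becomes [x, y, z, reflectance, depth_prior].
    return np.concatenate([points, depth[:, None]], axis=1)
```

The enriched (N, 5) points can then feed the point-wise feature extraction module in place of the raw (N, 4) input.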

📝 Abstract
Recent advances in foundation models have opened up new possibilities for enhancing 3D perception. In particular, DepthAnything offers dense and reliable geometric priors from monocular RGB images, which can complement sparse LiDAR data in autonomous driving scenarios. However, such priors remain underutilized in LiDAR-based 3D object detection. In this paper, we address the limited expressiveness of raw LiDAR point features, especially the weak discriminative capability of the reflectance attribute, by introducing depth priors predicted by DepthAnything. These priors are fused with the original LiDAR attributes to enrich each point's representation. To leverage the enhanced point features, we propose a point-wise feature extraction module. Then, a Dual-Path RoI feature extraction framework is employed, comprising a voxel-based branch for global semantic context and a point-based branch for fine-grained structural details. To effectively integrate the complementary RoI features, we introduce a bidirectional gated RoI feature fusion module that balances global and local cues. Extensive experiments on the KITTI benchmark show that our method consistently improves detection accuracy, demonstrating the value of incorporating visual foundation model priors into LiDAR-based 3D object detection.
Problem

Research questions and friction points this paper is trying to address.

Enhancing sparse LiDAR data with DepthAnything priors
Improving discriminative capability of LiDAR reflectance attributes
Integrating global and local features for 3D object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses DepthAnything priors with LiDAR data
Uses Dual-Path RoI feature extraction framework
Introduces bidirectional gated RoI fusion module
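The bidirectional gated fusion can be sketched as two sigmoid gates, each computed from the concatenation of both branches' RoI features, modulating the voxel path and the point path respectively. The weight shapes and parameterization below are assumptions for illustration; the paper's exact formulation is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bidirectional_gated_fusion(f_voxel, f_point, w_v, w_p):
    """Illustrative bidirectional gated fusion of two RoI feature vectors.

    f_voxel, f_point: (C,) RoI features from the voxel and point branches
    w_v, w_p:         (2C, C) learned gate weights (hypothetical shapes)
    """
    joint = np.concatenate([f_voxel, f_point])  # (2C,) joint descriptor
    g_v = sigmoid(joint @ w_v)                  # gate on the voxel path
    g_p = sigmoid(joint @ w_p)                  # gate on the point path
    # Each branch is weighted by a gate that sees both branches, letting
    # global semantics and local geometry modulate one another.
    return g_v * f_voxel + g_p * f_point
```

Because both gates lie in (0, 1), the fused feature is a soft, input-dependent blend of the two branches rather than a fixed sum or concatenation.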
Yujian Mo
Tongji University
Yan Wu
School of Computer Science and Technology, Tongji University, Shanghai 201804, China
Junqiao Zhao
Department of Computer Science and Technology, Tongji University
SLAM, Localization, Reinforcement Learning, Autonomous Driving
Jijun Wang
School of Computer Science and Technology, Tongji University, Shanghai 201804, China
Yinghao Hu
School of Computer Science and Technology, Tongji University, Shanghai 201804, China
Jun Yan
School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China