DELTA: Dense Depth from Events and LiDAR using Transformer's Attention

πŸ“… 2025-05-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses the challenge of fusing heterogeneous event-camera and LiDAR data to generate dense depth maps. We propose the first Transformer-based deep fusion framework for this task. Methodologically, we design a dual-stream encoder comprising an event voxel encoder and a LiDAR sparse point cloud projector, augmented with intra-modal self-attention to model spatiotemporal dependencies within each modality and cross-modal cross-attention to achieve fine-grained spatiotemporal alignment and complementary information integration. Our key contribution is the first systematic integration of Transformer architectures into event–LiDAR depth estimation, overcoming limitations of conventional CNN-based or hand-crafted alignment approaches. Evaluated on standard event-based depth estimation benchmarks, our method establishes new state-of-the-art performance: it reduces absolute depth error in the near range (<5 m) by up to 4Γ— compared to prior best methods, significantly improving both accuracy and robustness of dense depth reconstruction.

Technology Category

Application Category

πŸ“ Abstract
Event cameras and LiDARs provide complementary yet distinct data: respectively, asynchronous detections of changes in lighting versus sparse but accurate depth information at a fixed rate. To this day, few works have explored the combination of these two modalities. In this article, we propose a novel neural-network-based method for fusing event and LiDAR data in order to estimate dense depth maps. Our architecture, DELTA, exploits the concepts of self- and cross-attention to model the spatial and temporal relations within and between the event and LiDAR data. Following a thorough evaluation, we demonstrate that DELTA sets a new state of the art in the event-based depth estimation problem, and that it is able to reduce the errors up to four times for close ranges compared to the previous SOTA.
Problem

Research questions and friction points this paper is trying to address.

Fusing event and LiDAR data for dense depth estimation
Modeling spatial-temporal relations using self- and cross-attention
Improving depth accuracy for close-range scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses event and LiDAR data using transformers
Employs self- and cross-attention mechanisms
Achieves dense depth estimation with high accuracy
πŸ”Ž Similar Papers
No similar papers found.