🤖 AI Summary
To address the challenge of robust visual localization for event cameras under high-speed motion and extreme lighting conditions, where conventional frame-based sensors often fail, we propose a tightly coupled event–LiDAR framework for six-degree-of-freedom pose estimation within pre-existing LiDAR maps. The method integrates LiDAR point cloud projection, optical flow alignment, PnP-based pose solving, and a learned event representation in a single refinement pipeline. Key contributions include: (1) a novel frame-based event representation that enhances structural clarity and cross-modal geometric consistency; (2) an auxiliary-variable regularization module that mitigates the adverse impact of bias in the ground-truth poses on network convergence; and (3) a unified pipeline that jointly refines event representations and pose estimates. Experiments on multiple public benchmarks demonstrate the effectiveness of the approach. The source code and pre-trained models are publicly available.
📝 Abstract
Event cameras are bio-inspired sensors with notable features, including high dynamic range and low latency, which make them exceptionally well suited to perception in challenging scenarios such as high-speed motion and extreme lighting conditions. In this paper, we explore their potential for localization within pre-existing LiDAR maps, a critical task for applications that require precise navigation and mobile manipulation. Our framework follows a pose-refinement paradigm: we first project LiDAR points into 2D space using a rough initial pose to obtain depth maps, then employ an optical flow estimation network to align events with the projected LiDAR points in 2D space, and finally estimate the camera pose with a PnP solver. To enhance geometric consistency between these two inherently different modalities, we develop a novel frame-based event representation that improves structural clarity. Additionally, given the varying degrees of bias observed in the ground-truth poses, we design a module that predicts an auxiliary variable as a regularization term to mitigate the impact of this bias on network convergence. Experimental results on several public datasets demonstrate the effectiveness of the proposed method. To facilitate future research, both the code and the pre-trained models are made available online.
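The first stage of the pipeline described above, projecting LiDAR points into a sparse depth map under a rough initial pose, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the pinhole-model assumptions, and the z-buffering detail are all our own choices.

```python
import numpy as np

def project_lidar_to_depth(points_w, R, t, K, h, w):
    """Project world-frame LiDAR points into a sparse (h, w) depth map.

    points_w: (N, 3) LiDAR points in the world frame (assumed layout).
    R, t:     rough initial camera pose (world -> camera transform).
    K:        3x3 pinhole intrinsics. Pixels with no point remain 0.
    """
    # Transform points into the camera frame with the rough initial pose.
    p_c = points_w @ R.T + t                      # (N, 3)
    p_c = p_c[p_c[:, 2] > 0]                      # keep points in front of the camera

    # Pinhole projection to pixel coordinates.
    uvz = p_c @ K.T
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)
    z = p_c[:, 2]

    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]

    # Simple z-buffer: write far-to-near so the nearest point wins per pixel.
    order = np.argsort(-z)
    depth = np.zeros((h, w))
    depth[v[order], u[order]] = z[order]
    return depth
```

The resulting depth map is what the optical flow network would align the event representation against; the subsequent PnP step then consumes the resulting 2D–3D correspondences.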