🤖 AI Summary
To address insufficient static and dynamic scene understanding and limited safety guarantees in multimodal end-to-end autonomous driving systems, this paper proposes a temporal-guided multimodal fusion framework. The method explicitly incorporates ego-vehicle state sequences (rotation angles, steering, throttle signals, and waypoint vectors) as guiding inputs and introduces a temporal guidance loss function that jointly optimizes geometric perception features and control signals along the time dimension. By integrating geometric feature extraction, multimodal end-to-end learning, and waypoint prediction, the framework achieves a driving score of 70%, a route completion score of 94%, and an infraction score of 0.78 in the CARLA simulator, outperforming existing end-to-end baselines. Key contributions include: (1) the explicit use of ego-state time series signals as guidance for multimodal fusion; (2) a temporally aligned optimization objective that bridges perception and control; and (3) strong benchmark performance demonstrating improved scene understanding and driving safety.
📝 Abstract
Multi-modal end-to-end autonomous driving has shown promising advances in recent work. By embedding more modalities into end-to-end networks, the system's understanding of both the static and dynamic aspects of the driving environment is enhanced, thereby improving the safety of autonomous driving. In this paper, we introduce METDrive, an end-to-end system that leverages temporal guidance from the embedded time series features of ego states, including rotation angles, steering, throttle signals, and waypoint vectors. The geometric features derived from perception sensor data and the time series features of ego state data jointly guide waypoint prediction through the proposed temporal guidance loss function. We evaluated METDrive on the CARLA leaderboard benchmarks, achieving a driving score of 70%, a route completion score of 94%, and an infraction score of 0.78.
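To make the idea of a temporally weighted waypoint objective concrete, the sketch below shows one minimal way such a loss could look. This is an illustrative assumption, not the paper's actual formulation: the geometric decay weighting (`decay`), the per-step L2 error, and the function name `temporal_guidance_loss` are all hypothetical choices made for this example.

```python
import numpy as np

def temporal_guidance_loss(pred_wp, gt_wp, decay=0.9):
    """Hypothetical temporally weighted waypoint loss (illustrative only).

    pred_wp, gt_wp: arrays of shape (T, 2) holding predicted and
    ground-truth (x, y) waypoints over T future time steps.
    decay: geometric weight so near-term waypoints, which matter most for
    immediate control, contribute more (an assumed design choice, not
    METDrive's exact loss).
    """
    pred_wp = np.asarray(pred_wp, dtype=float)
    gt_wp = np.asarray(gt_wp, dtype=float)
    T = pred_wp.shape[0]
    weights = decay ** np.arange(T)       # w_t = decay^t, larger for early steps
    weights = weights / weights.sum()     # normalize weights to sum to 1
    per_step = np.linalg.norm(pred_wp - gt_wp, axis=1)  # L2 error at each step
    return float(np.sum(weights * per_step))
```

Under this weighting, an error on the first predicted waypoint is penalized more heavily than the same error on the last one, which is one plausible way to align the optimization objective with the time dimension of the ego-state sequence.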