METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance

📅 2024-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient static and dynamic scene understanding and limited safety guarantees in multimodal end-to-end autonomous driving systems, this paper proposes a temporal-guided multimodal fusion framework. The method explicitly incorporates ego-vehicle state sequences—such as steering angle, throttle, and waypoints—as guiding inputs and introduces a novel temporal-guided loss function to jointly optimize geometric perception features and control signals along the time dimension. By integrating geometric feature extraction, multimodal end-to-end learning, and waypoint prediction, the framework achieves 70% driving score, 94% route completion rate, and 0.78 infraction score in CARLA simulation—significantly outperforming existing end-to-end baselines. Key contributions include: (1) the first explicit use of ego-state temporal signals as guidance for multimodal fusion; (2) a temporally aligned optimization objective that bridges perception and control; and (3) state-of-the-art performance demonstrating improved scene understanding and driving safety.

📝 Abstract
Multi-modal end-to-end autonomous driving has shown promising advancements in recent work. By embedding more modalities into end-to-end networks, the system's understanding of both static and dynamic aspects of the driving environment is enhanced, thereby improving the safety of autonomous driving. In this paper, we introduce METDrive, an end-to-end system that leverages temporal guidance from the embedded time series features of ego states, including rotation angles, steering, throttle signals, and waypoint vectors. The geometric features derived from perception sensor data and the time series features of ego state data jointly guide the waypoint prediction with the proposed temporal guidance loss function. We evaluated METDrive on the CARLA leaderboard benchmarks, achieving a driving score of 70%, a route completion score of 94%, and an infraction score of 0.78.
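
The abstract describes a temporal guidance loss that supervises waypoint prediction along the time dimension. The paper's exact formulation is in the full text; purely as an illustration, a time-weighted waypoint loss might look like the following sketch (the function name, the exponential decay weighting, and the array shapes are assumptions for illustration, not the paper's definitions):

```python
import numpy as np

def temporal_guidance_loss(pred_waypoints, gt_waypoints, decay=0.95):
    """Illustrative time-weighted L1 waypoint loss.

    pred_waypoints, gt_waypoints: arrays of shape (T, 2), one (x, y)
    waypoint per future time step. Earlier steps receive larger weights,
    so near-term prediction errors are penalized more heavily.
    """
    T = pred_waypoints.shape[0]
    weights = decay ** np.arange(T)           # [1, decay, decay^2, ...]
    # L1 error per time step, summed over the (x, y) coordinates
    per_step = np.abs(pred_waypoints - gt_waypoints).sum(axis=1)
    # Weighted average over the horizon
    return float((weights * per_step).sum() / weights.sum())
```

With `decay = 1.0` this reduces to a plain mean L1 waypoint loss; `decay < 1` emphasizes near-term steps, one common way to bias a trajectory loss along the time axis.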
Problem

Research questions and friction points this paper is trying to address.

Enhance autonomous driving safety using multi-modal end-to-end networks.
Integrate temporal guidance from ego state time series features.
Improve waypoint prediction with geometric and temporal data fusion.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages temporal guidance for autonomous driving.
Integrates multi-modal data for enhanced environment understanding.
Uses a temporal guidance loss for waypoint prediction.
Ziang Guo
Intelligent Space Robotics Laboratory, Center for Digital Engineering, Skolkovo Institute of Science and Technology, Moscow, Russia
Xinhao Lin
Institute of Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250399, P.R.China
Zakhar Yagudin
Student, Skoltech
Self-driving, Autonomous cars, Computer Vision, Control Theory
Artem Lykov
PhD student, Skolkovo Institute of Science and Technology
Robotics, AI, Cognitive robotics, VLA
Yong Wang
Institute of Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250399, P.R.China
Yanqiang Li
Institute of Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250399, P.R.China
Dzmitry Tsetserukou
Associate Professor, Skolkovo Institute of Science and Technology (Skoltech)
Robotics, Haptics, UAV Swarm, AI, VR