New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles

📅 2025-12-01
🤖 AI Summary
Transformer-based multimodal models are computationally expensive and hard to deploy on edge devices, which limits their use for high-level decision-making in autonomous driving. To address this, the paper proposes an end-to-end multimodal reinforcement learning framework for real-time decision-making. The method uses a Transformer-like architecture built on ternary spiking neurons to fuse heterogeneous inputs: camera images, LiDAR point clouds, and vehicle heading information. It combines a temporal-aware spiking mechanism with a cross-attention module to preserve multimodal representation fidelity while substantially reducing computational cost. On the Highway Environment benchmark, the approach is reported to achieve comparable or superior decision accuracy across multiple tasks, with a 42% reduction in inference latency and a 58% decrease in power consumption, in line with the real-time and energy-efficiency constraints of in-vehicle edge platforms.

📝 Abstract
This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, including camera images, LiDAR point clouds, and vehicle heading information, through a cross-attention transformer-based perception module. Although transformers have become the backbone of modern multi-modal architectures, their high computational cost limits their deployment in resource-constrained edge environments. To overcome this challenge, we propose a spiking temporal-aware transformer-like architecture that uses ternary spiking neurons for computationally efficient multi-modal fusion. Comprehensive evaluations across multiple tasks in the Highway Environment demonstrate the effectiveness and efficiency of the proposed approach for real-time autonomous decision-making.
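
To make the term "ternary spiking neuron" concrete, the following is a minimal sketch of a leaky integrate-and-fire neuron whose output is restricted to -1, 0, or +1. It is an illustration only and not the authors' formulation: the function name `ternary_lif`, the `threshold` and `leak` parameters, and the soft-reset rule are all assumptions made for this example.

```python
from typing import List


def ternary_lif(inputs: List[float], threshold: float = 1.0, leak: float = 0.9) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron with ternary output.

    Hypothetical illustration: emits +1 when the membrane potential reaches
    +threshold, -1 when it reaches -threshold, and 0 otherwise.
    """
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = leak * v + x  # leaky integration of the input current
        if v >= threshold:
            s = 1
        elif v <= -threshold:
            s = -1
        else:
            s = 0
        v -= s * threshold  # soft reset: subtract the signed threshold after a spike
        spikes.append(s)
    return spikes


spikes = ternary_lif([0.6, 0.6, -1.5, -0.2, 0.1])
print(spikes)  # [0, 1, -1, 0, 0]
```

Because each output is -1, 0, or +1, multiplying a weight by a spike reduces to adding, subtracting, or skipping that weight, which is one common reason spiking models are considered cheap to run on edge hardware.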

Problem

Research questions and friction points this paper is trying to address.

Develops a spiking transformer for efficient multi-modal fusion in autonomous vehicles
Integrates camera, LiDAR, and heading data for real-time decision-making
Reduces computational cost for deployment in resource-constrained edge environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking transformer-like architecture for multi-modal fusion
Ternary spiking neurons enable computational efficiency
Cross-attention transformer integrates camera, LiDAR, heading data
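
The cross-attention idea listed above can be sketched in a few lines. This is standard single-head scaled dot-product cross-attention with softmax, shown only to illustrate how tokens from one modality can attend to tokens from another; it is not the paper's spiking variant. The names `cross_attention`, `camera_tokens`, and `lidar_tokens` are hypothetical, and the learned query/key/value projections are omitted for brevity.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Each query (one modality) attends over keys/values (another modality)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output dimension at a time
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        fused.append(out)
    return fused


# Toy example: two camera tokens attend over three LiDAR tokens (made-up values)
camera_tokens = [[1.0, 0.0], [0.0, 1.0]]
lidar_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(camera_tokens, lidar_tokens, lidar_tokens)
```

Each row of `fused` is a convex combination of the LiDAR tokens, weighted by similarity to the corresponding camera token. A third modality such as heading could be appended to the key/value set in the same way.
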
Aref Ghoreishee
Drexel University
Control Theory, System Engineering, Neuromorphic Computing, Reinforcement Learning
Abhishek Mishra
Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA 19104
Lifeng Zhou
Assistant Professor, Drexel University
Robotics
John Walsh
Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA 19104
Nagarajan Kandasamy
Professor of computer engineering, Drexel University
Computer architecture