AI Summary
To address the failure of conventional frame-based stereo depth estimation in dynamic scenes, this paper proposes the first end-to-end method for dense depth estimation directly from asynchronous binocular spike streams. The approach follows a fusion-refinement paradigm: a recurrent spiking neural network (RSNN) jointly fuses spatiotemporal spike information across the two views and iteratively refines the depth prediction. Key contributions include: (1) the first end-to-end stereo matching framework explicitly designed for spike data; (2) the first large-scale synthetic and real-world stereo spike datasets with dense ground-truth depth annotations; and (3) significantly improved robustness in challenging regions, e.g., textureless surfaces and high-illumination conditions. Experiments demonstrate that the method consistently outperforms existing approaches on both benchmark datasets. Remarkably, it retains over 90% of its full-data accuracy using only 20% of the training data. The code and dataset will be made publicly available.
Abstract
Conventional frame-based cameras often struggle with stereo depth estimation in rapidly changing scenes. In contrast, bio-inspired spike cameras emit asynchronous events at microsecond-level resolution, providing an alternative sensing modality. However, existing methods lack stereo algorithms and benchmarks tailored to spike data. To address this gap, we propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams. The model fuses raw spike streams from two viewpoints and iteratively refines its depth estimates through a recurrent spiking neural network (RSNN) update module. To benchmark our approach, we introduce a large-scale synthetic spike stream dataset and a real-world stereo spike dataset with dense depth annotations. SpikeStereoNet outperforms existing methods on both datasets by leveraging spike streams' ability to capture subtle edges and intensity shifts in challenging regions such as textureless surfaces and extreme lighting conditions. Furthermore, our framework exhibits strong data efficiency, maintaining high accuracy even with substantially reduced training data. The source code and datasets will be publicly available.
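The fusion-refinement idea, where a recurrent spiking update module iteratively nudges a depth map toward a proposal derived from fused binocular features, can be sketched in miniature. The sketch below is an illustrative assumption, not the paper's actual architecture: the cost volume shape, the leaky integrate-and-fire (LIF) dynamics, and the gating rule are all hypothetical placeholders chosen to show the control flow of such an iterative RSNN refinement loop.

```python
import numpy as np

def lif_step(inp, mem, tau=0.5, v_th=1.0):
    """One leaky integrate-and-fire update: decay, integrate, spike, hard reset."""
    mem = tau * mem + inp
    spikes = (mem >= v_th).astype(inp.dtype)
    mem = mem * (1.0 - spikes)  # reset membrane potential where a spike fired
    return spikes, mem

def refine_depth(cost, depth, n_iters=4, step=0.5):
    """Iteratively refine a depth map (H, W) given a matching cost volume (D, H, W).

    Hypothetical stand-in for the RSNN update module: a soft-argmin over the
    cost volume yields a depth proposal, and LIF spikes gate how much of the
    residual is applied at each iteration.
    """
    mem = np.zeros_like(depth)
    levels = np.arange(cost.shape[0], dtype=depth.dtype)[:, None, None]
    for _ in range(n_iters):
        # Fuse: softmax over negated costs -> per-pixel depth expectation.
        w = np.exp(-cost)
        prob = w / w.sum(axis=0, keepdims=True)
        proposal = (prob * levels).sum(axis=0)
        # Recurrent spiking update: spikes gate the correction toward the proposal.
        spikes, mem = lif_step(np.abs(proposal - depth), mem)
        depth = depth + step * spikes * (proposal - depth)
    return depth
```

Each iteration reuses the same update module with persistent membrane state, mirroring how a recurrent refinement network shares weights across refinement steps.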