Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
RGB-Event semantic segmentation faces three intrinsic misalignments: temporal, spatial, and modal. Event streams are asynchronous and sparse, whereas RGB frames are synchronous and dense, and conventional cumulative event representations fail to capture temporal dependencies between consecutive event windows while mismatching the RGB modality. To address this, the paper proposes the Motion-enhanced Event Tensor (MET), which couples dense optical flow with event temporal features to produce dense, temporally coherent event representations. It further introduces a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM) to jointly achieve cross-modal spatiotemporal alignment. On two large-scale RGB-Event benchmarks, the framework significantly outperforms prior state-of-the-art approaches. Code is publicly available.

📝 Abstract
Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks. However, RGB-Event fusion faces three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal misalignment. Existing voxel grid representations neglect temporal correlations between consecutive event windows, and their formulation with simple accumulation of asynchronous and sparse events is incompatible with the synchronous and dense nature of RGB modality. To tackle these challenges, we propose a novel event representation, Motion-enhanced Event Tensor (MET), which transforms sparse event voxels into a dense and temporally coherent form by leveraging dense optical flows and event temporal features. In addition, we introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to mitigate modal misalignment, while bidirectional flow aggregation and temporal fusion mechanisms resolve spatiotemporal misalignment. Experimental results on two large-scale datasets demonstrate that our framework significantly outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our code is available at: https://github.com/zyaocoder/BRENet.
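For context on what the abstract calls the "simple accumulation" baseline: the conventional event voxel grid bins asynchronous events into a fixed number of temporal slices, which is the representation MET is designed to improve upon. The sketch below is not the paper's MET; the function name and the bilinear-in-time weighting are illustrative assumptions reflecting the common formulation.

```python
import numpy as np

def event_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Conventional cumulative event representation: each event
    (x, y, t, polarity) casts a bilinear vote into two adjacent
    temporal bins. This loses correlations across event windows,
    which is the limitation the paper's MET targets."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to the bin axis [0, num_bins - 1].
    t = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9) * (num_bins - 1)
    lo = np.floor(t).astype(int)
    hi = np.clip(lo + 1, 0, num_bins - 1)
    w_hi = t - lo  # fraction of the polarity vote for the upper bin
    # np.add.at performs unbuffered accumulation at repeated indices.
    np.add.at(voxel, (lo, ys, xs), ps * (1.0 - w_hi))
    np.add.at(voxel, (hi, ys, xs), ps * w_hi)
    return voxel
```

Because the grid simply sums polarities per bin, events from fast motion smear across pixels and the result stays sparse; the paper's MET instead warps such features with dense optical flow to obtain a dense, temporally coherent tensor.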
Problem

Research questions and friction points this paper is trying to address.

Addresses temporal, spatial, and modal misalignment in RGB-Event fusion
Improves event representation with Motion-enhanced Event Tensor (MET)
Introduces modules to mitigate spatiotemporal and modal misalignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion-enhanced Event Tensor for dense representation
Frequency-aware Bidirectional Flow Aggregation Module
Temporal Fusion Module for spatiotemporal alignment