🤖 AI Summary
Event vision suffers from a dual bottleneck: scarcity of real-world event data and the high storage and I/O overhead of synthetic event data, hindering scalable model training and generalization. To address this, we propose the Video-to-Voxel (V2V) paradigm, a direct, end-to-end voxelization framework that bypasses conventional event stream generation entirely. Leveraging timestamp-aligned video frames, V2V constructs a parameterized dynamic voxel grid, enabling event-stream-free voxelization for the first time. This yields 150× storage compression, supports real-time motion modeling and stochastic augmentation, and enables the largest event vision training set to date (52 hours). Using a lightweight encoder-decoder architecture, our method achieves state-of-the-art performance on video reconstruction and optical flow estimation. With training data exceeding existing benchmarks by an order of magnitude, we are, to our knowledge, the first to empirically demonstrate that large-scale pretraining substantially enhances generalization in event vision.
📝 Abstract
Event-based cameras offer unique advantages such as high temporal resolution, high dynamic range, and low power consumption. However, the massive storage requirements and I/O burdens of existing synthetic data generation pipelines, together with the scarcity of real data, prevent event-based training datasets from scaling up, limiting the development and generalization capabilities of event vision models. To address this challenge, we introduce Video-to-Voxel (V2V), an approach that directly converts conventional video frames into event-based voxel grid representations, bypassing storage-intensive event stream generation entirely. V2V enables a 150× reduction in storage requirements while supporting on-the-fly parameter randomization for enhanced model robustness. Leveraging this efficiency, we train several video reconstruction and optical flow estimation architectures on 10,000 diverse videos totaling 52 hours, an order of magnitude more data than existing event datasets, yielding substantial improvements.
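The core frame-to-voxel idea can be sketched as follows. This is an illustrative approximation, not the paper's actual formulation: the log-intensity thresholding, temporal binning, and linear interpolation below are common choices in event voxelization and are assumptions on our part, as are the function and parameter names.

```python
import numpy as np

def video_to_voxel(frames, timestamps, num_bins=5, threshold=0.1):
    """Convert consecutive video frames directly into an event-style voxel grid.

    Sketch only: signed log-intensity differences between adjacent frames
    stand in for event polarity/magnitude, and each difference image is
    spread over the two nearest temporal bins by linear interpolation.
    """
    H, W = frames[0].shape
    voxel = np.zeros((num_bins, H, W), dtype=np.float32)
    t0, t1 = timestamps[0], timestamps[-1]
    span = max(t1 - t0, 1e-9)
    log_prev = np.log(frames[0].astype(np.float32) + 1e-3)
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_cur = np.log(frame.astype(np.float32) + 1e-3)
        diff = log_cur - log_prev                 # signed brightness change
        diff = np.where(np.abs(diff) >= threshold, diff, 0.0)
        # fractional temporal-bin position of this frame's timestamp
        b = (t - t0) / span * (num_bins - 1)
        lo = int(np.floor(b))
        hi = min(lo + 1, num_bins - 1)
        w_hi = b - lo
        voxel[lo] += (1.0 - w_hi) * diff
        voxel[hi] += w_hi * diff
        log_prev = log_cur
    return voxel
```

Because the voxel grid is built directly from frame pairs, no intermediate event stream is ever materialized, which is what makes the storage savings and on-the-fly parameter randomization (e.g., resampling `threshold` per clip) possible.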