🤖 AI Summary
This work addresses two gaps: the performance degradation that existing BEV-based 3D object detection methods, designed primarily for pinhole cameras, suffer under the severe radial distortion of fisheye cameras, and the absence of benchmarks or effective solutions for hybrid camera setups. To bridge these gaps, we present the first real-world BEV 3D detection benchmark that integrates both pinhole and fisheye cameras, constructed by converting KITTI-360 data into the nuScenes format. We systematically evaluate strategies including image rectification, distortion-aware view transformation guided by the MEI camera model, and polar coordinate representations. Our experiments demonstrate that projection-free architectures significantly outperform conventional view-transformation approaches under fisheye distortion, highlighting their superior robustness and offering practical design guidelines for cost-effective, robust 3D perception systems in autonomous driving.
📝 Abstract
Modern autonomous driving systems increasingly rely on mixed camera configurations that combine pinhole and fisheye cameras for full-view perception. However, Bird's-Eye View (BEV) 3D object detection models are predominantly designed for pinhole cameras, leading to performance degradation under fisheye distortion. To bridge this gap, we introduce a multi-view BEV detection benchmark with mixed cameras by converting KITTI-360 into the nuScenes format. Our study encompasses three adaptations: rectification for zero-shot evaluation and fine-tuning of nuScenes-trained models, distortion-aware view transformation modules (VTMs) based on the MEI camera model, and polar coordinate representations that better align with radial distortion. We systematically evaluate three representative BEV architectures (BEVFormer, BEVDet, and PETR) across these strategies, and demonstrate that projection-free architectures are inherently more robust against fisheye distortion than VTM-based designs. This work establishes the first real-data 3D detection benchmark with fisheye and pinhole images and provides systematic adaptation strategies and practical guidelines for designing robust, cost-effective 3D perception systems. The code is available at https://github.com/CesarLiu/FishBEVOD.git.
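To make the MEI camera model mentioned above concrete, the sketch below shows its core forward projection: a 3D point is first projected onto a unit sphere, the projection center is then shifted along the optical axis by the mirror parameter ξ, and finally standard pinhole intrinsics are applied. This is a minimal illustration only; the function name is ours, and the lens-distortion terms of the full MEI model are omitted for brevity.

```python
import numpy as np

def mei_project(points, xi, K):
    """Minimal MEI unified-camera-model projection (distortion terms omitted).

    points : (N, 3) array of 3D points in the camera frame
    xi     : mirror parameter (xi = 0 reduces to a plain pinhole camera)
    K      : (3, 3) pinhole intrinsic matrix
    Returns (N, 2) pixel coordinates.
    """
    # Step 1: project onto the unit sphere and shift the center by xi:
    # the effective depth becomes z + xi * ||X||.
    d = np.linalg.norm(points, axis=1, keepdims=True)
    denom = points[:, 2:3] + xi * d
    # Step 2: normalized image-plane coordinates.
    m = np.hstack([points[:, 0:1] / denom,
                   points[:, 1:2] / denom,
                   np.ones_like(denom)])
    # Step 3: apply the pinhole intrinsics.
    return (K @ m.T).T[:, :2]
```

With ξ > 0 the extra `xi * ||X||` term in the denominator compresses off-axis points toward the image center, which is what produces the fisheye-like radial distortion a distortion-aware VTM must account for when lifting image features to BEV.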