A Comparative Study of 3D Person Detection: Sensor Modalities and Robustness in Diverse Indoor and Outdoor Environments

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how well 3D person detection holds up in diverse indoor and outdoor environments under occlusion, varying distance, and sensor corruption. It systematically evaluates three representative approaches on the JRDB dataset: camera-only (BEVDepth), LiDAR-only (PointPillars), and camera-LiDAR fusion (DAL). Whereas most existing research targets autonomous driving, this work also analyzes how sensor corruptions and camera-LiDAR misalignment affect robustness. The results show that DAL outperforms both single-modality models, particularly in challenging scenes, but remains sensitive to sensor misalignment and certain LiDAR-based corruptions; the camera-only BEVDepth performs worst and is the most affected by occlusion, distance, and noise.

📝 Abstract

Accurate 3D person detection is critical for safety in applications such as robotics, industrial monitoring, and surveillance. This work presents a systematic evaluation of 3D person detection using camera-only, LiDAR-only, and camera-LiDAR fusion. While most existing research focuses on autonomous driving, we explore detection performance and robustness in diverse indoor and outdoor scenes using the JRDB dataset. We compare three representative models - BEVDepth (camera), PointPillars (LiDAR), and DAL (camera-LiDAR fusion) - and analyze their behavior under varying occlusion and distance levels. Our results show that the fusion-based approach consistently outperforms single-modality models, particularly in challenging scenarios. We further investigate robustness against sensor corruptions and misalignments, revealing that while DAL offers improved resilience, it remains sensitive to sensor misalignment and certain LiDAR-based corruptions. In contrast, the camera-based BEVDepth model showed the lowest performance and was most affected by occlusion, distance, and noise. Our findings highlight the importance of utilizing sensor fusion for enhanced 3D person detection, while also underscoring the need for ongoing research to address the vulnerabilities inherent in these systems.
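The abstract mentions two evaluation ideas that a short sketch can make concrete: perturbing the LiDAR frame to mimic camera-LiDAR misalignment, and grouping targets by distance. The Python below is only an illustration under stated assumptions. The function names, the yaw angle, the translation offset, and the distance bin edges are all hypothetical and are not taken from the paper, which does not describe its corruption protocol here.

```python
import math

def misalign_points(points, yaw_deg=0.0, translation=(0.0, 0.0, 0.0)):
    """Rotate (x, y, z) points about the z axis by yaw_deg, then shift them.

    Hypothetical stand-in for an extrinsic calibration error between
    LiDAR and camera; the magnitudes are illustrative assumptions.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

def distance_bin(point, edges=(5.0, 10.0, 15.0)):
    """Return the index of the range bin for a point's horizontal distance.

    The bin edges (in metres) are assumed values, not the paper's.
    """
    r = math.hypot(point[0], point[1])
    for i, edge in enumerate(edges):
        if r < edge:
            return i
    return len(edges)

# Three toy points standing in for person centroids in the sensor frame.
points = [(1.0, 0.0, 0.5), (0.0, 12.0, 1.0), (20.0, 0.0, 0.0)]
shifted = misalign_points(points, yaw_deg=90.0, translation=(0.0, 0.0, 0.1))
bins = [distance_bin(p) for p in points]
```

In a study of this kind, the perturbed points would be fed to the detector in place of the originals, and accuracy would be reported separately for each distance bin.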

Problem

Research questions and friction points this paper is trying to address.

3D person detection

sensor fusion

robustness

occlusion

sensor misalignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D person detection

sensor fusion

robustness evaluation