🤖 AI Summary
To address the pervasive "reality gap" in autonomous driving system (ADS) simulation testing, this paper presents a multimodal comparative evaluation framework that systematically quantifies discrepancies in perception, actuation, and behavioral consistency across software-in-the-loop (SiL), vehicle-in-the-loop (ViL), mixed-reality (MR), and real-world testing, and identifies the conditions under which failures transfer between modalities. The study builds an integrated platform that pairs a small-scale physical vehicle equipped with real sensors (camera and LiDAR) with its digital twin, enabling empirical validation of both modular and end-to-end ADS architectures across diverse scenarios. Experiments show that MR testing significantly narrows the perception-level reality gap, achieving a 37.2% improvement over pure simulation, and isolates the critical factors that influence behavioral generalization. This work establishes a reproducible, scalable pathway for high-assurance ADS verification.
📝 Abstract
Simulation-based testing is a cornerstone of Autonomous Driving System (ADS) development, offering safe and scalable evaluation across diverse driving scenarios. However, discrepancies between simulated and real-world behavior, known as the reality gap, challenge the transferability of test results to deployed systems. In this paper, we present a comprehensive empirical study comparing four representative testing modalities: Software-in-the-Loop (SiL), Vehicle-in-the-Loop (ViL), Mixed-Reality (MR), and full real-world testing. Using a small-scale physical vehicle equipped with real sensors (camera and LiDAR) and its digital twin, we implement each setup and evaluate two ADS architectures (modular and end-to-end) across diverse indoor driving scenarios involving real obstacles and varied road topologies. We systematically assess the impact of each testing modality along three dimensions of the reality gap: actuation, perception, and behavioral fidelity. Our results show that while SiL and ViL setups simplify critical aspects of real-world dynamics and sensing, MR testing improves perceptual realism without compromising safety or control. Importantly, we identify the conditions under which failures do not transfer across testing modalities and isolate the underlying dimensions of the gap responsible for these discrepancies. Our findings offer actionable insights into the respective strengths and limitations of each modality and outline a path toward more robust and transferable validation of autonomous driving systems.