🤖 AI Summary
Existing autonomous driving simulation frameworks struggle to simultaneously achieve high-fidelity vehicle dynamics, photorealistic rendering, context-aware scenario orchestration, and real-time performance. This paper presents a high-fidelity digital twin framework that integrates physics-based modeling, neural rendering, 3D reconstruction, and a large language model (LLM) interface, enabling natural-language-driven, semantic-level scenario generation and online editing. The framework achieves real-to-simulation geometric and visual fidelity with up to 97% structural similarity, sustains simulation rates above 60 Hz, attains up to 95% repeatability for natural-language scenario generation, and demonstrates up to 85% cross-scenario generalizability. By unifying physical accuracy, perceptual realism, linguistic controllability, and computational efficiency, the framework balances fidelity, interactivity, and real-time performance within a single system.
📝 Abstract
Simulation frameworks have been key enablers for the development and validation of autonomous driving systems. However, existing methods struggle to comprehensively address the autonomy-oriented requirements of balancing (i) dynamical fidelity, (ii) photorealistic rendering, (iii) context-relevant scenario orchestration, and (iv) real-time performance. To address these limitations, we present a unified framework for creating and curating high-fidelity digital twins to accelerate advancements in autonomous driving research. Our framework leverages a mix of physics-based and data-driven techniques for developing and simulating digital twins of autonomous vehicles and their operating environments. It is capable of reconstructing real-world scenes and assets (real2sim) with geometric and photorealistic accuracy and infusing them with various physical properties to enable real-time dynamical simulation of the ensuing driving scenarios. It also incorporates a large language model (LLM) interface to flexibly edit the driving scenarios online via natural language prompts. We analyze the presented framework in terms of its fidelity, performance, and serviceability. Results indicate that our framework can reconstruct 3D scenes and assets with up to 97% structural similarity, while maintaining frame rates above 60 Hz. We also demonstrate that it can handle natural language prompts to generate diverse driving scenarios with up to 95% repeatability and 85% generalizability.
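The 97% structural-similarity figure refers to the SSIM metric comparing real imagery against its rendered digital twin. As an illustrative sketch only (not the authors' evaluation pipeline), a single-window SSIM can be computed directly in NumPy; the function name `ssim_global` and its stabilizing constants follow the standard SSIM formulation, though real evaluations typically average SSIM over local sliding windows:

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Global structural similarity (SSIM) between two same-sized images.

    A single-window variant of the standard SSIM formula; production
    pipelines typically average SSIM over local sliding windows.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

# Identical frames score 1.0; a noise-corrupted render scores lower.
rng = np.random.default_rng(0)
real = rng.random((64, 64))
noisy = np.clip(real + 0.1 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(ssim_global(real, real))   # → 1.0
print(ssim_global(real, noisy))  # < 1.0
```

In this formulation, identical real and rendered frames yield exactly 1.0, and the score decays as rendering artifacts or geometric misalignment perturb local statistics.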