🤖 AI Summary
This work addresses the challenge of low-latency, high-fidelity outdoor 3D scene reconstruction for real-time AR/VR applications by presenting the first end-to-end drone-based real-time reconstruction system that fully integrates 3D Gaussian Splatting (3DGS). The system combines RTMP video streaming, multi-sensor synchronization, real-time camera pose estimation, and online 3DGS optimization to enable continuous model updates and interactive rendering. Experimental results demonstrate that the reconstructed scenes achieve 93–96% of the fidelity of offline high-quality reconstructions, while offering significantly superior rendering performance compared to NeRF and substantially reduced end-to-end latency, thereby effectively supporting real-time aerial augmented perception.
📝 Abstract
In this study, we present an end-to-end pipeline that converts drone-captured video streams into high-fidelity 3D reconstructions with minimal latency. Unmanned aerial vehicles (UAVs) are extensively used in real-time aerial perception applications, and recent advances in 3D Gaussian Splatting (3DGS) have demonstrated significant potential for real-time neural rendering. However, the integration of 3DGS into end-to-end UAV-based reconstruction and visualization systems remains underexplored. We propose an efficient architecture that combines live video acquisition via RTMP streaming, synchronized sensor fusion, camera pose estimation, and 3DGS optimization, achieving continuous model updates and low-latency deployment within interactive visualization environments that support immersive augmented and virtual reality (AR/VR) applications. Experimental results demonstrate that, compared to NeRF-based approaches, the proposed method achieves competitive visual fidelity while delivering significantly higher rendering performance and substantially reduced end-to-end latency. Reconstruction quality remains within 4–7% of high-fidelity offline references, confirming the suitability of the proposed system for real-time, scalable augmented perception from aerial platforms.
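The abstract does not say how the video frames and drone sensor data are synchronized. As an illustration only, one common approach is to pair each decoded frame with the telemetry sample nearest in time and to discard the pairing when the gap is too large. The sketch below assumes this approach; the `Telemetry` fields, the `nearest_telemetry` helper, and the 50 ms tolerance are hypothetical and not taken from the paper.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Telemetry:
    """Hypothetical telemetry record; field names are assumptions."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    alt: float  # altitude in meters


def nearest_telemetry(
    samples: List[Telemetry], frame_t: float, max_gap: float = 0.05
) -> Optional[Telemetry]:
    """Return the sample closest in time to frame_t.

    `samples` must be sorted by timestamp. Returns None when the list is
    empty or the closest sample is more than `max_gap` seconds away.
    """
    if not samples:
        return None
    times = [s.t for s in samples]
    # Index of the first sample whose timestamp is >= frame_t.
    i = bisect_left(times, frame_t)
    # The closest sample is either just before or at that index.
    candidates = samples[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda s: abs(s.t - frame_t))
    return best if abs(best.t - frame_t) <= max_gap else None
```

For example, with telemetry sampled at 0.0 s, 0.1 s, and 0.2 s, a frame stamped 0.12 s would be paired with the 0.1 s sample, while a frame stamped 0.5 s would be left unpaired under the assumed tolerance.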