🤖 AI Summary
This work addresses the reliance of fixed-wing UAV autonomous landing on IMUs and motion-capture systems. We propose FalconWing, an open-source, ultra-lightweight (150 g) vision-only landing framework. Methodologically, we introduce a novel real-to-sim-to-real learning paradigm: (i) a high-fidelity photorealistic simulation environment is built using 3D Gaussian splatting trained on real-world images; (ii) nonlinear dynamics are identified from vision-estimated real-flight data; and (iii) a multimodal Vision Transformer (ViT) policy is trained via simulation-only imitation learning, fusing a single RGB image with the history of control actions through self-attention and running in real time at 20 Hz. Our key contributions are: (i) the first demonstration of vision-only autonomous landing on a fixed-wing platform without IMU or motion-capture support (80% success rate); and (ii) full open-sourcing of the hardware schematics, dynamics model, simulator, and learning framework, establishing a reproducible, scalable paradigm for autonomy research under resource constraints.
📝 Abstract
We present FalconWing -- an open-source, ultra-lightweight (150 g) fixed-wing platform for autonomy research. The hardware platform integrates a small camera, a standard airframe, offboard computation, and radio communication for manual overrides. We demonstrate FalconWing's capabilities by developing and deploying a purely vision-based control policy for autonomous landing (without IMU or motion capture) using a novel real-to-sim-to-real learning approach. Our learning approach: (1) constructs a photorealistic simulation environment via 3D Gaussian splatting trained on real-world images; (2) identifies nonlinear dynamics from vision-estimated real-flight data; and (3) trains a multi-modal Vision Transformer (ViT) policy through simulation-only imitation learning. The ViT architecture fuses a single RGB image with the history of control actions via self-attention, preserving temporal context while maintaining real-time 20 Hz inference. When deployed zero-shot on the hardware platform, this policy achieves an 80% success rate in vision-based autonomous landings. Together with the hardware specifications, we also open-source the system dynamics, the software for the photorealistic simulator, and the learning framework.
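The multimodal fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the token dimension, patch-grid size, action-history length, and output dimension are all hypothetical, and a real ViT would use learned patch embeddings, multi-head attention, and several transformer layers.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # token dimension (illustrative; not specified in the paper)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a token sequence."""
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(D)
    # Numerically stable softmax over each row of attention scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# One RGB frame becomes a grid of patch tokens; each past control action
# becomes one token. Sizes here are placeholders for illustration.
image_tokens = rng.standard_normal((64, D))   # e.g. an 8x8 patch grid
action_tokens = rng.standard_normal((8, D))   # history of 8 control actions

# Concatenate into one sequence so attention can mix visual and temporal
# context, then pool and project to a (hypothetical) 4-dim control command.
tokens = np.concatenate([image_tokens, action_tokens])
fused = self_attention(tokens)
control = fused.mean(axis=0)[:4]
```

Because image and action tokens share one attention pass, every output token can attend to both the current view and the recent control history, which is the mechanism the abstract credits with preserving temporal context without an IMU.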