🤖 AI Summary
This work addresses high-speed FPV quadrotor flight in complex environments using only a monocular RGB camera, a setting in which optical flow mixes several motion cues and has a low signal-to-noise ratio near the focus of expansion (FoE). The authors decompose the optical flow into translational and rotational components and keep only the translational flow, which carries geometric and depth information. They combine it with an uncertainty mask, derived from forward-backward flow inconsistency, that highlights obstacle structures. This joint representation disentangles ego-motion-induced background flow from obstacle-related flow for the first time, substantially improving perception reliability. An end-to-end neural control policy, trained in a differentiable simulator, achieves robust flight at speeds of up to 13.91 m/s in simulation and 11.79 m/s in real-world forest environments, with a 93.3% success rate over 30 real-world trials. The real-world speed is nearly twice the 6 m/s previously reported for a comparable monocular-RGB optical-flow system.
📝 Abstract
Autonomous FPV quadrotor flight in complex environments using a monocular RGB camera as the sole exteroceptive sensor remains a fundamental challenge. Recent research has shown that using optical flow as the input to a neural network can achieve end-to-end autonomous flight in cluttered scenes. However, extracting the most relevant information from the flow estimate is the key bottleneck limiting agility and robustness. Existing methods struggle to disentangle obstacle-induced optical flow from the ego-motion background flow and suffer from low signal-to-noise ratios near the focus of expansion (FoE). To address these issues, we decompose the optical flow into translational and rotational components and use only the translational flow, which captures scene geometry and depth cues. In addition, we introduce an uncertainty mask derived from inconsistencies between forward and backward flow estimates. This mask highlights obstacle structures, including those within the FoE region. Both cues are fed to a control policy trained in a differentiable simulation framework, which enables efficient first-order optimization across perception and control. We validate our approach through extensive experiments in both simulated and real-world forest environments. The proposed system achieves robust flight at speeds of up to 13.91 m/s in simulation and 11.79 m/s in real-world tests, with a 93.3% success rate over 30 real-world trials, nearly doubling the 6 m/s real-world speed previously reported for a monocular-RGB optical-flow UAV obstacle avoidance system.
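The two perception cues described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the abstract does not say how the rotational component is obtained, so the sketch assumes a known camera-frame angular velocity (for example from a gyroscope), a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, the classical depth-independent rotational motion-field model under one common sign convention, and a small rotation between frames. The function names (`rotational_flow`, `translational_flow`, `fb_inconsistency`) and the nearest-neighbour lookup are likewise illustrative choices.

```python
import numpy as np


def rotational_flow(omega, dt, fx, fy, cx, cy, height, width):
    """Rotation-induced flow in pixels per frame (assumed pinhole model).

    omega: camera-frame angular velocity (wx, wy, wz) in rad/s (assumed known).
    The rotational motion field does not depend on scene depth.
    """
    px, py = np.meshgrid(np.arange(width, dtype=np.float64),
                         np.arange(height, dtype=np.float64))
    x = (px - cx) / fx  # normalized image coordinates
    y = (py - cy) / fy
    wx, wy, wz = omega
    u = wx * x * y - wy * (1.0 + x ** 2) + wz * y
    v = wx * (1.0 + y ** 2) - wy * x * y - wz * x
    # Scale back to pixels and to one frame interval (small-angle assumption).
    return np.stack([u * fx * dt, v * fy * dt], axis=-1)


def translational_flow(flow, omega, dt, fx, fy, cx, cy):
    """Subtract the predicted rotational flow from an (H, W, 2) flow field."""
    h, w = flow.shape[:2]
    return flow - rotational_flow(omega, dt, fx, fy, cx, cy, h, w)


def fb_inconsistency(flow_fw, flow_bw):
    """Per-pixel forward-backward inconsistency, shape (H, W).

    For a pixel p, a consistent pair satisfies F(p) + B(p + F(p)) = 0.
    The backward flow is sampled with a nearest-neighbour lookup and
    targets outside the image are clamped to the border.
    """
    h, w = flow_fw.shape[:2]
    px, py = np.meshgrid(np.arange(w), np.arange(h))
    tx = np.clip(np.rint(px + flow_fw[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.rint(py + flow_fw[..., 1]).astype(int), 0, h - 1)
    return np.linalg.norm(flow_fw + flow_bw[ty, tx], axis=-1)
```

In this sketch, a flow field produced by pure rotation is cancelled exactly by `translational_flow`, and `fb_inconsistency` returns zero wherever the backward flow undoes the forward flow. Larger values mark pixels where the two estimates disagree, which is the kind of signal the abstract describes as highlighting obstacle structures. How the resulting values are thresholded or normalized into a mask is left open here, since the abstract does not specify it.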