🤖 AI Summary
For autonomous inspection of sparse semantic targets in unknown environments, this paper proposes an end-to-end semantic-aware path planning method that jointly performs target recognition and collision-free navigation using only real-time depth maps, semantic segmentation outputs, local occupancy grids, and historical pose estimates. The approach innovatively integrates a semantic attention mechanism into a Proximal Policy Optimization (PPO) reinforcement learning framework, augmented by multimodal sensory fusion, spatiotemporal trajectory encoding, and a lightweight semantic gating module to enable task-driven, sparse-target-oriented navigation. It significantly enhances sim-to-real generalization, maintaining robustness even under unseen semantic categories and geometric layouts. Evaluations in simulation and on real-world aerial robots demonstrate a 37% improvement in path success rate and a 92.4% target recall rate.
📝 Abstract
This paper introduces a novel semantics-aware inspection planning policy derived through deep reinforcement learning. Reflecting the fact that within autonomous informative path planning missions in unknown environments, it is often only a sparse set of objects of interest that need to be inspected, the method contributes an end-to-end policy that simultaneously performs semantic object visual inspection combined with collision-free navigation. Assuming access only to the instantaneous depth map, the associated segmentation image, the ego-centric local occupancy, and the history of past positions in the robot's neighborhood, the method demonstrates robust generalizability and successful crossing of the sim2real gap. Beyond simulations and extensive comparison studies, the approach is verified in experimental evaluations onboard a flying robot deployed in novel environments with previously unseen semantics and overall geometric configurations.