🤖 AI Summary
To address insufficient end-to-end perception–control coupling in legged robots operating in complex real-world environments, this paper proposes a tightly coupled spatiotemporal perception–control architecture. The method fuses RGB-D and GNSS modalities and omits the noise-prone IMU; heading is instead estimated directly from consecutive GNSS position measurements, which is robust in open-area navigation. An EfficientNet-B0 encoder combined with a temporal fusion network jointly predicts semantic segmentation, dense depth maps, and control policies. Evaluated on a custom-built, diverse outdoor dataset, the model is lightweight enough for deployment (<15 MB) and matches or outperforms state-of-the-art baselines on challenging terrains, including grassy fields and unstructured roads, while running in real time. This work delivers an efficient, robust, end-to-end embodied navigation solution for legged locomotion in unstructured outdoor environments.
📝 Abstract
We present Seq-DeepIPC, a sequential end-to-end perception-to-control model for legged robot navigation in real-world environments. Seq-DeepIPC advances intelligent sensing for autonomous legged navigation by tightly integrating multi-modal perception (RGB-D + GNSS) with temporal fusion and control. The model jointly predicts semantic segmentation and depth estimation, giving richer spatial features for planning and control. For efficient deployment on edge devices, we use EfficientNet-B0 as the encoder, reducing computation while maintaining accuracy. Heading estimation is simplified by removing the noisy IMU and instead computing the bearing angle directly from consecutive GNSS positions. We collected a larger and more diverse dataset that includes both road and grass terrains, and validated Seq-DeepIPC on a robot dog. Comparative and ablation studies show that sequential inputs improve perception and control in our models, while other baselines do not benefit. Seq-DeepIPC achieves competitive or better results with a reasonable model size; although GNSS-only heading is less reliable near tall buildings, it is robust in open areas. Overall, Seq-DeepIPC extends end-to-end navigation beyond wheeled robots to more versatile and temporally aware systems. To support future research, we will release the code in our GitHub repository at https://github.com/oskarnatan/Seq-DeepIPC.
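The abstract states that heading is computed as the bearing angle between consecutive GNSS positions but does not give the formula. The sketch below uses the standard initial great-circle bearing as one plausible way to do this; it is not taken from the Seq-DeepIPC code. The function names, the `min_move_m` threshold, and the hold-last-heading behaviour for a nearly stationary robot are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used only for the distance gate


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from fix 1 to fix 2.

    Inputs are in decimal degrees; the result is in degrees clockwise
    from true north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation, adequate for short steps."""
    mean_phi = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_phi) * EARTH_RADIUS_M
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return math.hypot(dx, dy)


def update_heading(prev_heading, prev_fix, curr_fix, min_move_m=0.5):
    """Hypothetical helper: refresh the heading only after enough motion.

    When the two fixes are closer than min_move_m (an assumed threshold),
    the bearing is dominated by GNSS noise, so the previous heading is kept.
    """
    if approx_distance_m(*prev_fix, *curr_fix) < min_move_m:
        return prev_heading
    return bearing_deg(*prev_fix, *curr_fix)
```

For example, `update_heading(None, (0.0, 0.0), (0.001, 0.0))` describes a move due north. A distance gate of this kind is one simple way to cope with a stationary or slowly moving robot; it does not address the degraded GNSS reception near tall buildings that the abstract mentions.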