From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

📅 2026-06-10

🤖 AI Summary

This work addresses three weaknesses of imitation learning for long-horizon sidewalk navigation with a monocular RGB camera: compounding errors, violations of social norms, and weak counterfactual reasoning. The authors propose FlowPilot, a mapless navigation policy that uses anchored flow matching as its action representation. FlowPilot is pre-trained on large-scale robot fleet data and then fine-tuned with human preference feedback drawn from a small amount of intervention data, bridging the gap from imitation to socially aligned behavior. Relying solely on monocular vision, FlowPilot achieves a 42% success rate and 66% route completion in simulation. In real-world experiments, the preference-tuned FlowPilot-HP reduces intervention rates by 40.0% and unexpected interaction rates by 52.1% relative to the base model.

📝 Abstract

Autonomous long-horizon sidewalk navigation is essential for micro-mobility applications such as robotic food delivery and assistive electric wheelchairs. Unlike autonomous driving on the road, long-horizon sidewalk navigation requires precise maneuvering through unpredictable sidewalk terrain and around pedestrians, with a perception stack as lightweight as a single monocular RGB camera. While imitation learning (IL) from demonstrations offers a practical solution, the resulting autopilot policy often suffers from compounding errors, a lack of social compliance on sidewalks, and deficiencies in the counterfactual reasoning needed to handle complex situations. To address these challenges, we introduce FlowPilot, a mapless navigation policy that achieves robust and efficient long-horizon navigation using only a monocular RGB camera. We first propose anchored flow matching as an action representation, both for policy pre-training on large-scale robot fleet data and for capturing the diverse, complex, multimodal distribution of sidewalk navigation behaviors. To bridge the gap between imitation and alignment, we further design a human-in-the-loop preference learning scheme that tunes the policy on a small amount of human intervention data. This strengthens the model's counterfactual reasoning and social compliance on sidewalks. We evaluate FlowPilot through extensive simulation and real-world experiments in diverse sidewalk environments. FlowPilot achieves a 42% success rate and 66% route completion in simulation, while FlowPilot-HP further improves real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% relative to the base model.
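
For readers unfamiliar with flow matching as an action representation, the following is a minimal sketch of the generic technique, not the authors' implementation. It assumes a straight-line path from Gaussian noise to a demonstrated action and a squared-error velocity objective. The abstract does not specify the anchoring mechanism or the image conditioning, so both are omitted, and all function names here (`interpolate`, `target_velocity`, `flow_matching_loss`, `sample_action`) are hypothetical.

```python
import random

def interpolate(noise, action, t):
    """Point on the straight-line path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Velocity of the straight-line path, constant in t: action - noise."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predict_velocity, action, rng):
    """Mean squared error between predicted and target velocity, one sample.

    predict_velocity(x_t, t) stands in for the policy network (assumption);
    a real policy would also take the camera observation as input.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # uniform in [0, 1)
    x_t = interpolate(noise, action, t)
    target = target_velocity(noise, action)
    pred = predict_velocity(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(action)

def sample_action(predict_velocity, dim, rng, steps=10):
    """Euler-integrate the velocity field from noise (t=0) toward t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        v = predict_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, the loss would be averaged over demonstrated actions and minimized with respect to the network parameters. At inference, drawing different noise samples in `sample_action` is what lets such a policy represent several distinct plausible maneuvers rather than their average.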

Problem

Research questions and friction points this paper is trying to address.

long-horizon navigation

sidewalk navigation

imitation learning

social compliance

counterfactual reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching

preference learning

imitation learning