From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

📅 2026-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three weaknesses of imitation learning for long-horizon sidewalk navigation with a monocular RGB camera: error accumulation, violations of social norms, and insufficient counterfactual reasoning. The authors propose FlowPilot, a map-free navigation policy that uses anchored flow matching as its action representation. FlowPilot is pretrained on large-scale fleet-collected data and then fine-tuned on a small amount of human intervention data through preference learning, bridging the gap from imitation to socially aligned behavior. Relying solely on monocular vision, FlowPilot achieves a 42% success rate and 66% route completion in simulation. In real-world experiments, the FlowPilot-HP variant reduces intervention rates by 40.0% and unexpected interaction rates by 52.1% relative to the base model.
📝 Abstract
Autonomous long-horizon sidewalk navigation is essential for micro-mobility applications such as robotic food delivery and assistive electronic wheelchairs. Unlike autonomous driving on the road, long-horizon sidewalk navigation requires precise maneuvering through unpredictable sidewalk terrains and pedestrians, with a lightweight perception stack as minimal as a single monocular RGB camera. While imitation learning (IL) from demonstrations offers a practical solution, the resulting autopilot policy often suffers from compounding errors, a lack of social compliance on sidewalks, and deficiencies in counterfactual reasoning to handle complex situations. To address these challenges, we introduce FlowPilot, a mapless navigation policy that achieves robust and efficient long-horizon navigation performance using only a monocular RGB camera. We first propose to use anchored flow matching as an action representation for policy pre-training on large-scale robot fleet data, in order to capture the diverse, complex, multimodal distribution of sidewalk navigation behaviors. To bridge the gap between imitation and alignment, we further design a human-in-the-loop preference learning scheme that tunes the policy on a small amount of human intervention data. This scheme strengthens the model's counterfactual reasoning and social compliance on sidewalks. We evaluate FlowPilot through extensive simulation and real-world experiments in diverse sidewalk environments. FlowPilot achieves a 42% success rate and 66% route completion in simulation, while FlowPilot-HP further improves real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% relative to the base model.
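For readers unfamiliar with flow matching as an action representation, the following is a minimal sketch of the generic technique, not the authors' implementation. It assumes straight-line probability paths from Gaussian noise to a demonstrated action; the abstract does not describe the anchored formulation, so that part is not reproduced here. Conditioning on camera observations is also omitted, and all function names are hypothetical.

```python
import random


def interpolate(noise, action, t):
    """Point on the straight path from noise (t=0) to the demonstrated action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise, action):
    """Constant velocity of the straight path; this is the regression target."""
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(velocity_fn, action, rng):
    """One-sample squared-error loss for a velocity model (hypothetical interface).

    velocity_fn(x_t, t) stands in for a learned network; a real policy would
    also take the image observation as input.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # uniform in [0, 1)
    x_t = interpolate(noise, action, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(noise, action)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(action)


def sample(velocity_fn, dim, rng, steps=10):
    """Generate an action by Euler integration of dx/dt = v(x, t) from t=0 to t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Training would minimize `flow_matching_loss` over demonstrated actions, and `sample` would then turn noise into an action at inference time. Because different noise draws can flow to different actions, a model of this kind can in principle represent the multimodal behavior distribution the abstract mentions.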
Problem

Research questions and friction points this paper is trying to address.

long-horizon navigation
sidewalk navigation
imitation learning
social compliance
counterfactual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
preference learning
imitation learning
monocular navigation
social compliance