Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback

📅 2024-10-11
🏛️ arXiv.org
📈 Citations: 2
Influential: 1
🤖 AI Summary
To address inaccurate uncertainty quantification in interactive imitation learning—caused by expert policy drift and sparse human feedback—this paper proposes Intermittent Quantile Tracking (IQT), the first adaptation of online conformal prediction to sparse-label settings. We further introduce ConformalDAgger, a framework that enables uncertainty-driven dynamic feedback querying. Our method integrates online conformal prediction, quantile regression, and DAgger-style iterative training, and is validated in both simulation and real-world deployment on a 7-DOF robotic arm. Experiments demonstrate that under policy drift, uncertainty detection accuracy improves significantly; expert intervention frequency increases by 32% over baselines; and acquisition of novel behaviors accelerates by 2.1×. These results validate the proposed approach’s capacity for adaptive recalibration under distribution shift and its efficacy in active learning.

📝 Abstract
In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn't) present because of changes in the expert's policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior.
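The core mechanism the abstract describes, online quantile tracking that updates only when an intermittent expert label arrives, can be illustrated with a minimal sketch. This is not the paper's actual IQT algorithm: the ACI-style update rule, the Bernoulli(p) label-arrival model, and the inverse-probability weighting below are simplifying assumptions for illustration.

```python
def iqt_update(q, score, label_available, alpha=0.1, lr=0.05, p=0.5):
    """One step of an intermittent quantile-tracking style update (sketch).

    q: current quantile threshold defining the prediction interval
    score: nonconformity score of the newly labeled example
    label_available: whether the expert gave feedback this step
    alpha: target miscoverage level (aim for 1 - alpha coverage)
    p: assumed probability that a label arrives (Bernoulli model)
    """
    if not label_available:
        return q  # no feedback this step: threshold stays fixed
    err = 1.0 if score > q else 0.0  # miscoverage indicator
    # Inverse-probability weighting compensates for sparse labels,
    # so the expected update matches the fully-labeled case.
    return q + (lr / p) * (err - alpha)
```

When the interval fails to cover (err = 1), the threshold grows; when it covers, it shrinks slightly, so the empirical miscoverage is driven toward alpha over time even though only a fraction of steps carry a label.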
Problem

Research questions and friction points this paper is trying to address.

Handling expert shift and intermittent feedback in interactive imitation learning
Developing uncertainty quantification algorithms for online adaptation
Ensuring reliable uncertainty measures to actively query expert feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses intermittent quantile tracking for uncertainty
Leverages expert feedback to adapt uncertainty online
Introduces ConformalDAgger for active expert queries
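The querying behavior described above can be sketched as a deployment loop: the robot queries the expert whenever its calibrated prediction interval exceeds a width budget, and recalibrates the conformal threshold on the feedback it receives. The `policy`/`expert` interfaces, the width-budget decision rule, and the update constants are hypothetical stand-ins, not the paper's actual ConformalDAgger implementation.

```python
import random

def run_deployment(policy, expert, steps, alpha=0.1, lr=0.05, width_budget=0.3):
    """Uncertainty-driven querying loop (ConformalDAgger-like sketch).

    policy(x) -> (prediction, interval halfwidth)   # hypothetical interface
    expert(x) -> ground-truth label                 # hypothetical interface
    Queries the expert when the calibrated interval is wider than the
    budget, and updates the conformal threshold q on each label received.
    """
    q = 0.0        # conformal correction added to the policy's halfwidth
    dataset = []   # aggregated (state, expert label) pairs for retraining
    for _ in range(steps):
        x = random.random()
        pred, halfwidth = policy(x)
        if halfwidth + q > width_budget:      # interval too wide: uncertain
            y = expert(x)                     # actively query the expert
            dataset.append((x, y))
            err = 1.0 if abs(y - pred) > halfwidth + q else 0.0
            q = q + lr * (err - alpha)        # online recalibration
    return q, dataset
```

Under expert shift, labels start falling outside the intervals, q grows, more steps trip the budget, and intervention frequency rises; once the new behavior is covered, q shrinks back, mirroring the adaptive-recalibration effect the experiments report.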