Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In interactive environments, safety-aware planning for autonomous agents (e.g., self-driving vehicles) faces a fundamental challenge: policy updates alter the environment's behavior, inducing a distribution shift that violates the exchangeability assumption behind conventional conformal prediction (CP) and thereby invalidates existing safety guarantees. This work introduces the first safety-aware planning framework that handles this cyclic policy–environment dependency, integrating iterative policy optimization with adversarially robust conformal prediction. The framework quantifies policy-to-trajectory sensitivity online to characterize the induced distribution shift, and a contraction analysis establishes both policy convergence and persistent satisfaction of the safety constraints. Implemented within an episodic open-loop planning architecture, the method enables efficient and safe inference. Experiments on a 2D vehicle–pedestrian interaction show that the safety bounds are strictly maintained across successive policy updates and that the policy converges stably.
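
A minimal Python sketch of the core step: a regular split-conformal quantile is computed from scores observed under the current policy, then inflated by an assumed policy-to-trajectory sensitivity bound times the size of the planned policy update. The function name, the Lipschitz-style `sensitivity` bound, and the use of prediction errors as nonconformity scores are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def robust_cp_quantile(scores, alpha, sensitivity, policy_step_norm):
    """Adversarially robust CP step (illustrative sketch).

    scores: nonconformity scores from trajectories observed under the
            CURRENT policy (e.g., prediction errors on pedestrian positions).
    alpha:  miscoverage level; the regular CP step targets 1 - alpha
            coverage under exchangeability.
    sensitivity: assumed bound L on how much the environment's trajectory
            distribution can shift per unit change in policy parameters
            (policy-to-trajectory sensitivity).
    policy_step_norm: size of the planned policy update ||pi_new - pi_old||.
    """
    n = len(scores)
    # Regular split-conformal quantile with finite-sample correction,
    # clamped to the max score when n is too small for the target coverage.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[min(k, n) - 1]
    # Inflate the quantile so the bound stays valid after the policy
    # update, absorbing the worst-case shift the update can induce.
    return q + sensitivity * policy_step_norm
```

Inflating the quantile (rather than, say, reweighting scores) keeps the per-episode CP step unchanged and pushes all robustness into a single analytic correction, which matches the episodic open-loop structure described above.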

📝 Abstract
Safe planning of an autonomous agent in interactive environments -- such as the control of a self-driving vehicle among pedestrians and human-controlled vehicles -- poses a major challenge as the behavior of the environment is unknown and reactive to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts where the autonomous agent's control policy may change the environment's behavior, thereby invalidating safety guarantees in existing work. Indeed, recent works have used conformal prediction (CP) to generate distribution-free safety guarantees using observed data of the environment. However, CP's assumption on data exchangeability is violated in interactive settings due to a circular dependency where a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP where we perform a regular CP step in each episode using observed data under the current policy, but then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is performed based on a policy-to-trajectory sensitivity analysis, resulting in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.
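
Reading the abstract literally, the transfer of guarantees across a policy update admits a simple two-step form. The LaTeX below is a plausible reconstruction under the assumption that the induced shift is bounded linearly in the policy change; the sensitivity constant $L$ is an assumption, not a quantity the abstract specifies.

```latex
% Regular CP step in episode k, from scores s_1,...,s_n observed under policy \pi_k:
C_k \;=\; \mathrm{Quantile}_{\lceil (n+1)(1-\alpha)\rceil / n}\!\left(s_1,\dots,s_n\right)
% Robust transfer across the policy update, with L an assumed
% policy-to-trajectory sensitivity (Lipschitz) constant:
\tilde{C}_{k+1} \;=\; C_k \;+\; L\,\bigl\lVert \pi_{k+1} - \pi_k \bigr\rVert
```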
Problem

Research questions and friction points this paper is trying to address.

Addresses safe planning in interactive environments whose behavior is unknown and reactive to the agent
Tackles interaction-driven distribution shifts that violate CP's exchangeability assumption and invalidate existing safety guarantees
Maintains valid safety guarantees across policy updates via adversarially robust conformal prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

An iterative framework that transfers safety guarantees across policy updates
Adversarially robust conformal prediction that analytically absorbs interaction-driven distribution shifts
Policy-to-trajectory sensitivity analysis that bounds the shift a policy update can induce, yielding a safe episodic open-loop planner (see the sketch after this list)
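
Putting the pieces together, one plausible shape for the episodic loop is sketched below. Here `rollout_scores`, `update_policy`, and `plan` are hypothetical placeholders for data collection, policy optimization, and open-loop planning respectively, and the stopping rule stands in for the paper's contraction conditions.

```python
import numpy as np

def episodic_safe_planning(rollout_scores, update_policy, plan,
                           policy0, sensitivity, alpha=0.1,
                           episodes=20, tol=1e-3):
    """Episodic open-loop planning loop (hypothetical sketch).

    rollout_scores(policy) -> nonconformity scores observed under the
    current policy; update_policy(policy, q) -> planned next policy;
    plan(policy, q) -> one open-loop plan against safety bound q.
    All three callables are placeholders, not the paper's API.
    """
    policy = np.asarray(policy0, dtype=float)
    q = None
    for _ in range(episodes):
        scores = np.asarray(rollout_scores(policy))
        n = len(scores)
        k = int(np.ceil((n + 1) * (1 - alpha)))
        q = np.sort(scores)[min(k, n) - 1]         # regular CP step
        new_policy = update_policy(policy, q)      # planned policy update
        step = float(np.linalg.norm(new_policy - policy))
        plan(policy, q + sensitivity * step)       # robustly inflated bound
        policy = new_policy
        # Contraction-style stopping rule: under contraction conditions,
        # successive update sizes shrink, so policy and CP radius settle.
        if step < tol:
            break
    return policy, q
```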