🤖 AI Summary
In multi-robot navigation, pre-trained reinforcement learning policies remain prone to collisions in dense obstacle environments, and retraining or fine-tuning is costly and risks degrading existing capabilities. This paper proposes a Latent Activation Editing (LAE) framework that enhances safety at inference time without weight updates, by monitoring and selectively editing intermediate-layer activations of the policy network in real time. It adapts large-model-inspired activation steering to multi-UAV systems, integrating a risk-aware amplification mechanism with a latent collision world model. An online classifier detects anomalous activations, and activation editing is applied only under high-risk states to guide collision avoidance. Evaluated in simulation and on real Crazyflie UAVs, the method reduces the collision rate by nearly 90%, significantly increases the proportion of collision-free trajectories, and preserves task completion performance.
📝 Abstract
Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier that monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module that selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves a statistically significant reduction in collisions (nearly 90% fewer cumulative collisions than the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.
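The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `risk_classifier`, `world_model`, `alpha`, and `threshold` are all hypothetical names standing in for the online classifier, the latent collision world model, the edit strength, and the risk gate. When the classifier flags a high-risk activation, the edit steers it toward the world model's predicted pre-collision activation, amplifying the policy's internal perception of risk so that avoidance kicks in earlier.

```python
import numpy as np


def edit_activations(h, risk_classifier, world_model, alpha=0.5, threshold=0.8):
    """Inference-time Latent Activation Editing (LAE), sketched.

    h               -- intermediate-layer activation vector of the frozen policy
    risk_classifier -- maps an activation to an estimated probability of an
                       unsafe (pre-collision) state
    world_model     -- latent collision world model: predicts the future
                       pre-collision activation from the current one
    alpha           -- edit strength (illustrative value)
    threshold       -- risk level above which editing is applied

    The policy's weights are never touched; only this activation is edited.
    """
    risk = risk_classifier(h)
    if risk < threshold:
        # Low risk: pass the activation through unchanged, preserving
        # the pre-trained policy's nominal behavior.
        return h
    # High risk: shift the activation toward the predicted pre-collision
    # activation, amplifying the perceived danger to trigger earlier avoidance.
    h_pred = world_model(h)
    return h + alpha * (h_pred - h)
```

In a deployed policy network this edit would sit between two layers of the frozen model (e.g. via a forward hook), so the downstream action head consumes the edited activation instead of the original one.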