🤖 AI Summary
Long-horizon embodied AI systems operating in physical environments face compounding safety challenges, including semantic misgrounding, subtask-level error propagation, execution drift, and contact-rich physical risk. Existing research remains fragmented across the planning, policy, and execution layers. This survey organizes safety mechanisms for long-horizon robotic manipulation by intervention locus (planning-time, policy-time, and execution-time) and assesses the strength of evidence behind each line of work, distinguishing formal guarantees, statistical support, and empirical heuristics. The resulting taxonomy clarifies the distinct roles of backbone capability papers, direct safety mechanisms, and benchmark or evaluation studies. It identifies persistent gaps, particularly limited evidence for policy-time safety, weak formal support for contact-rich long-horizon manipulation, immature uncertainty-triggered intervention, and a shortage of manipulation-specific safety benchmarks, and it outlines research directions for cross-layer assurance, evaluation design, and safer deployment.
📝 Abstract
Embodied AI systems are increasingly expected to reason and act over extended horizons in physical environments. This growing capability brings safety to the foreground, because failures in the physical world can harm people, damage objects, and disrupt workplaces. Although safe embodied AI has attracted substantial attention, the literature remains fragmented across planning, policy design, and runtime execution. Long-horizon robotic manipulation is a particularly revealing anchor domain for this problem because semantic misgrounding, subtask-level error propagation, execution drift, and contact-rich physical risk can accumulate within the same closed-loop system. This survey therefore provides a structured review of safety in long-horizon robotic manipulation from an embodied AI perspective. We organize the literature by intervention locus, covering planning-time, policy-time, and execution-time safety, and we analyze the strength of the evidence that each line of work provides, distinguishing formal guarantees, statistical support, and empirical safety heuristics. This framework clarifies the distinct roles of backbone capability papers, direct safety mechanisms, and benchmark or evaluation studies, while exposing where current safety claims are well supported and where they remain indirect. We identify persistent gaps, including limited evidence for policy-time safety, weak formal support for contact-rich long-horizon manipulation, immature uncertainty-triggered intervention, and a shortage of manipulation-specific safety benchmarks. We conclude by outlining research directions for cross-layer assurance, evaluation design, and safer deployment of long-horizon robotic agents in real-world settings.