AI Summary
This work addresses the lack of a named layer for integrating learned policies, planners, and vision-language-action (VLA) models into deployed robots across control, computing, and communication. Borrowing the concept of a "harness" from language-agent work, the paper argues that robot middleware should play this role and sketches a ROS 2 Harness Profile that the middleware enforces across ROS 2, DDS, and Zenoh. It names three enforcement functions: Projection (gating each output at emission), Isolation (bounding the model's execution and transmission slot), and Transfer (falling back to a verified baseline when checks fail). The profile is a deployment artifact that carries a model's declared output region, inference budget, and operating regime, with the aim of composing enforcement across all three axes in one reusable layer rather than in hand-built application code.
Abstract
Robot middleware faces a new role in the era of Physical AI. Learned policies, planners, and vision-language-action (VLA) models now enter deployed robots as causal participants on the control path, but the layer that integrates them with timing, scheduling, and networking has not been named. Recent language-agent work names this layer the harness: the external system that mediates tools, manages state, bounds resources, and records execution. The robotics community has not yet adopted this framing, and we propose that robot middleware is that harness. A Physical AI harness differs from a software harness in where it intervenes. A software harness mediates at tool-call boundaries. A Physical AI harness must mediate at control, computing, and communication simultaneously, because a learned policy's output crosses all three: its commands shift the trajectory, its inference time shifts the schedule, and its payload shifts the bandwidth. Robot middleware is the lowest robot-stack layer with mediating abstractions over all three, so it is best positioned to compose their enforcement. It already provides most of what a harness needs but lacks the enforcement for an AI model. We decompose this missing enforcement into three functions: Projection gates each output at emission, Isolation bounds the model's execution and transmission slot, and Transfer falls back to a verified baseline when checks fail. Each appears today as hand-built application code in deployed robot systems, built on surfaces robot middleware already provides. Robot middleware should host them not as the best single-axis enforcer but as the layer that composes all three. We sketch this as a ROS 2 Harness Profile, a deployment artifact that carries an AI model's declared output region, inference budget, and operating regime while the middleware enforces them across ROS 2, DDS, and Zenoh.
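To make the three functions concrete, the following is a minimal sketch in plain Python, not the paper's implementation and not a ROS 2, DDS, or Zenoh API. Every name here (`HarnessProfile`, `harness_step`, the field names, the status strings) is hypothetical. The sketch assumes the declared output region is an axis-aligned box and covers only the inference budget, not the transmission slot or operating regime. It also only detects a budget overrun after the model returns; real Isolation would have to bound execution itself, for example through scheduling.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Command = Tuple[float, ...]


@dataclass(frozen=True)
class HarnessProfile:
    """Hypothetical deployment artifact; field names are illustrative only."""
    output_low: Command        # lower corner of the declared output region
    output_high: Command       # upper corner of the declared output region
    inference_budget_s: float  # declared inference budget in seconds


def in_region(profile: HarnessProfile, cmd: Command) -> bool:
    """Projection check: is every component inside the declared box?"""
    if len(cmd) != len(profile.output_low):
        return False
    # NaN fails both comparisons, so it is rejected as out of region.
    return all(
        lo <= c <= hi
        for lo, c, hi in zip(profile.output_low, cmd, profile.output_high)
    )


def harness_step(
    profile: HarnessProfile,
    model: Callable[[Any], Command],
    baseline: Callable[[Any], Command],
    observation: Any,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Command, str]:
    """Run one control step and return (command, source of the command)."""
    start = clock()
    try:
        cmd = model(observation)
    except Exception:
        # Transfer: a model failure hands control to the baseline.
        return baseline(observation), "transfer:model_error"
    elapsed = clock() - start

    # Isolation (post-hoc only): a late output is discarded, not emitted.
    if elapsed > profile.inference_budget_s:
        return baseline(observation), "transfer:budget_exceeded"

    # Projection: gate the output at emission against the declared region.
    if not in_region(profile, cmd):
        return baseline(observation), "transfer:out_of_region"

    return cmd, "model"
```

A caller would build one profile per deployed model and invoke `harness_step` once per control cycle, logging the returned status string so that every fallback is recorded. Passing `clock` as a parameter keeps the budget check testable without real delays.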