🤖 AI Summary
Traditional centralized approaches to AI safety struggle to address novel risks arising from the autonomy, interactivity, and emergent behaviors of embodied agents in open environments. This work proposes the 4C framework, inspired by human social governance, which systematically models multi-agent AI safety through four dimensions: Core, Connection, Cognition, and Compliance. The framework is the first to integrate sociotechnical governance principles into AI safety, synthesizing theories from cybersecurity, multi-agent systems, cognitive modeling, and institutional governance. It establishes a multidimensional safeguarding architecture that spans technical, interactive, cognitive, and institutional layers, thereby providing a principled foundation for developing embodied AI systems that are trustworthy, governable, and aligned with human values.
📝 Abstract
AI is moving from domain-specific autonomy in closed, predictable settings to large-language-model-driven agents that plan and act in open, cross-organizational environments. As a result, the cybersecurity risk landscape is changing in fundamental ways. Agentic AI systems can plan, act, collaborate, and persist over time, functioning as participants in complex sociotechnical ecosystems rather than as isolated software components. Although recent work has strengthened defenses against model- and pipeline-level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system-centric approaches may fail to capture risks that arise from autonomy, interaction, and emergent behavior. This article introduces the 4C Framework for multi-agent AI security, inspired by societal governance. It organizes agentic risks across four interdependent dimensions: Core (system, infrastructure, and environmental integrity), Connection (communication, coordination, and trust), Cognition (belief, goal, and reasoning integrity), and Compliance (ethical, legal, and institutional governance). By shifting AI security from a narrow focus on system-centric protection to the broader preservation of behavioral integrity and intent, the framework complements existing AI security strategies and offers a principled foundation for building agentic AI systems that are trustworthy, governable, and aligned with human values.
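As a minimal sketch, assuming only the four dimension names and their parenthetical scopes given in the abstract, the 4C taxonomy can be read as a tagging scheme for agentic risks. The `Dimension` enum, `Risk` dataclass, and example entries below are illustrative assumptions for exposition, not an implementation from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Dimension(Enum):
    """The four interdependent dimensions of the 4C framework."""
    CORE = auto()        # system, infrastructure, and environmental integrity
    CONNECTION = auto()  # communication, coordination, and trust
    COGNITION = auto()   # belief, goal, and reasoning integrity
    COMPLIANCE = auto()  # ethical, legal, and institutional governance


@dataclass
class Risk:
    """An agentic-AI risk tagged with the 4C dimension(s) it touches."""
    name: str
    dimensions: set[Dimension]


# Hypothetical examples of how familiar agentic risks might map onto the taxonomy.
RISKS = [
    Risk("prompt injection via tool output", {Dimension.CORE, Dimension.COGNITION}),
    Risk("spoofed inter-agent messages", {Dimension.CONNECTION}),
    Risk("goal drift during long-horizon planning", {Dimension.COGNITION}),
    Risk("policy-violating autonomous action", {Dimension.COMPLIANCE}),
]

# Group risks by dimension to see which layer each safeguard would need to address.
by_dimension = {d.name: [r.name for r in RISKS if d in r.dimensions] for d in Dimension}
print(by_dimension)
```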