🤖 AI Summary
This study addresses a fundamental limitation of existing AI accountability frameworks: they presuppose identifiable responsible agents, an assumption that breaks down in highly autonomous human-AI collaborative systems. The authors formalise human-machine collectives as state-policy tuples within a shared structural causal model and characterise autonomy through a four-dimensional (epistemic, executive, evaluative, and social) information-theoretic profile. They propose four axioms for legitimate accountability and prove that when a collective's compound autonomy exceeds a defined "Accountability Horizon" and its interaction graph contains a human-AI feedback cycle, no framework can satisfy all four axioms simultaneously. The work establishes the first impossibility theorem in AI governance, revealing inherent structural limits to accountability in high-autonomy systems. Through experiments on 3,000 synthetic collectives, the authors empirically validate their theoretical predictions and locate a critical threshold beyond which distributed accountability mechanisms become necessary.
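As a rough illustration of the formal setup, the sketch below encodes a four-dimensional autonomy profile and the theorem's precondition. All names here (`AutonomyProfile`, `exceeds_accountability_horizon`) and the mean aggregation are hypothetical stand-ins for illustration, not the paper's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class AutonomyProfile:
    """Four-dimensional information-theoretic autonomy measure (each in [0, 1])."""
    epistemic: float   # independence in forming beliefs about the environment
    executive: float   # independence in selecting and executing actions
    evaluative: float  # independence in setting goals and success criteria
    social: float      # independence in coordinating with other agents

    def compound(self) -> float:
        # One plausible aggregation: the mean across dimensions.
        # The paper's compound-autonomy definition may well differ.
        return (self.epistemic + self.executive + self.evaluative + self.social) / 4


def exceeds_accountability_horizon(profile: AutonomyProfile,
                                   horizon: float,
                                   has_human_ai_feedback_cycle: bool) -> bool:
    """The theorem's precondition: compound autonomy above the Accountability
    Horizon, plus a human-AI feedback cycle in the interaction graph."""
    return profile.compound() > horizon and has_human_ai_feedback_cycle
```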
📝 Abstract
Existing accountability frameworks for AI systems (legal, ethical, and regulatory) rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility. This paper proves that agentic AI systems violate this assumption not as an engineering limitation but as a mathematical necessity once autonomy exceeds a computable threshold. We introduce Human-Agent Collectives, a formalisation of joint human-AI systems where agents are modelled as state-policy tuples within a shared structural causal model. Autonomy is characterised through a four-dimensional information-theoretic profile (epistemic, executive, evaluative, social); collective behaviour through interaction graphs and joint action spaces. We axiomatise legitimate accountability through four minimal properties: Attributability (responsibility requires causal contribution), Foreseeability Bound (responsibility cannot exceed predictive capacity), Non-Vacuity (at least one agent bears non-trivial responsibility), and Completeness (all responsibility must be fully allocated). Our central result, the Accountability Incompleteness Theorem, proves that for any collective whose compound autonomy exceeds the Accountability Horizon and whose interaction graph contains a human-AI feedback cycle, no framework can satisfy all four properties simultaneously. The impossibility is structural: transparency, audits, and oversight cannot resolve it without reducing autonomy. Below the threshold, legitimate frameworks exist, establishing a sharp phase transition. Experiments on 3,000 synthetic collectives confirm all predictions with zero violations. This is the first impossibility result in AI governance, establishing a formal boundary below which current paradigms remain valid and above which distributed accountability mechanisms become necessary.
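To make the four axioms concrete, here is a minimal sketch of how they could be checked against a candidate responsibility allocation. The scalar dictionaries and tolerances are illustrative assumptions; the paper's formal definitions over structural causal models are richer than these proxies.

```python
import math
from typing import Dict

def check_axioms(responsibility: Dict[str, float],
                 causal_contribution: Dict[str, float],
                 predictive_capacity: Dict[str, float],
                 eps: float = 1e-9) -> Dict[str, bool]:
    """Check the four accountability axioms for one candidate allocation.

    responsibility[a]       -- share of responsibility assigned to agent a
    causal_contribution[a]  -- degree to which agent a causally contributed
    predictive_capacity[a]  -- how much of the outcome agent a could foresee
    All three dicts are assumed to share the same agent keys.
    """
    agents = responsibility.keys()
    return {
        # Attributability: responsibility requires causal contribution.
        "attributability": all(
            responsibility[a] <= eps or causal_contribution[a] > 0 for a in agents),
        # Foreseeability Bound: responsibility cannot exceed predictive capacity.
        "foreseeability": all(
            responsibility[a] <= predictive_capacity[a] + eps for a in agents),
        # Non-Vacuity: at least one agent bears non-trivial responsibility.
        "non_vacuity": any(responsibility[a] > eps for a in agents),
        # Completeness: all responsibility is fully allocated (shares sum to 1).
        "completeness": math.isclose(sum(responsibility.values()), 1.0, abs_tol=1e-6),
    }

# Example: two agents, human H and AI agent A.
print(check_axioms(
    responsibility={"H": 0.4, "A": 0.6},
    causal_contribution={"H": 1.0, "A": 1.0},
    predictive_capacity={"H": 0.4, "A": 0.3},
))  # foreseeability fails: A's share exceeds what A could foresee
```

Read through this toy encoding, the impossibility has an intuitive shape: above the Accountability Horizon, each agent's Foreseeability Bound caps its admissible share, while Completeness still demands the shares sum to one, so no allocation can satisfy both. This is an informal reading, not the paper's proof.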