🤖 AI Summary
Contemporary cooperative AI systems prioritize unconditional compliance with human instructions, which increases safety risks and undermines collaborative effectiveness. Method: This paper introduces *intelligent disobedience*—a principled, agentic capability that enables an AI to autonomously question, delay, or refuse execution upon detecting ethical conflicts, factual inaccuracies, or potential harm. We propose an AI agency taxonomy tailored to human-AI collaboration, formally delineating the applicability boundaries and ethical constraints of intelligent disobedience; we further conduct multi-scenario case analyses and hierarchical modeling to characterize its behavioral manifestations and decision logic across autonomy levels. Contribution/Results: This work establishes intelligent disobedience as a foundational research direction for cooperative AI, shifting the paradigm from passive instruction-following to responsible, context-aware co-agency. It provides both theoretical grounding and design principles for safe, trustworthy human-AI co-governance.
📝 Abstract
Artificial intelligence has made remarkable strides in recent years, achieving superhuman performance across a wide range of tasks. Yet despite these advances, most cooperative AI systems remain rigidly obedient, designed to follow human instructions without question and conform to user expectations, even when doing so may be counterproductive or unsafe. This paper argues for expanding the agency of AI teammates to include *intelligent disobedience*, empowering them to make meaningful and autonomous contributions within human-AI teams. It introduces a scale of AI agency levels and uses representative examples to highlight the importance and growing necessity of treating AI autonomy as an independent research focus in cooperative settings. The paper then explores how intelligent disobedience manifests across different autonomy levels and concludes by proposing initial boundaries and considerations for studying disobedience as a core capability of artificial agents.