🤖 AI Summary
This study addresses the challenge of designing autonomous agents capable of responsibly refusing to execute inappropriate or high-risk user requests. To this end, the authors propose a multidimensional framework of “responsible non-compliance,” which emphasizes the justifiability of refusal behaviors, the feasibility of user override, and clear attribution of responsibility. Integrating ethical modeling from human–computer interaction, explainable decision-making mechanisms, and accountability-tracing techniques, the work establishes foundational design principles and a core architectural blueprint for such agents. This research provides both a theoretical foundation and a practical pathway toward developing safe, trustworthy, and ethically discerning intelligent systems suitable for high-stakes scenarios.
📝 Abstract
We consider the problem of engineering autonomous intelligent agents that are capable of responsibly declining to comply with user requests. We argue that machine non-compliance comes in many different forms, and sketch the issues we should pursue on the road to building responsibly non-compliant intelligent machines. We anchor responsible non-compliance in justifications for task refusal, pathways to override the non-compliance, and careful tracking of security risks and liability transfers.