🤖 AI Summary
Current AI alignment research focuses predominantly on single-agent systems or on aggregate human preferences, neglecting the systemic misalignment that arises when multiple stakeholders hold heterogeneous, conflicting objectives. Method: The paper adapts a computational social science model of human contention to the alignment problem, constructing weighted preference graphs and agent-based simulations that quantify preference-weighted misalignment across domains and stakeholder groups. Contribution/Results: The approach moves beyond single-dimensional value alignment. In case studies, including an autonomous-vehicle setting, the model identifies high-conflict stakeholder preference hotspots and reproduces canonical misalignment patterns. This improves the interpretability of alignment failures and offers actionable guidance for sociotechnical system design, addressing a gap in modeling dynamic misalignment within complex, adaptive sociotechnical systems.
📝 Abstract
Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempts to align AI actions with human interests through value specification and learning; and/or (3) a single agent or humanity treated as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents' weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of misalignment across different scenarios. We then apply our model to two case studies, including an autonomous vehicle setting, showcasing its practical utility. Our approach offers enhanced explanatory power for complex sociotechnical environments and could inform the design of more aligned AI systems in real-world applications.
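To make the core idea concrete, here is a minimal sketch of a misalignment score driven by conflict between agents' weighted preferences. This is an illustration only: the `misalignment` function, the signed-weight representation, and the specific conflict formula are assumptions for exposition, not the paper's actual definition.

```python
from itertools import combinations

def misalignment(agents):
    """Average pairwise conflict between agents' weighted preferences.

    Each agent maps issues in a domain to a signed weight in [-1, 1]:
    the sign encodes the stance, the magnitude how much the agent cares.
    A pair contributes conflict on an issue only when their stances
    oppose each other, scaled by how strongly both agents weight it.
    """
    pairs = list(combinations(agents, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        issues = set(a) | set(b)
        # Opposing signs yield a positive product after negation;
        # agreement (or indifference) contributes zero conflict.
        conflict = sum(max(0.0, -a.get(i, 0.0) * b.get(i, 0.0)) for i in issues)
        total += conflict / max(len(issues), 1)
    return total / len(pairs)

# Hypothetical autonomous-vehicle stakeholders (illustrative weights).
passenger    = {"speed": 0.8,  "occupant_safety": 1.0, "pedestrian_safety": 0.2}
pedestrian   = {"speed": -0.9, "occupant_safety": 0.1, "pedestrian_safety": 1.0}
manufacturer = {"speed": 0.5,  "occupant_safety": 0.7, "pedestrian_safety": 0.6}

print(round(misalignment([passenger, pedestrian, manufacturer]), 3))
```

In this toy example the passenger–pedestrian pair dominates the score because they weight "speed" heavily in opposite directions, which is exactly the kind of high-conflict preference hotspot the framework is meant to surface.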