🤖 AI Summary
This work addresses the risk of irreversible errors in multi-agent debate systems driven by large language models, where erroneous consensus can compromise decision reliability. To mitigate this, the authors propose a novel “conformal social choice” framework that integrates conformal prediction into the post-debate decision layer. The approach aggregates heterogeneous agents’ probabilistic outputs via linear opinion pooling and employs split conformal prediction to construct prediction sets with guaranteed marginal coverage. A hierarchical action policy is introduced to determine whether to execute decisions automatically or escalate to human review. Evaluated on MMLU-Pro across eight domains, the method achieves target coverage within ±1–2% at significance level α=0.05, intercepts 81.9% of erroneous consensus cases, and attains single-element prediction set accuracies of 90.0–96.8%, substantially outperforming conventional consensus mechanisms while effectively balancing safety and automation.
📝 Abstract
Multi-agent debate improves LLM reasoning, yet agreement among agents is not evidence of correctness. When agents converge on a wrong answer through social reinforcement, consensus-based stopping commits that error to an automated action with no recourse. We introduce Conformal Social Choice, a post-hoc decision layer that converts debate outputs into calibrated act-versus-escalate decisions. Verbalized probability distributions from heterogeneous agents are aggregated via a linear opinion pool and calibrated with split conformal prediction, yielding prediction sets with a marginal coverage guarantee: the correct answer is included with probability $\geq 1-\alpha$, without assumptions on individual model calibration. A hierarchical action policy maps singleton sets to autonomous action and larger sets to human escalation. On eight MMLU-Pro domains with three agents (Claude Haiku, DeepSeek-R1, Qwen-3 32B), coverage stays within 1--2 points of the target. The key finding is not that debate becomes more accurate, but that the conformal layer makes its failures actionable: 81.9% of wrong-consensus cases are intercepted at $\alpha{=}0.05$. Because the layer refuses to act on cases where debate is confidently wrong, the remaining conformal singletons reach 90.0--96.8% accuracy (up to 22.1 pp above consensus stopping) -- a selection effect, not a reasoning improvement. This safety comes at the cost of automation, but the operating point is user-adjustable via $\alpha$.
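The pipeline described above can be sketched in a few lines. This is our illustrative reconstruction, not the authors' code: function names, the choice of conformity score ($1 - $ pooled probability of the true label), and uniform pooling weights are assumptions made for the example.

```python
import math

def opinion_pool(dists, weights=None):
    """Linear opinion pool: weighted average of per-agent distributions."""
    k = len(dists[0])
    weights = weights or [1.0 / len(dists)] * len(dists)
    return [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(k)]

def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Split conformal calibration.
    Score each held-out example as 1 - pooled probability of its true
    label, then take the ceil((n+1)(1-alpha))/n empirical quantile."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return scores[rank]

def prediction_set(pooled, qhat):
    """Keep every label whose conformity score 1 - p falls below qhat."""
    return [i for i, p in enumerate(pooled) if 1.0 - p <= qhat]

def decide(pooled, qhat):
    """Hierarchical action policy: act on singletons, else escalate."""
    s = prediction_set(pooled, qhat)
    return ("act", s[0]) if len(s) == 1 else ("escalate", s)

# Three agents' verbalized distributions over three answer options:
pooled = opinion_pool([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
qhat = conformal_threshold(
    [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.6, 0.4]], [0, 0, 0, 0]
)
action, payload = decide(pooled, qhat)
```

A smaller $\alpha$ inflates the threshold, producing larger prediction sets and more escalations; this is the user-adjustable safety/automation trade-off the abstract refers to.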