Streaming Communication in Multi-Agent Reasoning

📅 2026-06-03

🤖 AI Summary

This work addresses a limitation of conventional multi-agent reasoning systems: their “generate-then-transfer” paradigm makes end-to-end latency grow linearly with pipeline depth and lets errors from late reasoning steps propagate downstream. The authors propose StreamMA, a framework that streams each reasoning step to downstream agents as soon as it is generated, which pipelines adjacent agents. StreamMA supports chain, tree, and graph topologies, and integrates dynamic scheduling with error suppression mechanisms. It is evaluated with Claude Opus 4.6 and GPT-5.4 and is backed by a closed-form joint analysis of the streaming, sequential, and single-agent protocols that derives their effectiveness ordering, speedup upper bound, and cost ratio. The authors also report a “step-level scaling law”: increasing the number of steps per agent improves both effectiveness and efficiency. Across eight benchmarks in mathematics, science, and code, StreamMA outperforms both baselines by 7.3 percentage points on average, and by up to 22.4 percentage points on HMMT 2026, while reducing end-to-end latency.
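
To make the latency claim concrete, here is a minimal back-of-the-envelope sketch in Python. It is not the paper's analysis: it assumes a simple chain in which every agent produces the same number of steps, every step takes the same time, and a downstream agent can start its step k as soon as the upstream agent has finished step k. The function names and parameters are hypothetical.

```python
def serial_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the full upstream chain."""
    return num_agents * steps_per_agent * step_time


def streaming_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Step-level streaming under the idealized assumptions stated above."""
    # upstream_done[k] is the time the upstream agent finished its step k.
    # The first agent has no upstream, so its inputs are available at time 0.
    upstream_done = [0.0] * steps_per_agent
    for _ in range(num_agents):
        clock = 0.0
        current_done = []
        for k in range(steps_per_agent):
            # A step starts once the agent's previous step and the
            # matching upstream step are both finished.
            start = max(clock, upstream_done[k])
            clock = start + step_time
            current_done.append(clock)
        upstream_done = current_done
    return upstream_done[-1]


if __name__ == "__main__":
    for agents, steps in [(4, 8), (4, 64)]:
        serial = serial_latency(agents, steps)
        stream = streaming_latency(agents, steps)
        print(agents, steps, serial, stream, round(serial / stream, 2))
```

Under these assumptions the serial latency is the product of agent count and steps per agent, while the streaming latency is roughly their sum, so the speedup approaches the number of agents as the step count grows. A real system would deviate from this because step times vary and downstream agents may need more than one upstream step before they can proceed.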

📝 Abstract

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
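
The step-level streaming idea can be sketched with Python's `asyncio` queues. This is an illustrative toy, not StreamMA's implementation: `agent`, `run_chain`, and `stream_chain` are hypothetical names, `asyncio.sleep` stands in for an LLM generating one reasoning step, and each agent is assumed to emit exactly one step per upstream step it receives.

```python
import asyncio

DONE = object()  # sentinel marking the end of a step stream


async def agent(name: str, inbox: asyncio.Queue, outbox: asyncio.Queue,
                step_time: float = 0.01) -> None:
    """Consume upstream steps one at a time and forward each result immediately."""
    while True:
        step = await inbox.get()
        if step is DONE:
            await outbox.put(DONE)  # propagate end-of-stream downstream
            return
        await asyncio.sleep(step_time)  # placeholder for generating one step
        await outbox.put(f"{name}({step})")


async def run_chain(num_agents: int, steps: list[str]) -> list[str]:
    # One queue between each pair of adjacent agents, plus input and output.
    queues = [asyncio.Queue() for _ in range(num_agents + 1)]
    tasks = [
        asyncio.create_task(agent(f"A{i}", queues[i], queues[i + 1]))
        for i in range(num_agents)
    ]
    for step in steps:
        await queues[0].put(step)
    await queues[0].put(DONE)

    results = []
    while True:
        item = await queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    await asyncio.gather(*tasks)
    return results


def stream_chain(num_agents: int, steps: list[str]) -> list[str]:
    return asyncio.run(run_chain(num_agents, steps))


if __name__ == "__main__":
    print(stream_chain(2, ["s1", "s2", "s3"]))
```

Because each agent forwards a step as soon as it is produced, agent `A1` starts working on `s1` while `A0` is still producing `s2`. A generate-then-transfer baseline would instead collect every step from `A0` before `A1` begins.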

Problem

Research questions and friction points this paper is trying to address.

multi-agent reasoning

streaming communication

end-to-end latency

reasoning reliability

pipeline depth

Innovation

Methods, ideas, or system contributions that make the work stand out.

streaming communication

multi-agent reasoning

pipeline parallelism