Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning

📅 2025-11-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the joint optimization of communication content and precision in bandwidth-constrained multi-agent reinforcement learning (MARL). We propose the first end-to-end differentiable discrete communication framework supporting bit-level message precision. Unlike existing methods that only decide *whether* to communicate, our approach overcomes the blocked gradients inherent in discrete decision-making by extending Differentiable Discrete Communication Learning (DDCL) to support unbounded signals, and introduces a generic, plug-and-play communication layer. This layer integrates seamlessly into Transformers and mainstream MARL algorithms. Experiments demonstrate that our method reduces communication bandwidth by over an order of magnitude while maintaining or even improving task performance. Remarkably, a lightweight architecture with fine-grained bit-level precision control matches or surpasses complex, task-specific designs—empirically validating our core principle: *precise communication is superior to redundant communication*.
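To make "bit-level message precision" and the bandwidth arithmetic concrete, here is a minimal sketch of uniform quantization at a chosen bit width. The function name and the [-1, 1] signal range are illustrative assumptions, not the paper's exact formulation:

```python
def quantize(x: float, bits: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Uniformly quantize x to one of 2**bits levels on [lo, hi]."""
    levels = 2 ** bits - 1          # number of quantization steps
    step = (hi - lo) / levels
    x = min(max(x, lo), hi)         # clip to the representable range
    return round((x - lo) / step) * step + lo

# A 4-dimensional message at two different precisions:
msg = [0.30, -0.72, 0.95, 0.01]
coarse = [quantize(v, bits=2) for v in msg]   # 4 * 2 = 8 bits on the wire
fine = [quantize(v, bits=8) for v in msg]     # 4 * 8 = 32 bits on the wire

# Sending the raw float32 vector would cost 4 * 32 = 128 bits, so even
# 8-bit precision is a 4x reduction and 2-bit precision a 16x reduction.
```

Choosing `bits` per value, per timestep, is exactly the degree of freedom the paper optimizes end-to-end.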

📝 Abstract
Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide *whether* to communicate, not *how precisely*. Learning to optimize message precision at the bit level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate *how* agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the "Bitter Lesson" in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.
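The "discretization breaks gradient flow" point can be seen numerically: rounding has zero derivative almost everywhere, and a straight-through-style surrogate restores an identity gradient path. This is a generic sketch of the standard straight-through trick, not DDCL's specific estimator; `quantize` and its [-1, 1] range are assumptions:

```python
def quantize(x: float, bits: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Uniformly quantize x to 2**bits levels on [lo, hi]."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    x = min(max(x, lo), hi)
    return round((x - lo) / step) * step + lo

def finite_diff(f, x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The true derivative of the rounding step is zero almost everywhere,
# so no learning signal reaches the message encoder:
grad_true = finite_diff(lambda x: quantize(x, 4), 0.3)   # 0.0

# Straight-through surrogate: forward pass emits the quantized value,
# backward pass pretends rounding is the identity:
#   q(x) = x + stop_gradient(quantize(x) - x)   =>   dq/dx = 1
def ste(x: float, bits: int) -> float:
    residual = quantize(x, bits) - x   # treated as a constant in backward
    return x + residual                # same value as quantize(x, bits)
```

In an autograd framework the `stop_gradient` (e.g. `.detach()`) makes the residual invisible to backpropagation, so gradients flow through the `x` term unchanged.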
Problem

Research questions and friction points this paper is trying to address.

Optimizing message precision at bit-level in multi-agent communication
Enabling end-to-end optimization of discrete messages with gradient flow
Reducing bandwidth usage while maintaining task performance in MARL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes message precision at bit-level
Extends DDCL to support unbounded signals
Integrates as plug-and-play layer in MARL
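As an illustration of what "dynamically modulating message precision" might look like, here is a hand-rolled heuristic that picks the fewest bits meeting a per-value error tolerance. In the paper the precision is learned end-to-end by gradient descent rather than chosen by such a rule, so the function names and the tolerance criterion are hypothetical:

```python
def quantize(x: float, bits: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Uniformly quantize x to 2**bits levels on [lo, hi]."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    x = min(max(x, lo), hi)
    return round((x - lo) / step) * step + lo

def adaptive_bits(x: float, tol: float = 0.05, max_bits: int = 8) -> int:
    """Smallest bit width whose quantization error stays within tol.
    A hard-coded stand-in for the precision DDCL learns by gradient."""
    for bits in range(1, max_bits + 1):
        if abs(quantize(x, bits) - x) <= tol:
            return bits
    return max_bits

# Values near a quantization level need few bits; others need more:
msg = [0.50, -0.98, 0.07]
budget = [adaptive_bits(v) for v in msg]   # per-value precision
total_bits = sum(budget)                   # vs. 3 * 32 bits for float32
```

The point of the plug-and-play layer is that this precision decision lives between the encoder and the channel, so any message-producing architecture (e.g. a Transformer) can adopt it without redesign.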