Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation

📅 2026-06-02

🤖 AI Summary
This work addresses stream collapse in multi-stream architectures based on Hyper-Connections (HC). In these models, excessive symmetry among the streams during training restricts information exchange between them, and the model degenerates toward single-stream behavior. The study presents a systematic diagnosis of this phenomenon with two main findings. First, inter-stream residual mixing stays close to an identity mapping. Second, interpretable features concentrate in a single dominant stream. As a mitigation, the authors introduce symmetry-breaking perturbations at initialization to disrupt inter-stream symmetry. Experiments show that this approach reduces stream collapse, increases utilization of the multiple streams, and improves performance across several mHC variants.
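The symmetry argument can be made concrete with a toy forward pass. This is a minimal sketch, and it assumes a simplified update of the form H'[i] = sum_j A_res[i][j]·H[j] + b_out[i]·f(sum_j a_in[j]·H[j]). That form is not the paper's exact HC or mHC parameterization. The names `expand_streams`, `hc_step`, `run_toy` and `noise_std` are all hypothetical. The sketch shows two cases:

- When the n streams start as identical copies of the embedding and every parameter treats the streams interchangeably, each stream receives exactly the same update. The model is then effectively single-stream.
- Adding a small per-stream perturbation at initialization is one simple way to break that tie.

```python
import math
import random


def expand_streams(x, n, noise_std=0.0, seed=0):
    """Copy a d-dim embedding x into n residual streams (hypothetical helper).

    With noise_std > 0, each copy gets its own small Gaussian perturbation,
    so the streams are no longer interchangeable at initialization.
    """
    rng = random.Random(seed)
    return [[v + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0) for v in x]
            for _ in range(n)]


def hc_step(H, A_res, a_in, b_out, f):
    """One simplified multi-stream residual update (assumed form, not the paper's):
    H'[i] = sum_j A_res[i][j] * H[j] + b_out[i] * f(sum_j a_in[j] * H[j])."""
    n, d = len(H), len(H[0])
    # Layer input: weighted read-out across streams.
    x = [sum(a_in[j] * H[j][k] for j in range(n)) for k in range(d)]
    y = f(x)
    # Residual mixing across streams plus per-stream write-back of the layer output.
    return [[sum(A_res[i][j] * H[j][k] for j in range(n)) + b_out[i] * y[k]
             for k in range(d)]
            for i in range(n)]


def streams_identical(H, tol=1e-12):
    """True if every stream equals stream 0 elementwise (within tol)."""
    return all(abs(row[k] - H[0][k]) <= tol for row in H for k in range(len(H[0])))


def toy_layer(v):
    """Elementwise tanh as a stand-in for an attention or MLP block."""
    return [math.tanh(t) for t in v]


def run_toy(noise_std, n=4, steps=6):
    """Apply `steps` layers whose parameters treat all streams identically."""
    H = expand_streams([0.3, -1.2, 0.8], n, noise_std)
    A_res = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity mixing
    a_in = [1.0 / n] * n  # uniform read-out
    b_out = [1.0] * n     # uniform write-back
    for _ in range(steps):
        H = hc_step(H, A_res, a_in, b_out, toy_layer)
    return H


print(streams_identical(run_toy(noise_std=0.0)))   # True: one effective stream
print(streams_identical(run_toy(noise_std=0.01)))  # False: copies stay distinguishable
```

In this toy, identity mixing with a uniform read-out preserves the initial perturbation but never amplifies it. This loosely illustrates why near-identity residual mixing limits exchange between streams.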
📝 Abstract
Hyper-Connections (HC) replace the single Transformer residual stream with multiple streams, introducing a permutation symmetry over stream indices. We study how this symmetry is resolved in practice: whether streams specialize in a balanced way or exhibit dominant-stream usage. Using fine-grained diagnostics for HC-based language models, we trace how multi-stream representations are actually used. We find that after an early seeding stage, residual mixing often remains close to identity, limiting a core HC mechanism for exchanging information between streams. Moreover, both signal and interpretable features concentrate in a dominant stream, and the nominally multi-stream residual connection can underutilize its capacity, behaving closer to a single-stream residual pathway. Finally, we show that breaking symmetry at stream initialization reduces dominant behavior and improves performance across mHC variants. Our code is publicly available.
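The abstract describes two diagnostics: how close residual mixing stays to identity, and how concentrated the signal is in one stream. Both can be approximated with simple scalar summaries. The sketch below is illustrative only, and the following are assumptions rather than the paper's diagnostics:

- the metric choices: relative Frobenius distance to the identity, per-stream energy share, and the participation ratio as an "effective number of streams";
- the function names.

The paper's diagnostics also examine interpretable features, which this sketch does not attempt.

```python
import math


def identity_deviation(A):
    """Relative Frobenius distance ||A - I||_F / ||I||_F of an n x n mixing matrix.

    0.0 means the mixing is exactly the identity, i.e. no information moves
    between streams through the residual path.
    """
    n = len(A)
    sq = sum((A[i][j] - (1.0 if i == j else 0.0)) ** 2
             for i in range(n) for j in range(n))
    return math.sqrt(sq / n)  # ||I||_F ** 2 == n


def stream_energy_shares(H):
    """Fraction of the total squared L2 norm carried by each stream (rows of H)."""
    energies = [sum(v * v for v in row) for row in H]
    total = sum(energies)
    if total == 0:
        raise ValueError("all streams are zero; shares are undefined")
    return [e / total for e in energies]


def effective_num_streams(H):
    """Participation ratio 1 / sum(p_i ** 2) over the energy shares p_i.

    Equals n when all n streams carry equal energy and 1.0 when a single
    stream carries everything (a dominant-stream pattern).
    """
    p = stream_energy_shares(H)
    return 1.0 / sum(q * q for q in p)


# Usage on made-up values (not results from the paper):
A_res = [[0.98, 0.02], [0.03, 0.97]]  # near-identity mixing
H = [[3.0, 4.0], [0.1, 0.0]]          # stream 0 dominates
print(identity_deviation(A_res), stream_energy_shares(H), effective_num_streams(H))
```

Tracking these quantities per layer over training would give a rough picture of whether mixing stays near identity and whether one stream dominates. The concrete thresholds for calling a model "collapsed" are left open here.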
Problem

Research questions and friction points this paper is trying to address.

Stream Collapse
Hyper-Connections
Permutation Symmetry
Residual Streams
Multi-stream Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyper-Connections
stream collapse
symmetry breaking
multi-stream transformers
residual mixing