The Conformer Encoder May Reverse the Time Dimension

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work identifies a time-reversal phenomenon in the Conformer encoder of attention-based encoder-decoder (AED) models: the decoder cross-attention weights sometimes decrease monotonically because the encoder has flipped the sequence along the time dimension. The authors analyze the initial behavior of the decoder cross-attention and find that it encourages the encoder self-attention to connect the initial frames with all other informative frames. At some point in training, the self-attention module starts to dominate the output over the preceding feed-forward module, so that only the reversed information passes through. The paper proposes methods and ideas for avoiding the flip, and it investigates a novel way to obtain label-frame-position alignments from the gradients of the label log probabilities with respect to the encoder input frames.

๐Ÿ“ Abstract
We sometimes observe monotonically decreasing cross-attention weights in our Conformer-based global attention-based encoder-decoder (AED) models, Further investigation shows that the Conformer encoder reverses the sequence in the time dimension. We analyze the initial behavior of the decoder cross-attention mechanism and find that it encourages the Conformer encoder self-attention to build a connection between the initial frames and all other informative frames. Furthermore, we show that, at some point in training, the self-attention module of the Conformer starts dominating the output over the preceding feed-forward module, which then only allows the reversed information to pass through. We propose methods and ideas of how this flipping can be avoided and investigate a novel method to obtain label-frame-position alignments by using the gradients of the label log probabilities w.r.t. the encoder input frames.
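The gradient-based alignment idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_label_log_probs` is a hypothetical stand-in for an AED model (scalar frames, a fixed peaked attention pattern, a two-entry vocabulary), its `flip` argument imitates an encoder that reverses time, and central finite differences replace the backpropagated gradients a real framework would provide.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def toy_label_log_probs(frames, labels, flip):
    """Hypothetical toy AED model: returns log p(label_s | frames) per step s."""
    # "Encoder": identity, or a time reversal that imitates the flipped case.
    enc = frames[::-1] if flip else frames
    out_weights = [-1.0, 1.0]  # one output weight per vocabulary entry
    result = []
    for s, label in enumerate(labels):
        # Fixed, sharply peaked "cross-attention" centred on encoder position s.
        scores = [-2.0 * (t - s) ** 2 for t in range(len(enc))]
        att = [math.exp(a) for a in log_softmax(scores)]
        context = sum(a * h for a, h in zip(att, enc))
        result.append(log_softmax([w * context for w in out_weights])[label])
    return result


def gradient_alignment(frames, labels, flip, eps=1e-4):
    """For each label, return the input frame with the largest
    |d log p(label) / d frame|, estimated by central differences."""
    num_frames = len(frames)
    grads = [[0.0] * num_frames for _ in labels]
    for t in range(num_frames):
        plus, minus = list(frames), list(frames)
        plus[t] += eps
        minus[t] -= eps
        lp_plus = toy_label_log_probs(plus, labels, flip)
        lp_minus = toy_label_log_probs(minus, labels, flip)
        for s in range(len(labels)):
            grads[s][t] = (lp_plus[s] - lp_minus[s]) / (2 * eps)
    return [max(range(num_frames), key=lambda t: abs(row[t])) for row in grads]


def is_time_reversed(alignment):
    """True if the aligned frame positions strictly decrease over the labels."""
    return all(a > b for a, b in zip(alignment, alignment[1:]))


frames = [0.3, -0.2, 0.5, 0.1, -0.4, 0.2]
labels = [0, 1, 1, 0]
normal = gradient_alignment(frames, labels, flip=False)
flipped = gradient_alignment(frames, labels, flip=True)
print(normal, is_time_reversed(normal))
print(flipped, is_time_reversed(flipped))
```

In this toy setup the alignment is read from the input side, so a flipped encoder shows up as label positions that run backwards in time even though the attention pattern over encoder outputs is unchanged. With a real model, the same quantity would typically be obtained from automatic differentiation rather than finite differences.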
Problem

Research questions and friction points this paper is trying to address.

Conformer
Cross-Attention Degradation
Sequence Information Reversal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformer Encoder-Decoder Models
Temporal Order Reversal Solution
Label Log-Probability Gradient Alignment