Benchmarking Rotary Position Embeddings for Automatic Speech Recognition

📅 2025-01-10
🤖 AI Summary
This study systematically evaluates Rotary Position Embedding (RoPE) for automatic speech recognition (ASR), where it has so far received little empirical attention. Motivated by the limited capacity of mainstream relative position encodings to represent long speech sequences, the authors integrate RoPE into Transformer-based ASR models in the SpeechBrain framework and run end-to-end experiments on multiple benchmarks, including LibriSpeech and AISHELL-1. RoPE lowers the word error rate (WER) in every setting, with average improvements of 0.5–1.2 percentage points, indicating that it models positional relationships in speech more effectively than the relative position encoding it replaces. To the authors' knowledge, this is the first comprehensive empirical validation of RoPE's advantage and cross-corpus generalizability in ASR. All code, configuration files, and training scripts are publicly released, providing a reproducible benchmark and an empirical basis for standardizing position encoding in speech recognition.
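For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring code used in the paper; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from the paper): one substitution and one deletion.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

An improvement quoted in percentage points is an absolute difference between two such rates expressed as percentages, not a relative reduction.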

📝 Abstract
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models through rotation matrices applied to input vectors within sequences. While RoPE has demonstrated superior performance compared to other positional embedding technologies in natural language processing tasks, its effectiveness in speech processing applications remains understudied. In this work, we conduct a comprehensive evaluation of RoPE across diverse automatic speech recognition (ASR) tasks. Our experimental results demonstrate that for ASR tasks, RoPE consistently achieves lower error rates compared to the currently widely used relative positional embedding. To facilitate further research, we release the implementation and all experimental recipes through the SpeechBrain toolkit.
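The rotation described in the abstract can be illustrated with a small sketch. It assumes the common RoPE convention of rotating adjacent pairs of dimensions by an angle proportional to the position, with frequencies derived from a base of 10000; this is not the SpeechBrain implementation, and the names `rope` and `dot` and the example vectors are hypothetical.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * theta_i."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    rotated = []
    for i in range(dim // 2):
        theta = base ** (-2.0 * i / dim)  # per-pair rotation frequency (assumed convention)
        angle = position * theta
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation applied to the pair.
        rotated.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# Both pairs of positions are 3 steps apart, at different absolute offsets.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

Each vector is rotated according to its absolute position, yet the attention score between a rotated query and a rotated key depends only on the distance between the two positions. This is the sense in which RoPE carries both absolute and relative positional information.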
Problem

Research questions and friction points this paper is trying to address.

Rotary Positional Encoding
Automatic Speech Recognition
Relative Positional Encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotary Positional Encoding
Automatic Speech Recognition
SpeechBrain Toolkit