AI Summary
This study investigates the implicit encoding of speaker information within the feed-forward layers of self-supervised speech Transformers. To identify speaker-sensitive neurons, we align k-means-clustered self-supervised features with i-vectors. We discover, for the first time, that feed-forward neurons implicitly encode speaker gender and broad phoneme categories. Building on this finding, we propose a speaker-relevance-aware structured pruning strategy: retaining highly speaker-correlated neurons while removing weakly correlated ones. Experiments demonstrate that, under substantial parameter compression (up to 40% pruning), speaker verification and identification performance remains nearly intact, with equal error rate (EER) degradation of less than 0.2%. This confirms that the identified neurons serve as critical carriers of speaker representations. Our work advances understanding of internal representational mechanisms in self-supervised speech models and provides a principled, interpretable approach to model compression.
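To make the neuron-identification step concrete, here is a minimal sketch, not the paper's exact pipeline: it clusters i-vectors with k-means as proxy speaker labels, scores each feed-forward neuron with a one-way ANOVA F-statistic (a simple stand-in for the correlation analysis described above), and protects the top-scoring 60% of neurons, i.e., a 40% pruning budget. The variable names and the random placeholder data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import f_classif

# Hypothetical inputs (placeholders, not the paper's data):
#   ffn_acts  - (n_frames, n_neurons) activations of one feed-forward layer
#   i_vectors - (n_frames, ivec_dim) frame-aligned i-vectors
rng = np.random.default_rng(0)
ffn_acts = rng.standard_normal((5000, 3072))
i_vectors = rng.standard_normal((5000, 400))

# Cluster the i-vectors; each cluster acts as a proxy speaker/gender label.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(i_vectors)

# Score each neuron by how well its activation separates the clusters
# (one-way ANOVA F-statistic; higher = more speaker-sensitive).
f_scores, _ = f_classif(ffn_acts, labels)

# Protect the top 60% of neurons by speaker relevance (40% pruning budget).
keep = np.argsort(f_scores)[-int(0.6 * ffn_acts.shape[1]):]
print(f"protecting {keep.size} of {ffn_acts.shape[1]} neurons")
```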
Abstract
In recent years, the impact of self-supervised speech Transformers has extended to speaker-related applications. However, little research has explored how these models encode speaker information. In this work, we address this gap by identifying neurons in the feed-forward layers that are correlated with speaker information. Specifically, we analyze neurons associated with k-means clusters of self-supervised features and i-vectors. Our analysis reveals that these clusters correspond to broad phonetic and gender classes, making them suitable for identifying neurons that represent speakers. By protecting these neurons during pruning, we can largely preserve performance on speaker-related tasks, demonstrating their crucial role in encoding speaker information.
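For the pruning step itself, the sketch below shows one way structured pruning of a Transformer feed-forward block could be applied in PyTorch while keeping a protected set of hidden neurons; the function name and the two-`nn.Linear` layer layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

def prune_ffn(fc1: nn.Linear, fc2: nn.Linear, keep: torch.Tensor):
    """Structurally prune the FFN hidden dimension, keeping only `keep` indices.

    fc1 projects model_dim -> hidden_dim; fc2 projects hidden_dim -> model_dim.
    (Hypothetical helper, not from the paper.)
    """
    new_fc1 = nn.Linear(fc1.in_features, keep.numel(), bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep.numel(), fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        # Rows of fc1.weight and columns of fc2.weight index the hidden neurons.
        new_fc1.weight.copy_(fc1.weight[keep])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Usage with random stand-in scores in place of real speaker-relevance scores:
fc1, fc2 = nn.Linear(768, 3072), nn.Linear(3072, 768)
keep = torch.topk(torch.rand(3072), k=int(0.6 * 3072)).indices
fc1, fc2 = prune_ffn(fc1, fc2, keep)
```

Because whole neurons (rows/columns) are removed rather than individual weights, the pruned block stays dense and needs no sparse kernels, which is what makes this form of pruning "structured".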