In Silico Modeling of the RAMPHO Buffer: Dissociating Informational and Energetic Masking via Phonetic Entropy in Deep Neural Networks

📅 2026-05-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a key limitation of existing speech enhancement methods: they optimize only acoustic metrics and neglect the cognitive bottleneck that informational masking creates in multi-talker environments. The work proposes a cognitively inspired paradigm that, for the first time, disentangles the cognitive cost of informational masking from the physical cost of energetic masking within a deep neural network framework. It introduces an in silico simulation of the RAMPHO episodic buffer based on frame-level phonetic entropy derived from wav2vec 2.0. An SNR sweep then contrasts a semantically intact distractor with a phase-decorrelated one (the Concentration Shield), separating the auditory-cognitive impact of semantically coherent interference from that of phase distortion. The findings reveal a Pareto trade-off: destroying the distractor's semantic content provides release from informational masking at high SNRs but degrades temporal glimpsing cues at low SNRs, which argues for joint cognitive-acoustic optimization in next-generation speech enhancement systems.
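The entropy measure at the core of the simulated buffer can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the input is a matrix of frame-level logits over phonetic classes, such as the `logits` returned by Hugging Face `transformers`' `Wav2Vec2ForCTC`. The specific checkpoint and the way the study aggregates frame values are not stated here, and `entropy_penalty` is a hypothetical summary statistic.

```python
import numpy as np


def frame_entropy(logits: np.ndarray, base: float = 2.0) -> np.ndarray:
    """Per-frame Shannon entropy of the posterior over phonetic classes.

    logits: array of shape (num_frames, num_classes), e.g. frame-level
            CTC logits of a wav2vec 2.0 model (assumed input format).
    Returns an array of shape (num_frames,), in bits when base == 2.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    # H = -sum p log p in nats, converted to the requested base.
    return -(probs * log_probs).sum(axis=-1) / np.log(base)


def entropy_penalty(mixture_logits: np.ndarray, clean_logits: np.ndarray) -> float:
    """Hypothetical statistic: mean per-frame entropy increase of a mixture
    relative to the clean target (both inputs must have the same frame count)."""
    return float(np.mean(frame_entropy(mixture_logits) - frame_entropy(clean_logits)))
```

A uniform posterior over four classes yields 2 bits per frame, and a sharply peaked posterior yields close to 0. Under this sketch, a larger entropy increase for a mixture means the model is less certain about the phonetic content of those frames.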
📝 Abstract
The fundamental challenge of listening in multi-talker environments is a cognitive bottleneck, defined by the Ease of Language Understanding (ELU) model as a failure within the RAMPHO episodic buffer. Current deep neural networks for speech enhancement optimize purely for physical acoustics, failing to account for the cognitive penalty of informational masking. Here, we present an in silico simulation of the RAMPHO buffer using the frame-by-frame phonetic entropy of a self-supervised acoustic model (wav2vec 2.0). By contrasting a semantically intact distractor with a phase-decorrelated distractor (the Concentration Shield) across a signal-to-noise ratio (SNR) sweep, we successfully dissociate the cognitive penalty of informational distraction from the physical penalty of energetic decay. The simulation reveals a cognitive-acoustic Pareto optimization problem: destroying a distractor's semantic payload provides a release from informational masking at high SNRs, but fundamentally degrades temporal glimpsing cues at low SNRs.
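The abstract does not say how the Concentration Shield decorrelates phase or how mixtures are scaled across the SNR sweep. The sketch below assumes two standard techniques: global phase randomization, which keeps a signal's magnitude spectrum while replacing its phase, and power-based SNR scaling. The helpers `phase_randomize`, `mix_at_snr`, and `snr_sweep` are hypothetical names used only for illustration.

```python
import numpy as np


def phase_randomize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a signal with the same magnitude spectrum as x but random phase.

    Assumed stand-in for a phase-decorrelated ("Concentration Shield") distractor.
    """
    x = np.asarray(x, dtype=np.float64)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0  # the DC bin of a real signal must stay real
    if x.size % 2 == 0:
        phases[-1] = 0.0  # so must the Nyquist bin for even-length signals
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=x.size)


def mix_at_snr(target: np.ndarray, distractor: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale distractor so that 10*log10(P_target / P_scaled) == snr_db, then add."""
    p_target = np.mean(target ** 2)
    p_distractor = np.mean(distractor ** 2)
    gain = np.sqrt(p_target / (p_distractor * 10.0 ** (snr_db / 10.0)))
    return target + gain * distractor


def snr_sweep(target, distractor, snrs_db, rng):
    """Yield (snr_db, intact_mixture, shielded_mixture) for each SNR in the sweep."""
    shielded = phase_randomize(distractor, rng)
    for snr_db in snrs_db:
        yield snr_db, mix_at_snr(target, distractor, snr_db), mix_at_snr(target, shielded, snr_db)
```

Because both mixtures at a given SNR carry the same distractor power, differences in a downstream measure (such as the frame entropy of a wav2vec 2.0 model) can be compared within each SNR step. Global phase randomization also smears the distractor's temporal envelope, so it removes amplitude dips along with intelligibility. An STFT-based variant would preserve more of the envelope. Which variant the study uses is not stated here.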
Problem

Research questions and friction points this paper is trying to address.

informational masking
RAMPHO buffer
speech enhancement
cognitive bottleneck
multi-talker environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

informational masking
phonetic entropy
RAMPHO buffer
self-supervised learning
cognitive-acoustic Pareto optimization