🤖 AI Summary
To address slow convergence, strong dependence on noise priors, and near-end speech distortion in acoustic echo cancellation (AEC) for hands-free communication, particularly during double-talk, this paper proposes a deep neural network-enhanced frequency-domain adaptive Kalman filter (DNN-FDKF). Unlike the conventional FDKF, which requires precise pre-specification of the noise covariance matrices, the proposed method integrates a DNN into the state-space modeling framework via joint end-to-end training, enabling direct learning of nonlinear loudspeaker responses and time-varying noise statistics. This is the first systematic comparative study of various DNN-augmented architectures under identical data and training conditions. Results show that DNN-FDKF significantly improves re-convergence speed, achieves an average 2.1 dB gain in echo return loss enhancement (ERLE), and enhances near-end speech fidelity during double-talk (average PESQ-WB improvement of 0.32), all without explicit noise modeling.
📝 Abstract
Kalman filtering is a powerful approach to adaptive filtering for various problems in signal processing. The frequency-domain adaptive Kalman filter (FDKF), based on the concept of the acoustic state space, provides a unifying solution to the adaptive filter update and the related step-size control. It was conceived for the problem of acoustic echo cancellation and, as such, is frequently applied in hands-free systems. This article motivates and briefly recapitulates the linear FDKF and investigates how it can be further supported by deep neural networks (DNNs) in various ways, specifically to overcome the challenges and limitations of estimating the process and observation noise covariances that the Kalman filter usually requires. While the plain FDKF comes with very low computational complexity, its neural Kalman filter variants may deliver faster (re)convergence, better echo cancellation, and may even exceed the FDKF in its already excellent near-end speech preservation during double-talk, under both linear and nonlinear loudspeaker conditions. To provide a synopsis of the state of the art, this article contributes a comparison of a range of DNN-based extensions of the FDKF within the same training framework and on the same data.
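As a rough illustration of how a Kalman filter can unify the filter update and the step-size control, the sketch below runs an independent scalar Kalman filter in every frequency bin. It is a deliberately simplified version that assumes a narrowband (diagonal) model and a first-order Markov echo path, and it omits the overlap-save constraints of a full FDKF. It is not the implementation evaluated in the article. The names `kalman_bin_update`, `psi_s` (observation noise variance), `psi_delta` (process noise variance), and `a` (state transition factor) are chosen here for illustration only.

```python
import numpy as np

def kalman_bin_update(w, p, x, y, psi_s, psi_delta, a=0.999):
    """One Kalman step per frequency bin (simplified, hypothetical sketch).

    w: complex echo-path estimate, p: real state error variance,
    x: far-end spectrum, y: microphone spectrum; all of shape (num_bins,).
    """
    # Prediction with a first-order Markov model of the echo path.
    w_pred = a * w
    p_pred = a**2 * p + psi_delta
    # Prior error: microphone spectrum minus predicted echo.
    e = y - x * w_pred
    # The Kalman gain plays the role of a frequency-dependent step size.
    k = p_pred * np.conj(x) / (np.abs(x) ** 2 * p_pred + psi_s)
    # Correction of the estimate and of its error variance.
    w_post = w_pred + k * e
    p_post = (1.0 - (k * x).real) * p_pred
    return w_post, p_post, e

# Toy simulation: static echo path, white far-end signal, known noise variance.
rng = np.random.default_rng(0)
num_bins, num_frames, noise_var = 8, 500, 1e-2
w_true = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
w = np.zeros(num_bins, dtype=complex)
p = np.ones(num_bins)

for _ in range(num_frames):
    x = rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
    s = np.sqrt(noise_var / 2) * (
        rng.standard_normal(num_bins) + 1j * rng.standard_normal(num_bins)
    )
    y = x * w_true + s
    # a=1.0 because the simulated echo path does not change over time.
    w, p, _ = kalman_bin_update(w, p, x, y, psi_s=noise_var, psi_delta=1e-6, a=1.0)

# Normalized system distance between estimate and true path, in dB.
misalignment_db = 10 * np.log10(
    np.sum(np.abs(w - w_true) ** 2) / np.sum(np.abs(w_true) ** 2)
)
print(f"misalignment: {misalignment_db:.1f} dB")
```

In this toy setup, `psi_s` and `psi_delta` are simply given. In practice they have to be estimated, which is the step the abstract describes as challenging and as a candidate for DNN support.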