AUTHOR=Kang Yuyong, Zheng Nengheng, Meng Qinglin

TITLE=Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants

JOURNAL=Frontiers in Medicine

VOLUME=Volume 8 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2021.740123

DOI=10.3389/fmed.2021.740123

ISSN=2296-858X

ABSTRACT=The cochlea plays a key role in converting acoustic vibration into the neural stimulation from which the brain perceives sound. A cochlear implant (CI) is an auditory prosthesis that replaces damaged cochlear hair cells to achieve acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be guaranteed by CIs. Although recipients of state-of-the-art commercial CI products communicate well by speech in quiet backgrounds, they usually suffer from poor hearing ability in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most urgently needed technologies for CIs. In this paper, we introduce recent progress in deep learning (DL)-based, mostly neural network (NN)-based, SE front ends for CIs and discuss how the hearing properties of CI recipients could be utilized to optimize NN-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to residual noise than to SE-induced speech distortion, which has been common knowledge in the CI community. Furthermore, we demonstrate that the intelligibility of the denoised speech can be significantly improved by using a loss function with SNR-dependent attention on speech distortion and residual noise and that, generally, a loss function biased toward more noise suppression is preferred in NN-based SE for CIs.