ORIGINAL RESEARCH article

Front. Signal Process.
Sec. Audio and Acoustic Signal Processing
Volume 4 - 2024 | doi: 10.3389/frsip.2024.1432298
This article is part of the Research Topic: Informed Acoustic Source Separation and Extraction

Auditory Attention Decoding Based on Neural-Network for Binaural Beamforming Applications

Provisionally accepted
Roy Gueta 1, Elana Zion-Golumbic 2, Jacob Goldberger 1, Sharon Gannot 1*
  • 1 Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
  • 2 Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat Gan, Tel Aviv District, Israel

The final, formatted version of the article will be published soon.

    Individuals have the remarkable ability to differentiate between speakers and focus on a particular speaker, even in complex acoustic environments with multiple speakers, background noise, and reverberation. This selective auditory attention, often illustrated by the cocktail party problem, has been extensively researched. Because a considerable portion of the population experiences hearing impairment and requires hearing aids, there is a need to separate and decode auditory signals artificially. The linearly constrained minimum variance (LCMV) beamforming design criterion has proven effective in isolating the desired source by steering a beam toward the target speaker while creating a null toward the interfering source. Preserving the binaural cues, e.g., the interaural time difference (ITD) and the interaural level difference (ILD), is a prerequisite for producing a beamformer output suitable for hearing aid applications. To this end, the binaural linearly constrained minimum variance (BLCMV) beamformer generates two outputs that satisfy the standard LCMV criterion while preserving the binaural cues between the left-ear and right-ear outputs. Identifying the attended speaker among the separated speakers and distinguishing it from the unattended speaker poses a fundamental challenge in the beamformer design. Several studies have shown that essential features of the attended speech can be decoded from the cortical neural response, as recorded by electroencephalography (EEG). This has led to the development of several algorithms addressing the auditory attention decoding (AAD) task. This paper investigates two neural network architectures for the AAD task. The first architecture leverages transfer learning and is evaluated in both same-trial and cross-trial experiments. The second architecture employs an attention mechanism between the speech signal, represented in the short-time Fourier transform (STFT) domain, and a multi-band filtered EEG signal. 
To alleviate same-trial overfitting, this architecture employs a new data organization structure that presents the neural network (NN) with a single speaker’s speech and the corresponding EEG signal as inputs. Finally, posterior probability post-processing is applied to the outputs of the NN to improve detection accuracy. The experimental study validates the applicability of the proposed scheme as an AAD method. Strategies for incorporating the AAD into the BLCMV beamformer are discussed.
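The LCMV and BLCMV criteria mentioned above have a well-known closed form for each frequency bin: minimize the output power w^H R w subject to the linear constraints C^H w = g, giving w = R^{-1} C (C^H R^{-1} C)^{-1} g. The following is a minimal NumPy sketch, not the authors' implementation. It assumes known acoustic transfer functions (ATFs) `a` (target) and `b` (interferer), a synthetic noise covariance `R`, and hypothetical function names, reference-microphone indices, and interference-scaling factor `eta`. In the BLCMV, the left output is constrained to reproduce the target as received at the left reference microphone and the right output as received at the right one, which preserves the target's interaural cues.

```python
import numpy as np

def lcmv_weights(R, C, g):
    # Closed-form LCMV: minimize w^H R w subject to C^H w = g,
    # i.e. w = R^{-1} C (C^H R^{-1} C)^{-1} g.
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

def blcmv_weights(R, a, b, ref_left, ref_right, eta=0.0):
    # Hypothetical helper for one frequency bin.
    # a, b: ATFs of the target and the interferer (length-M vectors).
    # Left output: response a[ref_left] to the target, eta * b[ref_left] to the interferer.
    # Right output: the same constraints relative to the right reference microphone.
    C = np.column_stack([a, b])
    g_left = np.conj([a[ref_left], eta * b[ref_left]])
    g_right = np.conj([a[ref_right], eta * b[ref_right]])
    return lcmv_weights(R, C, g_left), lcmv_weights(R, C, g_right)

# Synthetic example: M microphones, random ATFs, Hermitian positive-definite R.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
N = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = N @ N.conj().T + 0.1 * np.eye(M)

w_left, w_right = blcmv_weights(R, a, b, ref_left=0, ref_right=M - 1)

# np.vdot conjugates its first argument, so np.vdot(w, a) equals w^H a.
# The ratio below is the target's interaural transfer function at the outputs,
# which should match a[0] / a[M - 1] at the reference microphones.
print(np.vdot(w_left, a) / np.vdot(w_right, a), a[0] / a[M - 1])
```

With `eta=0`, both outputs place a null on the interferer. Swapping `a` and `b` steers the beamformer toward the other speaker. This is the decision that an AAD module could supply, by indicating which of the two speakers should be treated as the target.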

    Keywords: Auditory attention decoding, EEG signals, Multi-microphone processing, Binaural LCMV beamformer, Neural-network-based AAD

    Received: 13 May 2024; Accepted: 09 Dec 2024.

    Copyright: © 2024 Gueta, Zion-Golumbic, Goldberger and Gannot. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Sharon Gannot, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.