With the development of data-driven tools such as machine learning, audio signal processing has attained a high level of accuracy, both for speech recognition and for emotion analysis.
Both tasks require a model of the signal's source. Whereas this requirement has largely been met for speech recognition, by decomposing the problem into an acoustic model, a language model, and optionally a pronunciation model, modeling emotion change detection in speech signals remains an open problem. This aspect is particularly relevant to the diagnosis of mental health disorders, since it could provide useful information about psychological signs such as sudden mood changes.
However, the main difficulty in speech emotion change detection lies in how change/anomaly detection is performed, and more particularly in how to handle raw speech data collected from diverse sources, such as direct recordings of speech uttered by patients or telephone conversations with a psychologist. Further problems arise from the very nature of audio sources: (a) the multitude of possible sources, (b) the need for sufficient data for each emotion category, and (c) above all, the uncertainty in the collected audio data.
In light of these considerations, machine learning appears to be a reliable way to address these issues. Speech emotion recognition has long relied on data-driven models such as Hidden Markov Models (HMMs) and deep neural networks (DNNs), and is now moving towards end-to-end or on-the-fly models. Emotion change/anomaly detection, by contrast, still lacks a well-defined machine learning framework to deal with the issues mentioned above. Such a framework should be able to model the audio source and meet the goal of the emotion change/anomaly detection process.
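To illustrate the kind of detection step such a framework would contain, the following is a minimal sketch of a one-sided CUSUM change detector applied to a stream of per-frame valence scores. The scores here are synthetic, and the function name, drift, and threshold values are illustrative assumptions; in practice the scores would come from a frame-level emotion classifier (e.g. DNN posteriors), and the detector would be one component among several.

```python
def cusum_change_point(scores, target_mean, drift=0.05, threshold=1.0):
    """One-sided CUSUM: return the index of the first detected upward
    shift away from target_mean, or None if no change is flagged.
    drift and threshold are hypothetical tuning values."""
    s = 0.0
    for i, x in enumerate(scores):
        # Accumulate evidence of an upward shift, clipped at zero.
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i
    return None

# Synthetic valence stream: roughly neutral, then a sudden emotional shift.
stream = [0.02, -0.01, 0.03, 0.0, -0.02, 0.81, 0.79, 0.83, 0.80, 0.78]
print(cusum_change_point(stream, target_mean=0.0))  # flags the shift at index 6
```

A real system would face exactly the issues listed above: the threshold depends on the source, the classifier needs enough data per emotion category, and the scores themselves are uncertain.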
This call welcomes contributions addressing different aspects of the posed problem, including (but not limited to):
Developing methods based on machine learning for speech emotion recognition,
Change/anomaly detection methods for emotional speech signals,
Dedicated speech corpora and audio datasets and/or feature extraction methods,
Modeling uncertainty for speech emotion recognition and/or emotion change/anomaly detection.
Keywords:
Machine learning, speech emotion recognition, emotion change/anomaly detection, neural networks, audio datasets, expressive speech corpus, feature extraction, modeling uncertainty
Important Note:
All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.