
ORIGINAL RESEARCH article

Front. Digit. Health
Sec. Health Informatics
Volume 6 - 2024 | doi: 10.3389/fdgth.2024.1351637
This article is part of the Research Topic: Fairness, Interpretability, Explainability and Accountability in Predictive Healthcare.

Deconstructing demographic bias in speech-based machine learning models for digital health

Provisionally accepted
  • 1 Texas A&M University, College Station, Texas, United States
  • 2 Texas A&M University at Qatar, Doha, Qatar
  • 3 University of Colorado Boulder, Boulder, United States

The final, formatted version of the article will be published soon.

    Machine learning (ML) algorithms have been heralded as promising solutions for realizing assistive systems in digital healthcare, owing to their ability to detect fine-grained patterns that are not easily perceived by humans. Yet, ML algorithms have also been critiqued for treating individuals differently based on their demographic attributes, thereby propagating existing disparities. This paper explores gender and race bias in speech-based ML algorithms that detect behavioral and mental health outcomes. It examines potential sources of bias in the data used to train the ML models, encompassing the acoustic features extracted from speech signals and their associated labels, as well as in the ML decisions. The paper further examines two approaches to reducing existing bias: using as ML input the features that are least informative of one's demographic information, and transforming the feature space in an adversarial manner to diminish evidence of demographic information while retaining information about the behavioral or mental health state of interest. Results are presented for two domains, the first pertaining to gender and race bias when estimating levels of anxiety, and the second pertaining to gender bias in detecting depression. Findings indicate statistically significant differences in both acoustic features and labels among demographic groups, as well as differential ML performance across groups. The statistically significant differences present in the label space are partially preserved in the ML decisions. Although variations in ML performance across demographic groups were noted, results are mixed regarding the models' ability to accurately estimate healthcare outcomes for the sensitive groups. These findings underscore the need for careful and thoughtful design of ML models that preserve crucial aspects of the data and perform effectively across all populations in digital healthcare applications.
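
    The adversarial feature transformation summarized above can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, hypothetical example of the general technique, in which an encoder is trained alongside a gradient-reversal adversary so that the transformed acoustic features remain predictive of the health outcome while carrying little evidence of the demographic attribute. The layer sizes, equal loss weighting, and feature dimensionality are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of adversarial debiasing of
# acoustic features via a gradient-reversal layer.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialDebiaser(nn.Module):
    def __init__(self, n_features, n_hidden=64, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        # Encoder: transforms the raw acoustic feature vector
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        # Task head: predicts the behavioral/mental health outcome (e.g., depression)
        self.task_head = nn.Linear(n_hidden, 1)
        # Adversary head: tries to recover the demographic attribute (e.g., gender)
        self.adv_head = nn.Linear(n_hidden, 1)

    def forward(self, x):
        z = self.encoder(x)
        y_task = self.task_head(z)                                # outcome logit
        y_adv = self.adv_head(GradReverse.apply(z, self.lambd))   # demographic logit
        return y_task, y_adv

# Illustrative training step: minimizing both losses pushes the encoder to stay
# predictive of the outcome while the reversed gradient penalizes any
# demographic information retained in the transformed features.
model = AdversarialDebiaser(n_features=88)  # e.g., an eGeMAPS-sized feature vector
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y_outcome, y_demo):
    opt.zero_grad()
    task_logit, adv_logit = model(x)
    loss = bce(task_logit, y_outcome) + bce(adv_logit, y_demo)
    loss.backward()
    opt.step()
    return loss.item()
```

    The gradient-reversal formulation is one common way to realize such an adversarial objective; the complementary approach mentioned above, selecting the features least informative of demographics, could instead rank features by a measure such as mutual information with the demographic label before training.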

    Keywords: speech, machine learning, anxiety, depression, demographic bias, fairness

    Received: 06 Dec 2023; Accepted: 15 Jul 2024.

    Copyright: © 2024 Yang, El-Attar and Chaspari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Theodora Chaspari, University of Colorado Boulder, Boulder, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.