ORIGINAL RESEARCH article

Front. Digit. Health
Sec. Connected Health
Volume 6 - 2024 | doi: 10.3389/fdgth.2024.1448351
This article is part of the Research Topic "Advancing Vocal Biomarkers and Voice AI in Healthcare: Multidisciplinary Focus on Responsible and Effective Development and Use".

Voice EHR: Introducing Multimodal Audio Data for Health

Provisionally accepted
James Anibal 1,2*, Hannah Huth 2, Ming Li 2, Lindsey Hazen 2, Yen Lam 3, Hang Nguyen 3, Phuc Hong 3, Michael Kleinman 4, Shelley Ost 4, Christopher Jackson 4, Laura Sprabery 4, Cheran Elangovan 4, Balaji Krishnaiah 4, Lee Akst 5, Ioan Lina 5, Iqbal Elyazar 6, Lenny Ekawati 6, Stefan Jansen 7, Richard Nduwayezu 7, Charisse Garcia 2, Jeffrey Plum 2, Jacqueline Brenner 2, Miranda Song 2, Emily Ricotta 8, David Clifton 1, Louise Thwaites 3, Yael Bensoussan 9, Bradford Wood 2
  • 1 University of Oxford, Oxford, United Kingdom
  • 2 National Institutes of Health (NIH), Bethesda, Maryland, United States
  • 3 Oxford University Clinical Research Unit in Vietnam (OUCRU), Hanoi, Vietnam
  • 4 University of Tennessee Health Science Center (UTHSC), Memphis, Tennessee, United States
  • 5 Johns Hopkins Medicine, Johns Hopkins University, Baltimore, Maryland, United States
  • 6 Oxford University Clinical Research Unit Indonesia, Jakarta Pusat, Indonesia
  • 7 University of Rwanda, Kigali, Kigali City, Rwanda
  • 8 Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States
  • 9 University of South Florida, Tampa, Florida, United States

The final, formatted version of the article will be published soon.

    Artificial intelligence (AI) models trained on audio data have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which limits deployment in resource-constrained, high-volume settings where audio data could have a profound impact on health equity. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. The app facilitates the collection of an audio electronic health record ("Voice EHR"), which may contain complex biomarkers of health drawn from conventional voice/respiratory features, speech patterns, and spoken language with semantic meaning and longitudinal context, potentially compensating for the typical limitations of unimodal clinical datasets. This report presents the application used for data collection, initial experiments on data quality, and case studies that demonstrate the potential of voice EHR to advance the scalability and diversity of audio AI.
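    To make the voice EHR concept concrete: each session pairs guided prompts with recorded responses, transcripts, and timestamps that supply longitudinal context. The sketch below is purely illustrative and is not taken from the study's application; all class and field names are hypothetical assumptions about how such a multimodal record could be structured.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema for one voice EHR entry; names are illustrative
# assumptions, not the actual data model used by the study's app.
@dataclass
class VoiceEHRTask:
    prompt: str                        # guided question shown/read to the user
    audio_path: str                    # recorded response (e.g., a WAV file)
    transcript: Optional[str] = None   # speech-to-text output, if available

@dataclass
class VoiceEHRRecord:
    participant_id: str
    timestamp: str                     # ISO 8601; enables longitudinal context
    tasks: list[VoiceEHRTask] = field(default_factory=list)

# Example session combining a respiratory task and a free-speech task,
# capturing both acoustic features and spoken language in one record.
record = VoiceEHRRecord(
    participant_id="anon-0001",
    timestamp="2024-06-13T09:30:00Z",
    tasks=[
        VoiceEHRTask("Cough three times.", "audio/cough.wav"),
        VoiceEHRTask(
            "Describe your current symptoms.",
            "audio/symptoms.wav",
            transcript="I've had a sore throat and a dry cough for two days.",
        ),
    ],
)
```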

    Keywords: AI for health, natural language processing, large language model (LLM), multimodal data, artificial intelligence, voice biomarkers

    Received: 13 Jun 2024; Accepted: 26 Dec 2024.

    Copyright: © 2024 Anibal, Huth, Li, Hazen, Lam, Nguyen, Hong, Kleinman, Ost, Jackson, Sprabery, Elangovan, Krishnaiah, Akst, Lina, Elyazar, Ekawati, Jansen, Nduwayezu, Garcia, Plum, Brenner, Song, Ricotta, Clifton, Thwaites, Bensoussan and Wood. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: James Anibal, University of Oxford, Oxford, United Kingdom

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.