ORIGINAL RESEARCH article
Front. Digit. Health
Sec. Connected Health
Volume 6 - 2024
doi: 10.3389/fdgth.2024.1448351
This article is part of the Research Topic "Advancing Vocal Biomarkers and Voice AI in Healthcare: Multidisciplinary Focus on Responsible and Effective Development and Use."
Voice EHR: Introducing Multimodal Audio Data for Health
Provisionally accepted
- 1 University of Oxford, Oxford, United Kingdom
- 2 National Institutes of Health (NIH), Bethesda, Maryland, United States
- 3 Oxford University Clinical Research Unit in Vietnam (OUCRU), Hanoi, Vietnam
- 4 University of Tennessee Health Science Center (UTHSC), Memphis, Tennessee, United States
- 5 Johns Hopkins Medicine, Johns Hopkins University, Baltimore, Maryland, United States
- 6 Oxford University Clinical Research Unit Indonesia, Jakarta Pusat, Indonesia
- 7 University of Rwanda, Kigali, Kigali City, Rwanda
- 8 Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States
- 9 University of South Florida, Tampa, Florida, United States
Artificial intelligence (AI) models trained on audio data may have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which challenges deployment in resource-constrained, high-volume settings where audio data could have a profound impact on health equity. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. The app facilitates the collection of an audio electronic health record ("Voice EHR"), which may contain complex biomarkers of health drawn from conventional voice/respiratory features, speech patterns, and spoken language with semantic meaning and longitudinal context, potentially compensating for the typical limitations of unimodal clinical datasets. This report presents the application used for data collection, initial experiments on data quality, and case studies that demonstrate the potential of voice EHR to advance the scalability and diversity of audio AI.
Keywords: AI for health, natural language processing, large language model (LLM), multimodal data, artificial intelligence, voice biomarkers
Received: 13 Jun 2024; Accepted: 26 Dec 2024.
Copyright: © 2024 Anibal, Huth, Li, Hazen, Lam, Nguyen, Hong, Kleinman, Ost, Jackson, Sprabery, Elangovan, Krishnaiah, Akst, Lina, Elyazar, Ekawati, Jansen, Nduwayezu, Garcia, Plum, Brenner, Song, Ricotta, Clifton, Thwaites, Bensoussan and Wood. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
James Anibal, University of Oxford, Oxford, United Kingdom
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.