ORIGINAL RESEARCH article

Front. Med.
Sec. Precision Medicine
Volume 11 - 2024 | doi: 10.3389/fmed.2024.1496866
This article is part of the Research Topic Large Language Models for Medical Applications

Neurological history both twinned and queried by generative artificial intelligence

Provisionally accepted
  • 1 Downstate Health Sciences University, Brooklyn, United States
  • 2 Kings County Hospital Center, Brooklyn, New York, United States
  • 3 Maimonides Medical Center, New York, New York, United States
  • 4 Lincoln Medical Center, New York, New York, United States
  • 5 Yale University, New Haven, Connecticut, United States

The final, formatted version of the article will be published soon.

    Background and Objectives: We propose the use of GPT-4 to facilitate initial history-taking in neurology and other medical specialties. A large language model (LLM) could serve as a digital twin that enhances queryable electronic medical record (EMR) systems and provides healthcare conversational agents (HCAs) to replace waiting-room questionnaires.

    Methods: In this observational pilot study, we presented verbatim history of present illness (HPI) narratives from published case reports of headache, stroke, and neurodegenerative diseases. Three standard GPT-4 models were designated Model P, the patient digital twin; Model N, a neurologist that queries Model P; and Model S, a supervisor that synthesizes the N-P dialogue into a derived HPI and formulates the differential diagnosis. Because GPT-4 output varies randomly, each case was presented five separate times to check consistency and reliability.

    Results: The study achieved an overall HPI content retrieval accuracy of 81%, with accuracies of 84% for headache, 82% for stroke, and 77% for neurodegenerative diseases. Retrieval accuracies for individual HPI components were 93% for chief complaints, 47% for associated symptoms and review of systems, 76% for relevant symptom details, and 94% for past medical, surgical, allergy, social, and family histories. The ranking of case diagnoses in the differential diagnosis list averaged in the 89th percentile.

    Discussion: Our tripartite LLM model demonstrated accuracy in extracting essential information from published case reports. Further validation with EMR HPIs, and then in direct patient care, will be needed to move towards adaptation of enhanced diagnostic digital twins that incorporate real-time data from health-monitoring devices and self-monitoring assessments.
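
The three-model arrangement described in the Methods can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the prompts, the `ChatFn` interface, the fixed number of turns, and the `echo_chat` stand-in (which replaces a real GPT-4 call so the sketch runs offline) are all assumptions made here.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping (system_prompt, conversation) -> reply.
# In practice this would wrap a GPT-4 call; here it is left abstract.
ChatFn = Callable[[str, str], str]
Transcript = List[Tuple[str, str]]  # list of (question from N, answer from P)


def format_transcript(transcript: Transcript) -> str:
    """Render the N-P dialogue as plain text."""
    return "\n".join(f"N: {q}\nP: {a}" for q, a in transcript)


def run_case(hpi: str, chat: ChatFn, max_turns: int = 5) -> Tuple[Transcript, str]:
    """Run one N-P dialogue, then let S summarize it.

    The prompts and the fixed turn count are illustrative assumptions.
    """
    p_system = "You are the patient. Answer only from this history:\n" + hpi
    n_system = "You are a neurologist taking a history. Ask one question at a time."
    s_system = "You are a supervisor. Derive an HPI and a ranked differential diagnosis."

    transcript: Transcript = []
    for _ in range(max_turns):
        so_far = format_transcript(transcript)
        question = chat(n_system, so_far)                      # Model N asks
        answer = chat(p_system, so_far + "\nN: " + question)   # Model P answers
        transcript.append((question, answer))

    summary = chat(s_system, format_transcript(transcript))    # Model S synthesizes
    return transcript, summary


def run_repeated(hpi: str, chat: ChatFn, repeats: int = 5) -> List[Tuple[Transcript, str]]:
    """Present the same case several times, since model output varies run to run."""
    return [run_case(hpi, chat) for _ in range(repeats)]


def echo_chat(system: str, conversation: str) -> str:
    """Offline stand-in for an LLM call: reports its role and the questions seen."""
    role = system.split(".")[0]
    return f"[{role}] reply after {conversation.count('N:')} questions"


results = run_repeated("Headache for 3 days.", echo_chat, repeats=5)
```

Keeping the model call behind a single function argument means the same loop can be exercised with a stub, as above, or with a real chat client, and the five repeated runs per case can then be compared for consistency.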

    Keywords: Neurology - Clinical, Stroke, Headache, Large language model (LLM), Neurodegenerative disease, History taking

    Received: 15 Sep 2024; Accepted: 30 Dec 2024.

    Copyright: © 2024 Lee, Choi, Angulo, McDougal and Lytton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Jung-Hyun Lee, Downstate Health Sciences University, Brooklyn, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.