This study aims to evaluate the accuracy of ChatGPT on China’s Intermediate Professional Technical Qualification Examination for Ultrasound Medicine and to explore its potential role in ultrasound medical education.
A total of 100 questions, comprising 70 single-choice and 30 multiple-choice questions, were selected from the examination’s question bank. These questions were categorized into four groups: basic knowledge, relevant clinical knowledge, professional knowledge, and professional practice. ChatGPT versions 3.5 and 4.0 were tested, and accuracy was measured as the proportion of correct answers for each version.
ChatGPT 3.5 achieved an accuracy of 35.7% on single-choice and 30.0% on multiple-choice questions, while version 4.0 improved to 61.4% and 50.0%, respectively. Both versions performed better on basic knowledge questions but showed limitations on questions related to professional practice. Version 4.0 demonstrated significant improvements over version 3.5 across all categories, but it still underperformed resident doctors in certain areas.
While ChatGPT did not meet the passing criteria for the Intermediate Professional Technical Qualification Examination in Ultrasound Medicine, its strong performance on basic medical knowledge suggests potential as a supplementary tool in medical education. However, its limitations in handling professional practice tasks remain to be addressed.