ORIGINAL RESEARCH article
Front. Med.
Sec. Healthcare Professions Education
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1545730
This article is part of the Research Topic: Large Language Models for Medical Applications.
Comparison of Medical History Documentation Efficiency and Quality Based on GPT-4o: A Study on the Comparison Between Residents and Artificial Intelligence
Provisionally accepted
- 1 Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- 2 WORK Medical Technology Group LTD, Hangzhou, Zhejiang Province, China
- 3 Shanghai Resident Standardized Training Center, Shanghai, China
Background: As medical technology advances, physicians' responsibilities in clinical practice continue to grow, and medical history documentation remains an essential but time-consuming component. Artificial intelligence (AI), particularly advances in natural language processing (NLP), has introduced new possibilities for medical documentation. This study evaluates the efficiency and quality of medical history documentation by ChatGPT-4o compared with resident physicians and explores the potential applications of AI in clinical documentation.

Methods: Using a non-inferiority design, this study compared documentation time and quality scores between five resident physicians from the hematology department (average of 2.4 years of clinical experience) and ChatGPT-4o, based on identical case materials. Medical history quality was evaluated by two attending physicians, each with more than 10 years of clinical experience, using ten case content criteria. The detailed scoring criteria covered completeness (coverage of history elements), accuracy (correctness of information), logic (organization and coherence of content), and professionalism (appropriate use of medical terminology and format), each rated on a 10-point scale. Data were analyzed with paired t-tests and Wilcoxon signed-rank tests, and Kappa coefficients were used to assess scoring consistency.

Results: ChatGPT-4o achieved an average quality score of 88.9, while resident physicians scored 89.6, with no statistically significant difference between the two (p = 0.25). The Kappa coefficient between the two evaluators was 0.82, indicating good scoring consistency. Non-inferiority testing showed that ChatGPT-4o's quality scores fell within the preset non-inferiority margin of 5 points, indicating that its documentation quality was not inferior to that of resident physicians. ChatGPT-4o's average documentation time was 40.1 seconds, significantly shorter than the resident physicians' average of 14.9 minutes (p < 0.001).

Conclusion: ChatGPT-4o significantly reduced the time required for medical history documentation without compromising quality. Despite these positive results, practical considerations such as data preprocessing, data security, and privacy protection must be addressed in real-world applications. Future research should further explore ChatGPT-4o's capabilities in handling complex cases and its applicability across different clinical settings.
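The non-inferiority logic described in the Methods can be illustrated with a minimal sketch: paired per-case quality scores are differenced, a confidence interval is placed on the mean difference, and non-inferiority is declared if the lower bound stays above the negative of the 5-point margin. All scores below are invented for illustration and are not the study's data; the hard-coded t critical value assumes ten paired cases (df = 9).

```python
import math

# Hypothetical per-case quality scores (same cases documented twice).
resident = [90, 88, 91, 89, 90, 87, 92, 88, 90, 91]
model = [89, 88, 90, 88, 89, 88, 91, 87, 89, 90]

# Per-case differences (model minus resident).
diffs = [m - r for m, r in zip(model, resident)]
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
se = sd / math.sqrt(n)

# Two-sided 95% t critical value for df = 9 (hard-coded to stay stdlib-only).
t_crit = 2.262
ci_low = mean - t_crit * se

# Non-inferior if the lower CI bound for (model - resident) exceeds -margin.
margin = 5.0
non_inferior = ci_low > -margin
```

In practice one would compute the exact quantile and add a Wilcoxon signed-rank test as the non-parametric check the abstract mentions (e.g. via `scipy.stats`); the sketch only shows the confidence-interval form of the non-inferiority decision.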
Keywords: artificial intelligence, GPT-4o, medical history documentation, quality, efficiency
Received: 15 Dec 2024; Accepted: 24 Apr 2025.
Copyright: © 2025 Xiaoyang, Lu, Wang, Gong, Cheng, Hu, Wu, Wang and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Li Xiaoyang, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.