The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 7 - 2024
doi: 10.3389/frai.2024.1477535
Reader's Digest Version of Scientific Writing: Comparative Evaluation of Summarization Capacity Between Large Language Models and Medical Students in Analyzing Scientific Writing in Sleep Medicine
Provisionally accepted
California University of Science and Medicine, San Bernardino, United States
As artificial intelligence (AI) systems such as large language models (LLMs) and natural language processing advance, the need to evaluate their utility in medicine and medical education grows. With medical research publications growing exponentially, AI systems offer valuable opportunities to condense and synthesize information, especially in underrepresented areas such as sleep medicine. The present study aims to compare the summarization capacity of LLM-generated summaries of sleep medicine research article abstracts with summaries generated by medical students, and to evaluate whether research content and readability are retained comparably. A collection of three AI-generated and human-generated summaries of sleep medicine research article abstracts was shared with nineteen study participants (medical students) attending a sleep medicine conference. Participants were blinded as to which summaries were human- or LLM-generated. After reading both the human- and AI-generated research summaries, participants completed a 1-5 Likert-scale survey on the readability of the writings. Participants also answered article-specific multiple-choice questions evaluating their comprehension of the summaries, as a measure of the quality of content retained in the AI-generated summaries. An independent-samples t-test comparing the AI-generated and human-generated summaries revealed no significant difference in Likert readability ratings (p = 0.702). A chi-squared test of proportions revealed no significant association (χ² = 1.485, p = 0.223), and a McNemar test revealed no significant association between summary type and the proportion of correct responses to the comprehension multiple-choice questions (p = 0.289).
Limitations of our study include the small number of participants and potential user bias, as study participants were attendees at a sleep medicine conference and were presented summaries from sleep medicine journals. Lastly, the summaries did not include graphs, numbers, or pictures, and were therefore limited in the material they could extract. While the present analysis did not demonstrate a significant difference in readability or content quality between the AI- and human-generated summaries, the limitations of the present study indicate that more research is needed to objectively measure, and further define, the strengths and weaknesses of AI models in condensing medical literature into efficient and accurate summaries.
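The three statistical comparisons reported above (independent-samples t-test, chi-squared test of proportions, and McNemar test) could be carried out along the following lines. This is an illustrative sketch using synthetic data and SciPy, not the authors' analysis code; the rating values and contingency counts below are invented for demonstration.

```python
# Illustrative sketch of the three reported tests, on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 1-5 Likert readability ratings from 19 participants
# for each summary type (values are synthetic, not study data).
ai_ratings = rng.integers(1, 6, size=19)
human_ratings = rng.integers(1, 6, size=19)

# Independent-samples t-test on the readability ratings.
t_stat, t_p = stats.ttest_ind(ai_ratings, human_ratings)

# Chi-squared test of proportions on comprehension answers.
# Rows: summary type (AI, human); columns: correct vs. incorrect (synthetic).
table = np.array([[40, 17],
                  [45, 12]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# Exact McNemar test for paired correct/incorrect responses.
# Only the discordant pairs matter: b = AI-only correct, c = human-only
# correct (synthetic counts). The exact test is a two-sided binomial
# test of b successes in b + c trials with p = 0.5.
b, c = 10, 5
mcnemar_p = stats.binomtest(b, b + c, 0.5).pvalue
```

The `binomtest` formulation is the standard exact McNemar test; a library such as statsmodels also provides it directly via `statsmodels.stats.contingency_tables.mcnemar`.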
Keywords: Sleep medicine, scientific writing, artificial intelligence, Natural Language Processing, Large language models, Medical Education, Medical students
Received: 08 Aug 2024; Accepted: 28 Nov 2024.
Copyright: © 2024 Matalon, Spurzem, Ahsan, White, Kothari and Varma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
August Spurzem, California University of Science and Medicine, San Bernardino, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.