ORIGINAL RESEARCH article
Front. Med.
Sec. Healthcare Professions Education
Volume 12 - 2025
doi: 10.3389/fmed.2025.1495378
This article is part of the Research Topic Insights in Healthcare Professions Education: 2024.
Artificial intelligence in healthcare education: Evaluating the accuracy of ChatGPT, Copilot, and Google Gemini in cardiovascular pharmacology
Provisionally accepted
1 College of Pharmacy, Alfaisal University, Riyadh, Saudi Arabia
2 School of Medicine, Case Western Reserve University, Cleveland, Ohio, United States
Background: Artificial intelligence (AI) is revolutionizing medical education; however, its limitations remain underexplored. This study evaluated the accuracy of three generative AI tools (ChatGPT-4, Copilot, and Google Gemini) in answering multiple-choice questions (MCQs) and short-answer questions (SAQs) related to cardiovascular pharmacology, a key subject in healthcare education.

Methods: Using the free version of each AI tool, we administered 45 MCQs and 30 SAQs across three difficulty levels: easy, intermediate, and advanced. AI-generated answers were reviewed by three pharmacology experts. The accuracy of MCQ responses was recorded as correct or incorrect, while SAQ responses were rated on a 1-5 scale based on relevance, completeness, and correctness.

Results: ChatGPT, Copilot, and Gemini demonstrated high accuracy on easy and intermediate MCQs (87-100%). While all AI models showed a decline in performance on the advanced MCQ section, only Copilot (53% accuracy) and Gemini (20% accuracy) scored significantly lower than they did at the easy and intermediate levels. SAQ evaluations revealed high accuracy scores for ChatGPT (overall 4.7 ± 0.3) and Copilot (overall 4.5 ± 0.4) across all difficulty levels, with no significant differences between the two tools. In contrast, Gemini's SAQ performance was markedly lower across all levels (overall 3.3 ± 1.0).

Conclusion: ChatGPT-4 demonstrates the highest accuracy in addressing both MCQ and SAQ cardiovascular pharmacology questions, regardless of difficulty level. Copilot ranks second after ChatGPT, while Google Gemini shows significant limitations in handling complex MCQs and providing accurate responses to SAQ-type questions in this field. These findings can guide the ongoing refinement of AI tools for specialized medical education.
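Illustrative note on scoring: the Methods describe aggregating expert judgments into per-level MCQ accuracy percentages and mean ± SD SAQ ratings. The short Python sketch below shows one plausible way such summaries could be computed; all data, model labels, and data structures here are hypothetical and are not taken from the study.

```python
# Minimal sketch of the score aggregation described in the abstract.
# Data below are hypothetical placeholders, not the study's results.
from statistics import mean, stdev

# Hypothetical MCQ outcomes keyed by (model, difficulty): 1 = correct, 0 = incorrect.
mcq_results = {
    ("ChatGPT-4", "easy"): [1, 1, 1, 1, 1],
    ("ChatGPT-4", "advanced"): [1, 1, 0, 1, 1],
    ("Gemini", "advanced"): [0, 1, 0, 0, 0],
}

# Hypothetical SAQ expert ratings on a 1-5 scale (relevance, completeness,
# and correctness combined into a single score per question).
saq_ratings = {
    ("ChatGPT-4", "overall"): [5, 4, 5, 5, 4],
    ("Gemini", "overall"): [3, 4, 2, 4, 3],
}

def mcq_accuracy(outcomes):
    """Percentage of correct answers, as reported per difficulty level."""
    return 100.0 * sum(outcomes) / len(outcomes)

def saq_summary(ratings):
    """Mean and standard deviation of 1-5 ratings, matching the abstract's format."""
    return mean(ratings), stdev(ratings)

for (model, level), outcomes in mcq_results.items():
    print(f"{model} MCQ {level}: {mcq_accuracy(outcomes):.0f}%")

for (model, level), ratings in saq_ratings.items():
    m, s = saq_summary(ratings)
    print(f"{model} SAQ {level}: {m:.1f} ± {s:.1f}")
```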
Keywords: ChatGPT, Copilot, Gemini, cardiovascular pharmacology, pharmacy and medicine, medical education
Received: 12 Sep 2024; Accepted: 21 Jan 2025.
Copyright: © 2025 Salman, Ameer, Khanfar and Hsieh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Ibrahim M Salman, College of Pharmacy, Alfaisal University, Riyadh, 11533, Saudi Arabia
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.