Evaluating ChatGPT-4's performance on oral and maxillofacial queries: Chain of thought and standard method

Ji, Kaiyuan; Wu, Zhihan; Han, Jing; Zhai, Guangtao; Liu, Jiannan

doi:10.3389/froh.2025.1541976

ORIGINAL RESEARCH article

Front. Oral. Health

Sec. Oral and Maxillofacial Surgery

Volume 6 - 2025 | doi: 10.3389/froh.2025.1541976

Provisionally accepted

¹ School of Communication and Electronic Engineering, East China Normal University, Shanghai, China
² Shanghai Ninth People’s Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, Shanghai, China
³ Department of Electronic Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

The final, formatted version of the article will be published soon.

Objectives: Oral and maxillofacial diseases affect approximately 3.5 billion people worldwide. With the continuous advancement of artificial intelligence technologies, particularly generative pre-trained transformers such as ChatGPT-4, there is potential to enhance public awareness of the prevention and early detection of these diseases. This study evaluated the performance of ChatGPT-4 in answering oral and maxillofacial disease questions using the standard approach and the Chain of Thought (CoT) method, aiming to gain a deeper understanding of its capabilities, potential, and limitations.

Materials and Methods: Three experts, drawing on their extensive experience and the most common questions in clinical settings, selected 130 open-ended questions and 1805 multiple-choice questions from the national oral practice examination. These questions cover 12 areas of oral and maxillofacial surgery, including Prosthodontics, Pediatric Dentistry, Maxillofacial Tumors and Salivary Gland Diseases, and Maxillofacial Infections.

Results: Using the CoT approach, ChatGPT-4 showed marked improvements in accuracy, structure, completeness, professionalism, and overall impression for open-ended questions, with statistically significant differences compared to its performance on general oral and maxillofacial inquiries. For multiple-choice questions, the CoT method improved ChatGPT-4's accuracy across all major subjects, with an overall accuracy increase of 3.1%.

Conclusions: When ChatGPT-4 is used to answer questions in oral and maxillofacial surgery, incorporating CoT as a querying method can enhance its performance and help the public improve their understanding and awareness of such issues. However, it should not be considered a substitute for doctors.
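As an illustration of the multiple-choice comparison described above, the following is a minimal Python sketch of how a standard prompt and a CoT prompt could be built and how an accuracy difference could be computed. The prompt wording, the function names (`build_prompt`, `accuracy`, `accuracy_gain_points`), and the toy answers are assumptions made for illustration; the abstract gives neither the study's prompts nor its scoring code, and it does not state whether the reported 3.1% is in percentage points or relative terms. The sketch reports percentage points.

```python
# Hypothetical prompt templates: the exact wording used in the study is not
# given in the abstract, so these are illustrative assumptions only.
STANDARD_TEMPLATE = (
    "Question: {question}\nOptions:\n{options}\n"
    "Answer with the letter of the correct option."
)
COT_TEMPLATE = (
    "Question: {question}\nOptions:\n{options}\n"
    "Let's think step by step, then give the letter of the correct option."
)


def build_prompt(template, question, options):
    """Fill a template with a question and a dict of option label -> text."""
    option_lines = "\n".join(
        f"{label}. {text}" for label, text in sorted(options.items())
    )
    return template.format(question=question, options=option_lines)


def accuracy(predicted, gold):
    """Fraction of predicted option letters that match the answer key."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def accuracy_gain_points(standard_pred, cot_pred, gold):
    """Accuracy difference (CoT minus standard) in percentage points."""
    return 100.0 * (accuracy(cot_pred, gold) - accuracy(standard_pred, gold))


# Toy data for illustration only; these are NOT results from the study.
gold_answers = ["A", "C", "B", "D"]
standard_answers = ["A", "B", "B", "A"]  # 2 of 4 correct
cot_answers = ["A", "C", "B", "A"]       # 3 of 4 correct

gain = accuracy_gain_points(standard_answers, cot_answers, gold_answers)
```

In this sketch the two templates differ only in the final instruction, so any change in accuracy can be attributed to the step-by-step request rather than to differences in how the question is presented.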

Keywords: artificial intelligence, chain of thought, education tool, ChatGPT-4, oral and maxillofacial

Received: 09 Dec 2024; Accepted: 28 Jan 2025.

Copyright: © 2025 Ji, Wu, Han, Zhai and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Guangtao Zhai, School of Communication and Electronic Engineering, East China Normal University, Shanghai, China
Jiannan Liu, Shanghai Ninth People’s Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, Shanghai, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.