
ORIGINAL RESEARCH article

Front. Med.
Sec. Ophthalmology
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1546706
This article is part of the Research Topic New Concepts, Advances, and Future Trends in Clinical Research on Eye Diseases

Evaluating the Performance of ChatGPT on Patient Consultation and Image-based Preliminary Diagnosis in Thyroid Eye Disease

Provisionally accepted
  • 1 Shanghai Changzheng Hospital, Huangpu, Shanghai Municipality, China
  • 2 Second Military Medical University, Shanghai, China
  • 3 Department of Urology, Beijing Tongren Hospital, Capital Medical University, Beijing, China

The final, formatted version of the article will be published soon.

    Background: The emergence of large language model (LLM) chatbots such as ChatGPT holds great promise for enhancing healthcare practice. Online consultation, accurate preliminary diagnosis, and clinical efforts are fundamental to a patient-oriented management system.

    Objective: This cross-sectional study aimed to evaluate the performance of ChatGPT in answering ophthalmic inquiries, focusing on Thyroid Eye Disease (TED) consultation and image-based preliminary diagnosis in a non-English language.

    Methods: We sourced frequently asked clinical questions from A Comprehensive Collection of Thyroid Eye Disease Knowledge, a published reference based on patient consultation data, and collected facial and computed tomography (CT) images from 16 patients with a definitive diagnosis of TED. Questions on TED consultation and preliminary diagnosis were posed to ChatGPT between May 18 and 30, 2024, with a new chat opened for each question. Responses from ChatGPT-4 and ChatGPT-4o, along with those from an experienced ophthalmology professor, were compiled into three questionnaires completed by patients and ophthalmologists, who rated answer quality in four dimensions: accuracy, comprehensiveness, conciseness, and satisfaction. Each preliminary diagnosis of TED was assessed for accuracy, and differences in accuracy rates were calculated.

    Results: For common consultation questions about TED, ChatGPT-4o performed better, delivering a substantial amount of accurate, logically consistent information in a structured format consisting of a disease definition, detailed sections, and a summarizing conclusion. Notably, answers generated by ChatGPT-4o were rated highly, surpassing those of ChatGPT-4 and the professor in accuracy (4.33 [0.69]), comprehensiveness (4.17 [0.75]), conciseness (4.12 [0.77]), and satisfaction (4.28 [0.70]). Evaluator characteristics, answer variables, and the other quality scores correlated with overall satisfaction scores. When given several facial images, ChatGPT-4 twice failed to make a diagnosis, citing a lack of characteristic symptoms or a complete medical history, whereas ChatGPT-4o correctly identified the pathologic condition in 31.25% of cases (95% CI: 11.02%-58.66%). When CT images were added, ChatGPT-4o performed comparably to the professor in diagnostic accuracy (87.5%, 95% CI: 61.65%-98.45%).

    Conclusions: ChatGPT-4o excelled in comprehensive and satisfactory patient consultation and imaging interpretation, showing potential to increase the efficiency of clinical practice. However, limitations regarding disinformation management and legal permissions remain significant concerns that require further exploration in clinical practice.
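The diagnostic accuracy figures above correspond to 5 of 16 and 14 of 16 correct diagnoses. The abstract does not name the interval method; the reported 95% CIs appear consistent with the exact (Clopper-Pearson) binomial interval, but that is an assumption. The minimal sketch below, using only the Python standard library, computes that interval by bisection on the binomial CDF so a reader can check the quoted ranges. The function names are illustrative, not from the study.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a binomial proportion k/n.

    Assumption: this is the method behind the abstract's CIs; the paper
    itself does not say so.
    """

    def solve(f, target: float) -> float:
        # Bisection on p in [0, 1]; f is a binomial CDF, which decreases in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid  # CDF still too large: root lies at a larger p
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k | p) = alpha/2, i.e. CDF(k-1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    for label, k in [("ChatGPT-4o, facial images", 5), ("ChatGPT-4o, facial + CT images", 14)]:
        lo, hi = clopper_pearson(k, 16)
        print(f"{label}: {k / 16:.2%} (95% CI {lo:.2%}-{hi:.2%})")
```

If the exact method was indeed used, the printed intervals should match those in the abstract to two decimal places. With only 16 patients the intervals are wide, which is worth keeping in mind when comparing the two accuracy rates.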

    Keywords: Thyroid eye disease, Large Language Model, ChatGPT, Virtual healthcare, clinical practice

    Received: 30 Dec 2024; Accepted: 27 Jan 2025.

    Copyright: © 2025 Chen, Wang, Yang, Zeng, Xie, Shen, Jian, Huang and Wei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Yuqing Chen, Shanghai Changzheng Hospital, Huangpu, Shanghai Municipality, China
    Xiao Huang, Shanghai Changzheng Hospital, Huangpu, Shanghai Municipality, China
    Ruili Wei, Shanghai Changzheng Hospital, Huangpu, Shanghai Municipality, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.