
OPINION article

Front. Surg.

Sec. Orthopedic Surgery

Volume 12 - 2025 | doi: 10.3389/fsurg.2025.1524396

ChatGPT4o's Theranostic Performance in the Management of Thoracolumbar Spine Fractures

Provisionally accepted
Xuehai Jia 1, Litai Ma 1,2*, Yi Yang 1,2*, Yi Deng 1*, Chang Yong Shen 1, Kerui Zhang 1*, Ya Li 3*
  • 1 Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
  • 2 West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
  • 3 First Affiliated Hospital, School of Medicine, Shihezi University, Shihezi, Xinjiang Uyghur Region, China

The final, formatted version of the article will be published soon.

    Introduction

    ChatGPT, developed by OpenAI (https://chat.openai.com), is a publicly accessible tool that uses machine learning algorithms to process and analyze extensive data and generate responses to user inquiries. On May 13, 2024, OpenAI launched the ChatGPT4o model, which, according to the OpenAI website, is its latest, fastest, and most advanced version. This model supports a context length of up to 128k tokens (equivalent to the length of a long novel) and offers multimodal capabilities, including text and image inputs, as well as text, image, and audio outputs (https://help.openai.com). While numerous studies have explored ChatGPT's potential applications and challenges in the biomedical field [1, 2], little research has addressed the specific capabilities of ChatGPT4o in the medical domain. A review article [3] published in Frontiers in Surgery notes that ChatGPT lacks sufficient expertise and background understanding in specialized fields. ChatGPT4o has the potential to change this situation. To evaluate the model, we investigated the theranostic performance of ChatGPT4o in managing thoracolumbar spine fractures and assessed its potential effectiveness and applications in clinical practice.

    Method

    For our evaluation, we formulated 38 clinical questions based on the diagnostic, treatment, and management guidelines for thoracolumbar fractures established by the Congress of Neurological Surgeons (CNS) [4–14] and the Chinese Medical Association (CMA) [15]. We entered all 38 questions into ChatGPT4o (OpenAI, accessed November 3, 2024) without providing additional context or guidelines. Each question was posed once, and the first generated response was recorded. To minimize variability, no iterative refinement of prompts was performed. The responses were anonymized and compiled in Supplementary Material S1.
Each response was subsequently reviewed by three independent spine surgery experts, who evaluated it against both the established guidelines and their own clinical experience. Each expert rated the responses on a five-point Likert scale: 1, completely incorrect; 2, more incorrect than correct; 3, an equal mix of correct and incorrect; 4, more correct than incorrect; and 5, completely correct. The median of the three experts' scores was used as the final rating to minimize bias.

Result

When ChatGPT4o was presented with "yes or no" questions, it typically responded with comprehensive diagnostic criteria and therapeutic principles rather than a simple "yes" or "no." According to our results (Table 1), no response (0%) received a score of 1, 1 response (2.63%) received a score of 2, 1 response (2.63%) received a score of 3, 8 responses (21.05%) received a score of 4, and 28 responses (73.68%) received a score of 5. In total, 36 of the 38 responses (approximately 94.7%) scored 4 or 5, meaning they were largely or entirely accurate.

Discussion

When asked, “Does the choice of surgical approach (anterior, posterior, or combined anterior-posterior) improve clinical outcomes in patients with thoracic and lumbar fractures?”, ChatGPT4o gave an affirmative answer along with detailed explanations. However, according to the CNS guidelines, surgeons treating patients with burst fractures of the thoracolumbar spine may use an anterior, posterior, or combined approach, because the choice of approach does not significantly affect clinical or neurological outcomes (a Grade B recommendation). Although ChatGPT4o explained the indications for each approach in detail, the experts noted that while the response was generally accurate, its final conclusion was not entirely consistent with the guideline recommendations.
Furthermore, while ChatGPT4o appears capable of conducting targeted searches on open websites, its “independent reasoning” abilities require further refinement.

In summary, ChatGPT4o demonstrates promising performance in diagnosing and treating thoracolumbar trauma. Its ability to search open websites and provide detailed responses could be a useful reference for clinical practitioners. However, ChatGPT4o does not consistently provide fully accurate answers, particularly to "yes or no" questions. Its dependence on specific sources for data retrieval may introduce biases that limit its broader application in the field of spine surgery. ChatGPT requires substantial medical data for further training to enhance model performance. Moreover, given the specific ethical considerations in medicine, ChatGPT4o’s use in clinical settings must ensure patient safety, data privacy, ethical standards, and adherence to relevant “AI regulations”. Although ChatGPT4o’s responses may improve clinical efficiency, it should serve only as a clinical assistant, with spine surgeons validating the accuracy of its information.

This study has several methodological limitations. First, the lack of comparative analyses with established AI systems (e.g., Google Med-PaLM, IBM Watson) or traditional decision-support tools hinders definitive performance benchmarking. Second, simulated testing environments may overestimate system efficacy, and any degradation of diagnostic performance in real-world clinical settings requires urgent empirical validation. Finally, the rapid evolution of AI technology necessitates dynamically updated training databases and ethical evaluation frameworks.
To address these gaps, subsequent research will incorporate the Partial Credit Model (PCM) and Item Response Theory (IRT) through latent trait modeling, systematically quantifying the difficulty levels of AI responses, refining multidimensional scoring criteria, and strengthening clinical applicability assessments to establish a psychometrically based evaluation framework. This methodological advancement will allow a more granular understanding of AI's role in complex medical decision-making (e.g., surgical approach selection, prognostic stratification). Future research priorities include: (1) comparative effectiveness studies across AI systems, (2) real-world clinical validation of performance, and (3) development of specialty-specific human-AI collaboration guidelines to systematically improve the clinical utility of intelligent assistive tools in spinal surgery.
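
The scoring procedure described in the Method and the score distribution reported in the Result can be checked with a short script. The following is a minimal sketch in Python, not the authors' analysis code: the function names `final_rating` and `tally` are illustrative, and the example triple of expert scores is hypothetical, since the per-expert ratings are not given here. Only the per-score counts (0, 1, 1, 8, 28) come from the text.

```python
from collections import Counter
from statistics import median


def final_rating(expert_scores):
    """Return the median of three expert Likert scores (1 to 5)."""
    if len(expert_scores) != 3:
        raise ValueError("expected exactly three expert scores")
    if any(score not in range(1, 6) for score in expert_scores):
        raise ValueError("Likert scores must be integers from 1 to 5")
    # With three integer scores the median is the middle value.
    return int(median(expert_scores))


def tally(final_ratings):
    """Map each Likert score to (count, percentage of all responses)."""
    counts = Counter(final_ratings)
    total = len(final_ratings)
    return {
        score: (counts.get(score, 0), round(100 * counts.get(score, 0) / total, 2))
        for score in range(1, 6)
    }


# Hypothetical triple of expert scores for one response (assumption).
example_final = final_rating([4, 5, 3])

# Counts per final score as reported in the Result section.
REPORTED_COUNTS = {1: 0, 2: 1, 3: 1, 4: 8, 5: 28}
ratings = [score for score, n in REPORTED_COUNTS.items() for _ in range(n)]

summary = tally(ratings)
largely_correct = summary[4][0] + summary[5][0]
largely_correct_pct = round(100 * largely_correct / len(ratings), 2)

for score, (count, pct) in summary.items():
    print(f"score {score}: {count} responses ({pct}%)")
print(f"score 4 or 5: {largely_correct} of {len(ratings)} ({largely_correct_pct}%)")
```

Taking the median rather than the mean means that a single outlying expert score cannot shift the final rating of a response, which matches the stated aim of minimizing bias.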

    Keywords: ChatGPT4o, Thoracolumbar spine fractures, Theranostic Performance, clinical practice, AI in medicine

    Received: 07 Nov 2024; Accepted: 12 Feb 2025.

    Copyright: © 2025 Jia, Ma, Yang, Deng, Shen, Zhang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Litai Ma, West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
    Yi Yang, West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
    Yi Deng, Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
    Kerui Zhang, Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
    Ya Li, First Affiliated Hospital, School of Medicine, Shihezi University, Shihezi, 832008, Xinjiang Uyghur Region, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
