ORIGINAL RESEARCH article

Front. Oncol.
Sec. Gastrointestinal Cancers: Hepato Pancreatic Biliary Cancers
Volume 14 - 2024 | doi: 10.3389/fonc.2024.1513608

Feasibility of Large Language Models for CEUS LI-RADS Categorization of Small Liver Nodules in Patients at Risk for Hepatocellular Carcinoma

Provisionally accepted
  • 1 West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
  • 2 Affiliated Hospital of Panzhihua University, Panzhihua, Sichuan, China
  • 3 Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, Sichuan Province, China
  • 4 Thomas Jefferson University Hospital, Jefferson University Hospitals, Philadelphia, Pennsylvania, United States
  • 5 West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China

The final, formatted version of the article will be published soon.

    Background: Large language models (LLMs) offer opportunities to enhance radiological applications, but their performance in handling complex tasks remains insufficiently investigated. Objective: To evaluate the performance of LLMs integrated with the Contrast-enhanced Ultrasound Liver Imaging Reporting and Data System (CEUS LI-RADS) in diagnosing small (≤20 mm) hepatocellular carcinoma (sHCC) in high-risk patients. Methods: From November 2014 to December 2023, patients at high risk for HCC with untreated small (≤20 mm) focal liver lesions (sFLLs) were included in this retrospective study. ChatGPT-4.0, ChatGPT-4o, ChatGPT-4o mini, and Google Gemini were integrated with imaging features from structured CEUS LI-RADS reports to assess their diagnostic performance for sHCC. The diagnostic efficacy of the LLMs for sHCC was compared using the McNemar test. Results: The final population consisted of 403 high-risk patients (52 years ± 11, 323 men). ChatGPT-4.0 and ChatGPT-4o demonstrated substantial to almost perfect intra-agreement for CEUS LI-RADS categorization (κ values: 0.76 to 1.0 and 0.7 to 0.94, respectively), outperforming ChatGPT-4o mini (κ values: 0.51 to 0.72) and Google Gemini (κ values: -0.04 to 0.47). ChatGPT-4.0 had higher sensitivity in detecting sHCC than ChatGPT-4o (83%-89% vs. 70%-78%, p < 0.02) with comparable specificity (76%-90% vs. 83%-86%, p > 0.05). Compared to human readers, ChatGPT-4.0 showed superior sensitivity (83%-89% vs. 63%-78%, p < 0.004) and comparable specificity (76%-90% vs. 90%-95%, p > 0.05) in diagnosing sHCC. Conclusion: For high-risk patients, ChatGPT-4.0 demonstrated satisfactory consistency in CEUS LI-RADS categorization, offering higher sensitivity in diagnosing sHCC while maintaining comparable specificity to that of human readers.
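
    The two statistics named in the abstract, the κ agreement coefficient and the McNemar test, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The abstract does not say which κ variant or which form of the McNemar test was used, so the sketch assumes unweighted Cohen's κ between two repeated categorizations and the exact (binomial) two-sided McNemar test. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Assumption: agreement is measured between two runs (or two readers)
    that each assign one category per lesion.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of lesions given the same category in both runs.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used a single identical category throughout
    return (observed - expected) / (1 - expected)


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired binary outcomes.

    Each position is one case; the value says whether method A (or B)
    classified that case correctly. Only discordant pairs matter.
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical): categories from two repeated runs on four lesions.
run_1 = ["LR-5", "LR-5", "LR-4", "LR-3"]
run_2 = ["LR-5", "LR-4", "LR-4", "LR-3"]
kappa = cohen_kappa(run_1, run_2)

# Toy data (hypothetical): per-case correctness of two methods on eight cases.
method_a = [True] * 8
method_b = [False] * 5 + [True] * 3
p_value = mcnemar_exact(method_a, method_b)
```

    In this sketch κ depends only on the category labels, and the McNemar p-value depends only on the cases where the two methods disagree, which is why it suits paired comparisons of sensitivity or specificity on the same lesions.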

    Keywords: hepatocellular carcinoma (HCC), large language model (LLM), diagnosis, contrast-enhanced ultrasound (CEUS), ultrasound

    Received: 18 Oct 2024; Accepted: 22 Nov 2024.

    Copyright: © 2024 Huang, Yang, Huang, Zeng, Liu, Luo, Lyshchik and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Andrej Lyshchik, Thomas Jefferson University Hospital, Jefferson University Hospitals, Philadelphia, 19107, Pennsylvania, United States
    Qiang Lu, West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.