
ORIGINAL RESEARCH article

Front. Oncol.
Sec. Genitourinary Oncology
Volume 14 - 2024 | doi: 10.3389/fonc.2024.1457516
This article is part of the Research Topic: The Role of AI in GU Oncology.

Empowering Patients: How Accurate and Readable are Large Language Models in Renal Cancer

Provisionally accepted
  • 1 King Abdulaziz University, Jeddah, Saudi Arabia
  • 2 Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
  • 3 Mediclinic City Hospital, Dubai, United Arab Emirates

The final, formatted version of the article will be published soon.

The incorporation of Artificial Intelligence (AI) into the healthcare sector has fundamentally transformed patient care paradigms, particularly through the creation of patient education materials (PEMs) tailored to individual needs. This study aims to assess the accuracy and readability of AI-generated information on kidney cancer produced by ChatGPT-4.0, Gemini AI, and Perplexity AI, comparing these outputs to PEMs provided by the American Urological Association (AUA) and the European Association of Urology (EAU). The objective is to guide physicians in directing patients to accurate and understandable resources.

PEMs published by the AUA and the EAU were collected and categorized. Kidney cancer-related queries, identified via Google Trends (GT), were input into ChatGPT-4.0, Gemini AI, and Perplexity AI. Four independent reviewers assessed the AI outputs for accuracy across five distinct categories, employing a 5-point Likert scale. A readability evaluation was conducted using established formulas, including the Gunning Fog Index (GFI), the Simple Measure of Gobbledygook (SMOG), and the Flesch-Kincaid Grade Level (FKGL). The AI chatbots were then tasked with simplifying their outputs to achieve a sixth-grade reading level.

The PEM published by the AUA was the most readable, with a mean readability score of 9.84±1.2, in contrast to the EAU (11.88±1.11), ChatGPT-4.0 (11.03±1.76), Perplexity AI (12.66±1.83), and Gemini AI (10.83±2.31). The chatbots demonstrated the capability to simplify text to lower grade levels upon request, with ChatGPT-4.0 achieving a readability grade level ranging from 5.76 to 9.19, Perplexity AI from 7.33 to 8.45, and Gemini AI from 6.43 to 8.43. While the official PEMs were considered accurate, the LLM-generated outputs exhibited an overall high level of accuracy, with minor omissions of detail and some informational inaccuracies. Information related to kidney cancer treatment was found to be the least accurate among the evaluated categories.

Although the PEM published by the AUA was the most readable, both the authoritative PEMs and the outputs generated by Large Language Models (LLMs) exceeded the recommended readability threshold for the general population. AI chatbots can simplify their outputs when explicitly instructed. However, notwithstanding their overall accuracy, LLM-generated outputs are susceptible to omissions of detail and inaccuracies. The variability in AI performance necessitates cautious use as an adjunctive tool in patient education.
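The readability indices named in the Methods are standard published formulas. As an illustration only, the Flesch-Kincaid Grade Level can be sketched in Python as below; the syllable counter is a naive vowel-group heuristic assumed for demonstration, not the calibrated tooling a formal readability evaluation would use:

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```

A higher FKGL score corresponds to a higher school-grade reading level; sixth grade (FKGL ≈ 6) is the threshold commonly recommended for patient-facing materials.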

    Keywords: artificial intelligence, kidney cancer, patient education materials, health literacy, large language models, accuracy, readability

    ORCID: Abdulghafour Halawani: 0000-0002-2112-9562; Mudhar Hasan: 0000-0001-5136-3176

    Received: 30 Jun 2024; Accepted: 09 Sep 2024.

    Copyright: © 2024 Halawani, Almehmadi, Alhubaishy, Alnefaie and Hasan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Abdulghafour Halawani, King Abdulaziz University, Jeddah, Saudi Arabia

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.