BRIEF RESEARCH REPORT article

Front. Med.
Sec. Pathology
Volume 11 - 2024 | doi: 10.3389/fmed.2024.1402457
This article is part of the Research Topic Advances in Deep Learning-Based Computational Pathology to Address Data Scarcity, Heterogeneity and Integration.

Large language model answers medical questions about standard pathology reports

Provisionally accepted
Anqi Wang 1, Jieli Zhou 2, Zhang Peng 1*, Haotian Cao 1*, Hongyi Xin 2, Xinyun Xu 1*, Haiyang Zhou 1*
  • 1 Shanghai Changzheng Hospital, Huangpu, China
  • 2 Shanghai Jiao Tong University, Shanghai, China

The final, formatted version of the article will be published soon.

    This study evaluated the feasibility of using a large language model (LLM) to answer pathology questions based on pathology reports (PRs) of colorectal cancer (CRC). Four common questions (CQs) about pathology and their corresponding answers were retrieved from public webpages. The questions were input as prompts to Chat Generative Pre-trained Transformer (ChatGPT, gpt-3.5-turbo), and the quality indicators of all answers (understanding, scientificity, satisfaction) were rated by gastroenterologists. Standard PRs from five CRC patients who underwent radical surgery at Shanghai Changzheng Hospital were selected, and six report questions (RQs) with corresponding answers were drafted by a gastroenterologist and a pathologist. We developed an interactive PR interpretation system that allows users to upload standard PRs as JPG images; ChatGPT's responses to the RQs were then generated, and the quality indicators of all answers were rated by gastroenterologists and outpatients. For the CQs, gastroenterologists rated the AI answers similarly to the non-AI answers in understanding, scientificity, and satisfaction. For RQ1-3, both gastroenterologists and patients gave the AI answers higher mean scores than the non-AI answers across the quality indicators. For RQ4-6, however, gastroenterologists gave the AI answers lower mean scores than the non-AI answers in understanding and satisfaction. For RQ4 specifically, gastroenterologists rated the AI answers lower in scientificity (P = 0.011), and patients rated them lower in understanding (P = 0.004) and satisfaction (P = 0.011). In conclusion, an LLM can generate credible answers to common pathology questions and to conceptual questions about PRs, and it holds great potential for improving doctor-patient communication.
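    The pipeline described above, feeding a report question together with the report text to gpt-3.5-turbo, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the system instruction, prompt wording, and helper names are assumptions, and the OCR step needed to extract text from the uploaded JPG images is not shown.

    ```python
    # Illustrative sketch: querying gpt-3.5-turbo with a pathology-report
    # question. Prompt wording and function names are assumptions; the
    # paper's abstract does not specify them.

    def build_report_prompt(report_text: str, question: str) -> list[dict]:
        """Assemble a chat-completions message list for one report question (RQ)."""
        return [
            {"role": "system",
             "content": "You are a medical assistant explaining colorectal "
                        "cancer pathology reports to patients in plain language."},
            {"role": "user",
             "content": f"Pathology report:\n{report_text}\n\nQuestion: {question}"},
        ]

    def ask_chatgpt(messages: list[dict], api_key: str) -> str:
        """Send the assembled messages to the OpenAI chat-completions endpoint."""
        from openai import OpenAI  # requires the `openai` package
        client = OpenAI(api_key=api_key)
        resp = client.chat.completions.create(model="gpt-3.5-turbo",
                                              messages=messages)
        return resp.choices[0].message.content

    # Build a prompt for a hypothetical report excerpt and question.
    messages = build_report_prompt(
        "Adenocarcinoma, moderately differentiated, invading the muscularis propria.",
        "What does 'moderately differentiated' mean?")
    ```

    A real deployment would add an OCR stage (e.g. converting the JPG report to text) before `build_report_prompt`, and would pass a valid API key to `ask_chatgpt`.
    
    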

    Keywords: Large Language Model, Medical question, Pathology report, Colorectal, Generative Pre-trained Transformer

    Received: 17 Mar 2024; Accepted: 28 Aug 2024.

    Copyright: © 2024 Wang, Zhou, Peng, Cao, Xin, Xu and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Zhang Peng, Shanghai Changzheng Hospital, Huangpu, China
    Haotian Cao, Shanghai Changzheng Hospital, Huangpu, China
    Xinyun Xu, Shanghai Changzheng Hospital, Huangpu, China
    Haiyang Zhou, Shanghai Changzheng Hospital, Huangpu, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.