BRIEF RESEARCH REPORT article

Front. Med.
Sec. Pathology
Volume 11 - 2024 | doi: 10.3389/fmed.2024.1402457
This article is part of the Research Topic Advances in Deep Learning-Based Computational Pathology to Address Data Scarcity, Heterogeneity and Integration.

Large language model answers medical questions about standard pathology reports

Provisionally accepted
Anqi Wang 1, Jieli Zhou 2, Zhang Peng 1*, Haotian Cao 1*, Hongyi Xin 2, Xinyun Xu 1*, Haiyang Zhou 1*
  • 1 Shanghai Changzheng Hospital, Huangpu, China
  • 2 Shanghai Jiao Tong University, Shanghai, China

The final, formatted version of the article will be published soon.

    This study evaluated the feasibility of using a large language model (LLM) to answer pathology questions based on pathology reports (PRs) of colorectal cancer (CRC). Four common questions (CQs) about pathology and their corresponding answers were retrieved from public webpages. The questions were input as prompts to Chat Generative Pre-trained Transformer (ChatGPT, gpt-3.5-turbo), and the quality indicators of all answers (understanding, scientificity, satisfaction) were rated by gastroenterologists. Standard PRs from five CRC patients who underwent radical surgery at Shanghai Changzheng Hospital were selected, and six report questions (RQs) with corresponding answers were drafted by a gastroenterologist and a pathologist. We developed an interactive PR interpretation system that allows users to upload standard PRs as JPG images; ChatGPT's responses to the RQs were then generated, and the quality indicators of all answers were rated by gastroenterologists and outpatients. For the CQs, gastroenterologists rated the AI answers similarly to the non-AI answers in understanding, scientificity, and satisfaction. For RQ1-3, both gastroenterologists and patients gave the AI answers higher mean scores than the non-AI answers across the quality indicators. For RQ4-6, however, gastroenterologists gave the AI answers lower mean scores than the non-AI answers in understanding and satisfaction. For RQ4 specifically, gastroenterologists rated the AI answers lower in scientificity (P = 0.011), and patients rated them lower in understanding (P = 0.004) and satisfaction (P = 0.011). In conclusion, an LLM can generate credible answers to common pathology questions and to conceptual questions about PRs, and it holds great potential for improving doctor-patient communication.
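    The pipeline described above, feeding a report question together with the report text to gpt-3.5-turbo, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the system instruction, prompt wording, and helper names are assumptions, and the OCR step needed to extract text from the uploaded JPG images is not shown.

    ```python
    # Illustrative sketch: querying gpt-3.5-turbo with a pathology-report
    # question. Prompt wording and function names are assumptions; the
    # paper's abstract does not specify them.

    def build_report_prompt(report_text: str, question: str) -> list[dict]:
        """Assemble a chat-completions message list for one report question (RQ)."""
        return [
            {"role": "system",
             "content": "You are a medical assistant explaining colorectal "
                        "cancer pathology reports to patients in plain language."},
            {"role": "user",
             "content": f"Pathology report:\n{report_text}\n\nQuestion: {question}"},
        ]

    def ask_chatgpt(messages: list[dict], api_key: str) -> str:
        """Send the assembled messages to the OpenAI chat-completions endpoint."""
        from openai import OpenAI  # requires the `openai` package
        client = OpenAI(api_key=api_key)
        resp = client.chat.completions.create(model="gpt-3.5-turbo",
                                              messages=messages)
        return resp.choices[0].message.content

    # Build a prompt for a hypothetical report excerpt and question.
    messages = build_report_prompt(
        "Adenocarcinoma, moderately differentiated, invading the muscularis propria.",
        "What does 'moderately differentiated' mean?")
    ```

    A real deployment would add an OCR stage (e.g. converting the JPG report to text) before `build_report_prompt`, and would pass a valid API key to `ask_chatgpt`.
    
    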

    Keywords: Large Language Model, Medical question, Pathology report, Colorectal, Generative Pre-trained Transformer

    Received: 17 Mar 2024; Accepted: 28 Aug 2024.

    Copyright: © 2024 Wang, Zhou, Peng, Cao, Xin, Xu and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Zhang Peng, Shanghai Changzheng Hospital, Huangpu, China
    Haotian Cao, Shanghai Changzheng Hospital, Huangpu, China
    Xinyun Xu, Shanghai Changzheng Hospital, Huangpu, China
    Haiyang Zhou, Shanghai Changzheng Hospital, Huangpu, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.