ORIGINAL RESEARCH article

Front. Med.

Sec. Gastroenterology

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1583514

This article is part of the Research Topic: The Emerging Role of Large Language Model Chatbots in Gastroenterology and Digestive Endoscopy

Enhancing Gastroenterology with Multimodal Learning: The Role of Large Language Model Chatbots in Digestive Endoscopy

Provisionally accepted
  • 1 Nanjing University, Nanjing, China
  • 2 Southeast University, Nanjing, Jiangsu Province, China

The final, formatted version of the article will be published soon.

Advances in artificial intelligence (AI) and large language models (LLMs) have the potential to revolutionize digestive endoscopy by enhancing diagnostic accuracy, improving procedural efficiency, and supporting clinical decision-making. Traditional AI-assisted endoscopic systems often rely on single-modal image analysis, which lacks contextual understanding and adapts poorly to complex gastrointestinal (GI) conditions. Moreover, existing methods struggle with domain shifts, data heterogeneity, and limited interpretability, which restricts their clinical applicability. To address these challenges, we propose a multimodal learning framework that integrates LLM-powered chatbots with endoscopic imaging and patient-specific medical data. Our approach employs self-supervised learning to extract clinically relevant patterns from heterogeneous sources, enabling real-time guidance and AI-assisted report generation. We also introduce a domain-adaptive learning strategy to enhance model generalization across diverse patient populations and imaging conditions. Experimental results on multiple GI datasets demonstrate that our method significantly improves lesion detection, reduces diagnostic variability, and enhances physician-AI collaboration. This study highlights the potential of multimodal LLM-based systems to advance gastroenterology by providing interpretable, context-aware, and adaptable AI support in digestive endoscopy.

Keywords: multimodal learning, large language models, digestive endoscopy, AI-assisted diagnosis, domain adaptation

Received: 26 Feb 2025; Accepted: 24 Apr 2025.

Copyright: © 2025 Wu, Qin, Chang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mianhua Wu, Nanjing University, Nanjing, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.