ORIGINAL RESEARCH article

Front. Public Health
Sec. Digital Public Health
Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1501408
This article is part of the Research Topic Advancing Public Health through Generative Artificial Intelligence: A Focus on Digital Well-Being and the Economy of Attention

Integrating Retrieval-Augmented Generation for Enhanced Personalized Physician Recommendations in Web-Based Medical Services: Model Development Study

Provisionally accepted
Yingbin Zheng 1*, Yiwei Yan 1, Sai Chen 2, Yunping Cai 2, Kun Ren 2, Yishan Liu 3, Jiaying Zhuang 1, Min ZHAO 1*
  • 1 Biomedical Big Data Center, First Affiliated Hospital of Xiamen University, Xiamen, Fujian Province, China
  • 2 Meteorological Disaster Prevention Technology Center, Xiamen Meteorological Bureau, Xiamen, China
  • 3 Software Engineering, School of Software, Taiyuan University of Technology, Taiyuan, Shanxi Province, China

The final, formatted version of the article will be published soon.

    Background: Web-based medical services have improved access to healthcare by enabling remote consultations, streamlining scheduling, and making medical information easier to obtain. However, personalized physician recommendation remains a challenge: it often relies on manual triage by schedulers, which is limited in scalability and availability.
    Objective: This study aimed to develop and validate a Retrieval-Augmented Generation-Based Physician Recommendation (RAGPR) model to improve triage performance.
    Methods: The study used a dataset of 646,383 consultation records from the Internet Hospital of the First Affiliated Hospital of Xiamen University. It evaluated several embedding models (FastText, SBERT, and OpenAI) for clustering and classifying medical condition labels, and compared three large language models (LLMs): Mistral, GPT-4o-mini, and GPT-4o. In addition, three triage staff members evaluated the efficiency of the RAGPR model through questionnaires.
    Results: Performance on the text embedding tasks varied widely across models. FastText achieved an F1-score of 46%, whereas SBERT and OpenAI reached 95% and 96%, respectively. Among the LLMs, GPT-4o achieved the highest F1-score (95%), followed by Mistral (94%) and GPT-4o-mini (92%). The performance ratings were 4.56 for Mistral, 4.45 for GPT-4o-mini, and 4.67 for GPT-4o. SBERT and Mistral were identified as the optimal choices because of their balanced performance, cost-effectiveness, and ease of implementation.
    Conclusions: The RAGPR model can significantly improve the accuracy and personalization of web-based medical services, providing a scalable solution for patient-physician matching.
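
The retrieve-then-generate pattern named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the physician profiles, the function names (`embed`, `cosine`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and a toy bag-of-words vector stands in for a real embedding model such as SBERT so that the example runs with the Python standard library alone. The final call to an LLM is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical physician profiles, invented for illustration only.
PHYSICIANS = [
    {"name": "Dr. A", "department": "Cardiology",
     "expertise": "chest pain palpitations hypertension heart failure"},
    {"name": "Dr. B", "department": "Dermatology",
     "expertise": "rash eczema acne itching skin allergy"},
    {"name": "Dr. C", "department": "Gastroenterology",
     "expertise": "stomach pain acid reflux diarrhea nausea"},
]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, k=2):
    """Return the k profiles whose expertise text is most similar to the query."""
    q = embed(query)
    ranked = sorted(
        PHYSICIANS,
        key=lambda p: cosine(q, embed(p["expertise"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, candidates):
    """Assemble the retrieved context and the complaint into an LLM prompt."""
    context = "\n".join(
        f"- {p['name']} ({p['department']}): {p['expertise']}"
        for p in candidates
    )
    return (
        "You are a triage assistant. Using only the candidates below, "
        "recommend the most suitable physician.\n"
        f"Candidates:\n{context}\n"
        f"Patient complaint: {query}\n"
    )


if __name__ == "__main__":
    complaint = "I have an itching rash on my skin"
    print(build_prompt(complaint, retrieve(complaint)))
```

Restricting the prompt to retrieved candidates is what ties the model's recommendation to records that actually exist. In this sketch, swapping `embed` for a real embedding model would change only the vector representation, not the surrounding logic.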

    Keywords: large language models, Mistral, SBERT, triage systems, retrieval-augmented generation-based physician recommendation, RAGPR model

    Received: 25 Sep 2024; Accepted: 08 Jan 2025.

    Copyright: © 2025 Zheng, Yan, Chen, Cai, Ren, Liu, Zhuang and ZHAO. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Yingbin Zheng, Biomedical Big Data Center, First Affiliated Hospital of Xiamen University, Xiamen, 361001, Fujian Province, China
    Min ZHAO, Biomedical Big Data Center, First Affiliated Hospital of Xiamen University, Xiamen, 361001, Fujian Province, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.