ORIGINAL RESEARCH article

Front. Pharmacol.
Sec. Pharmacoepidemiology
Volume 15 - 2024 | doi: 10.3389/fphar.2024.1510220
This article is part of the Research Topic Advances in Drug-induced Diseases Volume II

Using machine learning to identify risk factors for pancreatic cancer: A retrospective cohort study of real-world data

Provisionally accepted
  • 1 West China School of Pharmacy, Sichuan University, Chengdu, Sichuan Province, China
  • 2 Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China
  • 3 National Chengdu Center for Safety Evaluation of Drugs, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Collaborative Innovation Center for Biotherapy, Chengdu, Sichuan Province, China
  • 4 Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, Beijing Municipality, China
  • 5 Department of Pharmacology, University of Michigan, Ann Arbor, Michigan, United States
  • 6 UF Health Shands Hospital, Florida, United States
  • 7 Sichuan University, Chengdu, China

The final, formatted version of the article will be published soon.

    Objectives: This study aimed to identify risk factors for pancreatic cancer using machine learning. Methods: We investigated the relationships between candidate risk factors and pancreatic cancer in a real-world retrospective cohort study conducted at West China Hospital of Sichuan University. Multivariable logistic regression, with pancreatic cancer as the outcome, was used to identify covariates associated with pancreatic cancer. The machine learning model extreme gradient boosting (XGBoost) was adopted as the final model for its high performance. Shapley additive explanations (SHAP) were used to visualize the relationships between these potential risk factors and pancreatic cancer. Results: The cohort included 1,982 patients. The median ages of the pancreatic cancer and non-pancreatic cancer groups were 58.1 years (IQR: 51.3-64.4) and 57.5 years (IQR: 49.5-64.9), respectively. Multivariable logistic regression indicated that Kirsten rat sarcoma viral oncogene homolog (KRAS) gene mutation, hyperlipidaemia, pancreatitis, and pancreatic cysts were significantly associated with an increased risk of pancreatic cancer. The five most highly ranked features in the XGBoost model were KRAS gene mutation status, age, alcohol consumption status, pancreatitis status, and hyperlipidaemia status. Conclusions: Machine learning algorithms confirmed that KRAS gene mutation, hyperlipidaemia, and pancreatitis are potential risk factors for pancreatic cancer. Additionally, the coexistence of KRAS gene mutation and pancreatitis, as well as of KRAS gene mutation and pancreatic cysts, is associated with an increased risk of pancreatic cancer. Our findings offer valuable implications for public health strategies targeting the prevention and early detection of pancreatic cancer.
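
The abstract describes using Shapley additive explanations to attribute a model's output to individual risk factors. As a minimal sketch of the underlying idea only, the Python below computes exact Shapley values for a toy log-odds model. Everything in it is an assumption made for illustration: the coefficients, the intercept, the KRAS-pancreatitis interaction term, and the convention that an "absent" feature contributes nothing. None of these values come from the study, and this is not the authors' XGBoost model or the SHAP library's tree-based algorithm.

```python
from itertools import combinations
from math import factorial

# Hypothetical coefficients, for illustration only (NOT estimates from the study).
COEF = {"kras_mutation": 1.2, "pancreatitis": 0.8, "hyperlipidaemia": 0.5}
INTERCEPT = -2.0
# Hypothetical interaction: extra log-odds when both factors are present.
INTERACTION = {frozenset({"kras_mutation", "pancreatitis"}): 0.6}


def log_odds(active):
    """Toy model output (log-odds) when only the features in `active` are present."""
    active = set(active)
    value = INTERCEPT + sum(COEF[f] for f in active)
    for pair, bonus in INTERACTION.items():
        if pair <= active:
            value += bonus
    return value


def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution over all subsets."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = value_fn(set(subset) | {f})
                without_f = value_fn(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# A hypothetical patient with all three risk factors present.
patient = ["kras_mutation", "pancreatitis", "hyperlipidaemia"]
baseline = log_odds(set())
phi = shapley_values(log_odds, patient)

for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f} log-odds")
```

In this toy setting the interaction bonus is split equally between the two interacting factors, and the attributions sum to the difference between the patient's log-odds and the baseline. That additivity property is what lets SHAP-style plots rank features by their contribution to a prediction.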

    Keywords: pancreatic cancer, machine learning, multivariable logistic regression, risk factors, KRAS gene mutation

    Received: 12 Oct 2024; Accepted: 11 Nov 2024.

    Copyright: © 2024 Su, Tang, Zhang, Ni, Huang, Liu, Xiao, Zhu and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Yinglan Zhao, Sichuan University, Chengdu, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.