AUTHOR=Su Na , Tang Rui , Zhang Yice , Ni Jiaqi , Huang Yimei , Liu Chunqi , Xiao Yuzhou , Zhu Baoting , Zhao Yinglan TITLE=Using machine learning to identify risk factors for pancreatic cancer: a retrospective cohort study of real-world data JOURNAL=Frontiers in Pharmacology VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2024.1510220 DOI=10.3389/fphar.2024.1510220 ISSN=1663-9812 ABSTRACT=Objectives

This study aimed to identify the risk factors for pancreatic cancer through machine learning.

Methods

We investigated the relationships between different risk factors and pancreatic cancer in a real-world retrospective cohort study conducted at West China Hospital of Sichuan University. Multivariable logistic regression, with pancreatic cancer as the outcome, was used to identify covariates associated with pancreatic cancer. The machine learning model extreme gradient boosting (XGBoost) was adopted as the final model for its high performance. SHapley Additive exPlanations (SHAP) were used to visualize the relationships between these potential risk factors and pancreatic cancer.
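
To illustrate the idea behind the Shapley-based attributions mentioned above, the following is a minimal sketch in Python, not the study's pipeline. It computes exact Shapley values for a toy log-odds risk model with three binary features. The weights, the interaction term, the intercept, and all function names are hypothetical and chosen only for illustration; they are not estimates from the cohort. In practice, the SHAP library approximates these quantities efficiently for tree models such as XGBoost.

```python
from itertools import combinations
from math import factorial

# Hypothetical log-odds weights for a toy risk model (NOT estimates from the study).
WEIGHTS = {"kras_mutation": 1.2, "pancreatitis": 0.8, "hyperlipidaemia": 0.4}
INTERACTION = 0.5  # hypothetical extra log-odds when KRAS mutation and pancreatitis co-occur
INTERCEPT = -2.0
BASELINE = {name: 0 for name in WEIGHTS}  # reference patient with no risk factors


def predict_log_odds(patient):
    """Toy model: linear terms plus one pairwise interaction."""
    score = INTERCEPT + sum(WEIGHTS[f] * patient[f] for f in WEIGHTS)
    score += INTERACTION * patient["kras_mutation"] * patient["pancreatitis"]
    return score


def coalition_value(patient, coalition):
    """Model output when only the features in `coalition` take the patient's values."""
    mixed = {f: (patient[f] if f in coalition else BASELINE[f]) for f in WEIGHTS}
    return predict_log_odds(mixed)


def shapley_values(patient):
    """Exact Shapley value of each feature, enumerating all coalitions."""
    features = list(WEIGHTS)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Probability that a coalition of this size precedes f in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = coalition_value(patient, set(subset) | {f})
                without_f = coalition_value(patient, set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical patient with a KRAS mutation and pancreatitis, without hyperlipidaemia.
patient = {"kras_mutation": 1, "pancreatitis": 1, "hyperlipidaemia": 0}
phi = shapley_values(patient)
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(phi, ranking)
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, and the interaction term is split equally between the two features that produce it. Ranking features by the absolute size of their attributions, averaged over all patients, is how a feature-importance ordering is typically derived from such values.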

Results

The cohort included 1,982 patients. The median ages of the pancreatic cancer and non-pancreatic cancer groups were 58.1 years (IQR: 51.3–64.4) and 57.5 years (IQR: 49.5–64.9), respectively. Multivariable logistic regression indicated that Kirsten rat sarcoma viral oncogene homolog (KRAS) gene mutation, hyperlipidaemia, pancreatitis, and pancreatic cysts were significantly associated with an increased risk of pancreatic cancer. The five most highly ranked features in the XGBoost model were KRAS gene mutation status, age, alcohol consumption status, pancreatitis status, and hyperlipidaemia status.

Conclusion

Machine learning algorithms confirmed that KRAS gene mutation, hyperlipidaemia, and pancreatitis are potential risk factors for pancreatic cancer. Additionally, the coexistence of KRAS gene mutation with pancreatitis, and of KRAS gene mutation with pancreatic cysts, is associated with an increased risk of pancreatic cancer. Our findings offer valuable implications for public health strategies targeting the prevention and early detection of pancreatic cancer.