ORIGINAL RESEARCH article

Front. Oncol.
Sec. Cancer Molecular Targets and Therapeutics
Volume 14 - 2024 | doi: 10.3389/fonc.2024.1505675
This article is part of the Research Topic Novel Molecular Targets in Cancer Therapy

Machine Learning-Based Identification of Proteomic Markers in Colorectal Cancer Using UK Biobank Data

Provisionally accepted
Swarnima Kollampallath Radhakrishnan, Dipanwita Nath, Dominic Russ, Laura Bravo Merodio, Priyani Lad, Folakemi Kola Daisi, Animesh Acharjee *
  • University of Birmingham, Birmingham, United Kingdom

The final, formatted version of the article will be published soon.

    Colorectal cancer is one of the leading causes of cancer-related mortality worldwide, and its incidence and mortality are predicted to rise globally over the next several decades. When detected early, colorectal cancer is treatable with surgery and medication, which motivates the development of prognostic and diagnostic biomarkers. Our study integrates machine learning models and protein network analysis to identify protein biomarkers for colorectal cancer. Our methodology leverages an extensive collection of proteome profiles from healthy individuals and individuals with colorectal cancer. To identify potential biomarkers with high predictive ability, we evaluated three classifiers (LASSO, XGBoost, and LightGBM), tuning the hyperparameters of each model by grid search. LASSO achieved the highest AUC, 75%, in the UK Biobank dataset, while LightGBM and XGBoost achieved AUCs of 69.61% and 71.42%, respectively. To enhance the interpretability of our models, we quantified each protein's contribution to the model's predictions using SHapley Additive exPlanations (SHAP) values. On this basis, TFF3, LCN2, and CEACAM5 were identified as key biomarkers associated with cell adhesion and inflammation. Protein quantitative trait loci (pQTL) analysis provided further evidence for the involvement of TFF1, CEACAM5, and SELE in colorectal cancer, with possible connections to the PI3K/Akt and MAPK signaling pathways. By offering insights into colorectal cancer diagnostics and targeted therapeutics, our findings set the stage for further biomarker validation.
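
The abstract reports classifier performance as AUC and attributes predictions to individual proteins using Shapley-based values. The following is a minimal sketch of those two quantities only, written in plain Python on toy inputs. It is not the authors' pipeline: the function names, the linear scoring function, its weights, and all input values are hypothetical and chosen purely for illustration. The attribution is computed by exact enumeration of feature coalitions, which is feasible only for a handful of features; practical tools rely on faster approximations.

```python
from itertools import combinations
from math import factorial


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear score over three made-up features (not real proteins).
WEIGHTS = [0.8, -0.5, 0.0]


def linear_score(z):
    return sum(w * v for w, v in zip(WEIGHTS, z))


if __name__ == "__main__":
    # Toy labels and scores, not study data.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(shapley_values(linear_score, [2.0, 3.0, 5.0], [1.0, 1.0, 1.0]))
```

For a linear score, each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. A feature with zero weight, as a LASSO penalty can produce, receives zero attribution.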

    Keywords: colorectal cancer, proteins, UK Biobank, biomarkers, SHAP, translational research, machine learning, decision tree

    Received: 03 Oct 2024; Accepted: 02 Dec 2024.

    Copyright: © 2024 Radhakrishnan, Nath, Russ, Merodio, Lad, Daisi and Acharjee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Animesh Acharjee, University of Birmingham, Birmingham, United Kingdom

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.