The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Genet.
Sec. Cancer Genetics and Oncogenomics
Volume 15 - 2024
doi: 10.3389/fgene.2024.1410353
Gradient boosting reveals spatially diverse cholesterol gene signatures in colon cancer
Provisionally accepted
- 1 Department of Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health, Indiana University, Purdue University Indianapolis, Indianapolis, Indiana, United States
- 2 Department of Medical and Molecular Genetics, School of Medicine, Indiana University Bloomington, Indianapolis, Indiana, United States
- 3 Department of Statistics, College of Science, Purdue University, West Lafayette, Indiana, United States
- 4 Stony Brook University, Stony Brook, New York, United States
- 5 Melvin and Bren Simon Comprehensive Cancer Center, Department of Medicine, School of Medicine, Indiana University Bloomington, Indianapolis, Indiana, United States
- 6 Indiana Biosciences Research Institute, Indianapolis, Indiana, United States
Colon cancer (CC) is the second most common cause of cancer deaths and the fourth most prevalent cancer in the United States. Recently, cholesterol metabolism has been identified as a potential therapeutic avenue due to its consistent association with tumor treatment effects and overall prognosis. We conducted differential gene analysis and KEGG pathway analysis on paired tumor and adjacent normal samples from the TCGA Colon Adenocarcinoma project and found that bile secretion was the only significantly downregulated pathway. To evaluate the relationship between cholesterol metabolism and CC prognosis, we used the genes from this pathway in several statistical models, namely Cox proportional hazards (CPH), Random Forest (RF), Lasso Regression (LR), and eXtreme Gradient Boosting (XGBoost), to identify the genes that contributed most to the predictive ability of all models: ADCY5 and SLC2A1. We demonstrate that using cholesterol metabolism genes with XGBoost models improves stratification of CC patients into low- and high-risk groups compared with traditional CPH, RF, and LR models. Spatial transcriptomics (ST) revealed that SLC2A1 (glucose transporter 1, GLUT1) colocalized with small blood vessels. ADCY5 localized to stromal regions in both the ST and protein immunohistochemistry data. Interestingly, both of these significant genes are expressed in tissues other than the tumor itself, highlighting the complex interplay between the tumor and its microenvironment and suggesting that druggable targets may be found in the ability to modify how "normal" tissue interacts with tumors.
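As a rough illustration of what "stratification into low- and high-risk groups" can mean in practice, the sketch below splits patients at the median of a model's predicted risk score and scores the ranking with Harrell's concordance index. The abstract does not state which cutoff or metric the authors used, so both the median split and the C-index are assumptions here, the function names are hypothetical, and the toy survival data are invented purely for demonstration.

```python
from statistics import median


def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable patient pairs, the fraction in which
    the patient with the shorter observed survival has the higher risk score.
    Ties in risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # (events[i] == 1) strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


def stratify_by_median(risks):
    """Label each patient 'high' or 'low' risk relative to the median score
    (an assumed cutoff, not one taken from the article)."""
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


# Toy data (invented): follow-up time in months, event flag (1 = death
# observed, 0 = censored), and a risk score from any fitted survival model.
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 0, 1, 0, 0]
risks = [2.1, 1.7, 0.9, 0.35, 0.3, 0.4]

groups = stratify_by_median(risks)
c_index = concordance_index(times, events, risks)
print(groups)
print(round(c_index, 3))
```

The same two functions can be applied to risk scores from any of the compared models (CPH, RF, LR, or XGBoost), so that a higher C-index or a cleaner separation between the two groups indicates better stratification under these assumptions.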
Keywords: Colon cancer (CC), Cholesterol, Bile acids, Prognostic genes, Machine Learning (ML), spatial transcriptomics (ST)
Received: 01 Apr 2024; Accepted: 08 Nov 2024.
Copyright: © 2024 Yang, Chatterjee, Couetil, Liu, Ardon, Chen, Zhang, Huang and Johnson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Travis Steele Johnson, Department of Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health, Indiana University, Purdue University Indianapolis, Indianapolis, IN 46202-2872, Indiana, United States
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.