AUTHOR=Liu Chuan , Wang Ting , Yang Jiahui , Zhang Jixiang , Wei Shuchun , Guo Yingyun , Yu Rong , Tan Zongbiao , Wang Shuo , Dong Weiguo TITLE=Distant Metastasis Pattern and Prognostic Prediction Model of Colorectal Cancer Patients Based on Big Data Mining JOURNAL=Frontiers in Oncology VOLUME=Volume 12 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.878805 DOI=10.3389/fonc.2022.878805 ISSN=2234-943X ABSTRACT=Aims: This study aimed to investigate the distant metastasis pattern of newly diagnosed colorectal cancer (CRC) and to construct and validate prognostic nomograms for overall survival (OS) and cancer-specific survival (CSS) in CRC patients with distant multi-organ metastases. Methods: Patients with primary CRC initially diagnosed from 2010 to 2016 in the SEER database were included in the analysis. Independent risk factors affecting OS, CSS, and all-cause and CRC-specific mortality were screened by Cox regression and the Fine-Gray competing risk model. Nomograms were constructed to predict OS and CSS, respectively. The reliability and accuracy of the prediction models were evaluated by the concordance index (C-index) and calibration curves. The gene chip dataset GSE41258 was downloaded from the GEO database, and differentially expressed genes (DEGs) were screened with the GEO2R online tool (P<0.05, |logFC|>1.5). KEGG pathway and Gene Ontology (GO) annotation were used for enrichment analysis and the STRING website for protein-protein interaction (PPI) analysis of the DEGs, and Cytoscape software was used to construct the PPI network and screen functional modules and hub genes. Results: A total of 57,835 CRC patients were identified, including 47,823 without distant metastases and 10,012 (17.31%) with metastases.
Older age, unmarried status, poorly differentiated or undifferentiated grade, right colon site, larger tumor size, N2 stage, more metastatic sites and elevated CEA might lead to a poorer prognosis (all P<0.01). The independent risk factors for OS and CSS were used to construct prognostic prediction models for OS and CSS in CRC patients with distant metastasis. The C-index and calibration curves of the training and validation groups showed that the models had acceptable predictive performance and good calibration. Furthermore, by comparing CRC tissues with and without liver metastasis, 158 DEGs and the top 10 hub genes were screened. The hub genes were mainly involved in liver function and coagulation function. Conclusion: Big data from a public database were analyzed and transformed into a prognostic evaluation tool applicable to clinical practice, which may help in formulating treatment plans and evaluating prognosis for CRC patients with distant metastasis.
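Illustrative note: the abstract reports model discrimination as a C-index but does not say how it was computed. The following is a minimal sketch of Harrell's concordance index for right-censored survival data, written in plain Python. The function name, argument names, and the handling of ties are assumptions made for illustration and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up time per patient
    events      -- 1/True if the event (death) was observed, 0/False if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: concordant pair
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not from the study
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.4, 0.2]
c_index = concordance_index(times, events, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk. Established survival libraries offer tested implementations and would be preferable for real analyses.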