This study aimed to identify colorectal cancer (CRC)-associated phylogenetic and functional bacterial features by large-scale metagenomic sequencing and to develop a binomial classifier that accurately distinguishes CRC patients from healthy individuals.
We conducted shotgun metagenomic analyses of fecal samples from a ZhongShanMed discovery cohort of 121 CRC patients and 52 controls and a SouthernMed validation cohort of 67 CRC patients and 44 controls. Taxonomic profiling and quantification were performed by direct sequence alignment against the Genome Taxonomy Database (GTDB). High-quality reads were also aligned to the IGC datasets to obtain functional profiles defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG). A least absolute shrinkage and selection operator (LASSO) classifier was constructed to compute risk scores (the predicted probability of disease) and to discriminate CRC patients from controls in the discovery, validation, Fudan, GloriousMed, and Hong Kong (HK) cohorts.
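The LASSO classifier described above can be illustrated with a minimal sketch of L1-penalized logistic regression. This is not the study's pipeline: the proximal gradient solver, the penalty `lam`, the step size, and the synthetic three-feature data (one informative "species abundance" column, two noise columns) are all assumptions chosen only to show how the L1 penalty shrinks coefficients and how a risk score is read off as a predicted probability.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.05, step=0.2, n_iter=500):
    """Fit logistic regression with an L1 penalty on the weights.

    Uses proximal gradient descent; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def risk_score(w, b, x):
    """Predicted probability of disease for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic data (hypothetical): column 0 separates the classes,
# columns 1 and 2 are pure noise.
random.seed(0)
X, y = [], []
for i in range(100):
    label = i % 2
    X.append([random.gauss(1.0 if label else -1.0, 1.0),
              random.gauss(0.0, 1.0),
              random.gauss(0.0, 1.0)])
    y.append(label)

w, b = fit_lasso_logistic(X, y)
print("weights:", w, "intercept:", b)
print("risk score:", risk_score(w, b, [2.0, 0.0, 0.0]))
```

In practice the penalty strength would be selected by cross-validation on the discovery cohort, and the fitted model would then be applied unchanged to each validation cohort.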
A diverse spectrum of bacterial and fungal species was found to be either enriched (368 species) or depleted (113 species) in CRC patients (q<0.05). Similarly, metabolic functions associated with the biosynthesis and metabolism of amino acids and fatty acids were significantly altered (q<0.05). A LASSO regression model built on the microbial species whose abundance changed significantly in CRC achieved areas under the receiver operating characteristic curve (AUROCs) of 0.94 and 0.91 in the ZhongShanMed and SouthernMed cohorts, respectively. Applying the same classification model to the Fudan, GloriousMed, and HK cohorts yielded AUROCs of 0.80, 0.78, and 0.91, respectively. Moreover, the major CRC-associated bacterial biomarkers identified in this study were consistently enriched or depleted across 10 metagenomic sequencing studies of gut microbiota.
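For readers less familiar with the AUROC values reported above, the metric can be read as the probability that a randomly chosen CRC sample receives a higher risk score than a randomly chosen control. The sketch below computes it directly from that pairwise definition; the function name `auroc` and the five example scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    labels: 1 for cases, 0 for controls. Ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: three cases and two controls.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.3, 0.4, 0.1]
print(auroc(example_labels, example_scores))  # 5 of 6 case-control pairs ranked correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.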
A coherent signature of CRC-associated bacterial biomarkers, modeled with a LASSO binomial classifier, may be used for the accurate early detection of CRC.