AUTHOR=Yin Yu, Chen Congcong, Zhang Dong, Han Qianguang, Wang Zijie, Huang Zhengkai, Chen Hao, Sun Li, Fei Shuang, Tao Jun, Han Zhijian, Tan Ruoyun, Gu Min, Ju Xiaobing TITLE=Construction of predictive model of interstitial fibrosis and tubular atrophy after kidney transplantation with machine learning algorithms JOURNAL=Frontiers in Genetics VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1276963 DOI=10.3389/fgene.2023.1276963 ISSN=1664-8021 ABSTRACT=

Background: Interstitial fibrosis and tubular atrophy (IFTA) are histopathological manifestations of chronic kidney disease (CKD) and a cause of long-term graft loss after kidney transplantation. Necroptosis, a type of programmed cell death, plays an important role in the development of IFTA and in the late functional decline and eventual loss of grafts. In this study, 13 machine learning algorithms were used to construct IFTA diagnostic models based on necroptosis-related genes.

Methods: We screened all 162 “kidney transplant”–related cohorts in the GEO database and obtained five data sets (training sets: GSE98320 and GSE76882; validation sets: GSE22459 and GSE53605; survival set: GSE21374). The training set was constructed by merging GSE98320 and GSE76882 after removing batch effects with the sva package. Differentially expressed gene (DEG) analysis was used to identify necroptosis-related DEGs. A total of 13 machine learning algorithms (LASSO, Ridge, Enet, Stepglm, SVM, glmboost, LDA, plsRglm, random forest, GBM, XGBoost, Naive Bayes, and ANNs) were used to construct 114 IFTA diagnostic models, and the optimal model was selected by AUC. Post-transplantation patients were then grouped using consensus clustering, and the resulting subgroups were further explored using PCA, Kaplan–Meier (KM) survival analysis, functional enrichment analysis, CIBERSORT, and single-sample Gene Set Enrichment Analysis (ssGSEA).
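As an illustration of selecting an optimal model by AUC, the following is a minimal sketch in Python. It is not the authors' pipeline: the model names, cohort names, labels, and scores are hypothetical toy values, and the AUC is computed with the rank-based (Mann-Whitney) definition rather than any particular package. It assumes each candidate model yields a score per sample in each cohort and that the model with the highest mean AUC across cohorts is preferred.

```python
from statistics import mean

def auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_model(predictions, cohorts):
    """predictions: {model: {cohort: scores}}, cohorts: {cohort: labels}.
    Returns the model with the highest mean AUC and all mean AUCs."""
    mean_auc = {
        model: mean(auc(cohorts[c], s) for c, s in per_cohort.items())
        for model, per_cohort in predictions.items()
    }
    return max(mean_auc, key=mean_auc.get), mean_auc

# Hypothetical toy data: 1 = IFTA, 0 = no IFTA.
cohorts = {"cohort_a": [1, 1, 0, 0], "cohort_b": [1, 0, 1, 0]}
predictions = {
    "model_x": {"cohort_a": [0.9, 0.8, 0.2, 0.1],
                "cohort_b": [0.7, 0.3, 0.6, 0.4]},
    "model_y": {"cohort_a": [0.6, 0.4, 0.5, 0.3],
                "cohort_b": [0.5, 0.5, 0.5, 0.5]},
}
top, mean_auc = best_model(predictions, cohorts)
print(top, mean_auc)
```

Averaging over cohorts, rather than ranking on the training cohort alone, favors models that generalize to independent data.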

Results: A total of 55 necroptosis-related DEGs were identified by intersecting the DEGs with necroptosis-related gene sets. Stepglm[both]+RF was the optimal model, with an average AUC of 0.822. Clustering yielded four molecular subgroups of renal transplantation patients; the C4 group, which had a poor prognosis, showed significant upregulation of fibrosis-related and immune response–related pathways.
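The intersection step can be sketched as a plain set operation. This is only an illustration under stated assumptions: the gene symbols below are hypothetical placeholders, not genes reported in the study, and the normalization (trimming whitespace, upper-casing) is an assumed convenience for matching symbols from different sources.

```python
def necroptosis_related_degs(degs, necroptosis_genes):
    """Return the sorted symbols present in both lists, after
    normalizing whitespace and letter case."""
    def normalize(symbol):
        return symbol.strip().upper()
    deg_set = {normalize(g) for g in degs}
    necro_set = {normalize(g) for g in necroptosis_genes}
    return sorted(deg_set & necro_set)

# Hypothetical placeholder symbols, for illustration only.
degs = ["GENE_A", "gene_b ", "GENE_C"]
necroptosis_genes = ["GENE_B", "GENE_C", "GENE_D"]
shared = necroptosis_related_degs(degs, necroptosis_genes)
print(shared)
```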

Conclusion: By combining the 13 machine learning algorithms, we developed 114 IFTA classification models and tested the top model on two independent data sets from GEO.