AUTHOR=Fan Haoliang, Xie Qiqian, Zhang Zheng, Wang Junhao, Chen Xuncai, Qiu Pingming TITLE=Chronological Age Prediction: Developmental Evaluation of DNA Methylation-Based Machine Learning Models JOURNAL=Frontiers in Bioengineering and Biotechnology VOLUME=Volume 9 - 2021 YEAR=2022 URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2021.819991 DOI=10.3389/fbioe.2021.819991 ISSN=2296-4185 ABSTRACT=The epigenetic clock, a highly accurate age estimator based on DNA methylation (DNAm) levels, is the basis for predicting mortality/morbidity and elucidating the molecular mechanisms of aging, and is of great significance in forensics, justice, and social life. Herein, we integrated machine learning (ML) algorithms to construct a blood epigenetic clock in Southern Han Chinese (CHS) for chronological age prediction. Meta-analyses of the correlation coefficient (r) across 7084 individuals were first performed to select five genes (ELOVL2, C1orf132, TRIM59, FHL2, and KLF14) from a candidate set of nine age-associated DNAm biomarkers. DNAm profiles of the CHS cohort (240 blood samples from individuals aged 1 to 81 years) were generated by bisulfite targeted amplicon pyrosequencing (BTA-pseq) at 34 cytosine-phosphate-guanine sites (CpGs) in the five selected genes, revealing that methylation levels at different CpGs exhibit population specificity. Furthermore, we established and evaluated four chronological age prediction models using distinct ML algorithms: stepwise regression (SR), support vector regression (SVR-eps and SVR-nu), and random forest regression (RFR). The median absolute deviation (MAD) values increased with chronological age, especially in the 61-81 age category. No apparent gender effect was found in any ML model of the CHS cohort (all P > 0.05). The MAD values were 2.97, 2.22, 2.19, and 1.29 years for SR, SVR-eps, SVR-nu, and RFR in the CHS cohort, respectively.
Finally, compared with the MAD range of the meta cohort (2.53-5.07 years), an optimized RFR model (ntree = 500 and mtry = 8) achieved a MAD of 1.15 years in the 1-60 age categories of the CHS cohort and could be regarded as a robust blood epigenetic clock for age-related issues.
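
As a rough illustration of how a per-category MAD like those reported above might be computed, the following minimal sketch assumes that MAD means the median of the absolute differences between predicted and chronological age. The sample pairs and the helper names `mad` and `mad_by_category` are hypothetical and are not taken from the study; only the two age bins (1-60 and 61-81) follow the abstract.

```python
from statistics import median

# Hypothetical (chronological_age, predicted_age) pairs in years.
# These values are illustrative only and are NOT data from the study.
samples = [
    (5, 6.0), (18, 17.0), (34, 36.0), (47, 44.0),
    (58, 59.5), (63, 68.0), (72, 66.0), (80, 73.0),
]


def mad(pairs):
    """Median of |predicted - chronological| over the pairs, in years.

    Assumption: MAD is taken as the median absolute deviation between
    predicted and chronological age, as the abstract's wording suggests.
    """
    return median(abs(pred - age) for age, pred in pairs)


def mad_by_category(pairs, bins):
    """Group pairs into inclusive (low, high) age bins and compute MAD per bin."""
    result = {}
    for low, high in bins:
        in_bin = [(age, pred) for age, pred in pairs if low <= age <= high]
        if in_bin:  # skip bins that contain no samples
            result[(low, high)] = mad(in_bin)
    return result


# Age bins mirroring the two categories mentioned in the abstract.
bins = [(1, 60), (61, 81)]
per_bin = mad_by_category(samples, bins)
overall = mad(samples)

for (low, high), value in per_bin.items():
    print(f"MAD for ages {low}-{high}: {value:.2f} years")
print(f"Overall MAD: {overall:.2f} years")
```

With these made-up values, the older bin shows the larger error, which mirrors the qualitative trend described in the abstract but says nothing about the study's actual numbers.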