- 1National Human Genetic Resources Center, National Research Institute for Family Planning, Beijing, China
- 2Graduate School of Peking Union Medical College, Beijing, China
- 3Gansu Province Medical Genetics Center, Gansu Provincial Clinical Research Center for Birth Defects and Rare Diseases, Gansu Provincial Maternity and Child-Care Hospital, Lanzhou, China
- 4The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children’s Hospital, Ningbo, China
Phenylketonuria (PKU) is a genetic disorder with amino acid metabolic defect, which does great harms to the development of newborns and children. Early diagnosis and treatment can effectively prevent the disease progression. Here we developed a PKU screening model using random forest classifier (RFC) to improve PKU screening performance with excellent sensitivity, false positive rate (FPR) and positive predictive value (PPV) in all the validation dataset and two testing Chinese populations. RFC represented outstanding advantages comparing several different classification models based on machine learning and the traditional logistic regression model. RFC is promising to be applied to neonatal PKU screening.
Introduction
Phenylketonuria (PKU [MIM: 261600]) is an autosomal recessive genetic disease, which is one of the common disorders of amino acid metabolism (Yan et al., 2019). It is also one of the diseases for newborn screening (NBS) in China. The incidence of PKU in China is 1/10,701, with a higher incidence in the north than in the south (Wang et al., 2015). The incidence of PKU in Hainan province of China is approximately 1/81,967 (Huang et al., 2021) but 1/3,420 in Gansu province (Wang et al., 2015). Due to the high cost of gene detection, some methods for PKU screening were used such as the Guthrie test (Guthrie and Susi, 1963) and high performance liquid chromatography (HPLC) (Moretti et al., 1990) in the early days after birth. Tandem mass spectrometry (MS/MS) is currently used in many countries to screen inborn errors of metabolism (American College of Medical Genetics Newborn Screening Expert Group, 2006; Lindner et al., 2011). In most countries around the world, PKU screening is performed by evaluating phenylalanine (PHE) and tyrosine (TYR) levels in neonatal dry blood spots (DBSs) by LC-MS/MS (Blau et al., 2014). In clinical, newborns with PHE concentration more than 120
Machine learning is the science of artificial intelligence and has been widely used in medicine (Deo, 2015). For example, there are many important applications in the establishment of cancer mutation spectrum, cancer research and nursing care, and the diagnosis and prognosis of cardiovascular and cerebrovascular diseases (Muiños et al., 2021; Meropol et al., 2021; Savarraj et al., 2021). It also plays an important role in the screening of neonatal genetic metabolic diseases (Baumgartner et al., 2004). For example, a random forest machine learning classifier was used to establish NBS models for glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarboxylase deficiency (OTCD) and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD) (Peng et al., 2020). Further, several studies in PKU screening have attracted more attention. A logistic regression model was constructed for PKU screening, in which sensitivity reached 95%–100% and PPV increased from 19.14% to 32.16% (Zhu et al., 2020). In addition, feature selection strategy was used to obtain the optimal biomarkers and reduce the false positive proportion of PKU (Chen et al., 2013).
However, PKU screening based on the model constructed by machine learning methods has not been widely used in practice. Most hospitals still follow traditional methods for PKU screening. As a result, it is particularly urgent to develop and fine-tune classification models for rare but treatable metabolic diseases such as PKU. It aims at both reducing false positive cases and eliminating false negatives, in order to detect the infants and children with PKU quickly and accurately. In this study, we applied RFC method to improve PKU screening performance with excellent sensitivity, FPR and PPV in two Chinese large populations.
Materials and methods
Metabolic data
The population level newborn screening data of small molecule metabolites were from Gansu Provincial Maternity and Child-care Hospital (GPMCH) in the northwestern China and Ningbo Women and Children’s Hospital (NWCH) in the southeastern China. Small molecule metabolites including 10 amino acids and 31 acylcarnitines of each newborn were obtained from blood by MS/MS. All newborns consist of 43 features, including 41 small molecule metabolites and two ratios which are the traditional biomarkers PHE/TYR and the new potential biomarker MET/PHE [16]. Newborn samples will be divided into two categories, that PKU patients and normal samples without PKU (Non-PKU). All PKU newborns and children have a clear causative pathogenic variant verified by Sanger sequencing or Next-generation sequencing. To protect personal privacy, personal information of all samples was deleted.
Data processing and description
All the samples with other metabolic disorders were excluded for all the datasets to avoid misleading the prediction results. Then, all features were normalized with a multiple of the median (MOM) to avoid systematic errors. The median of every feature is first calculated. Then, the original value is divided by the median to obtain the normalized value, which called MOM value (Yang et al., 2021).
During data preprocessing, 163 PKU patients with treatment information and 565 samples with other metabolic disorders were excluded. The total datasets described in model were all preprocessed. In GPMCH population, 22,867 records from 2015 to 2020 were randomly split into the training and validation datasets at a 7/3 ratio after processing. Consequently, the training dataset contains 132 PKU patients and 15,874 Non-PKU samples for fitting the model, the validation dataset contains 69 PKU patients and 6,792 Non-PKU samples for optimizing the model. Two testing datasets were used to evaluate the performance of the model. One testing dataset (GPMCH_2021) included 9 PKU patients and 1,398 Non-PKU samples from January to May 2021. The other testing dataset (NWCH) included 16 PKU patients and 392,177 Non-PKU samples from 2014 to 2020. The processing steps of these datasets are shown in Figure 1 and descriptive statistics of 43 biomarkers used in the research are depicted in Supplementary Table S1.
Machine learning models
PKU screening models were built using six machine learning methods, including Multilayer Perceptron (MLP), Decision Tree (DT), Stochastic Gradient Descent (SGD), Logistic Regression (LR), K-Nearest Neighbor (KNN) and RFC. All models were built with Scikit-learn-0.23.2 in python and optimized by adjusting parameters.
Logistic regression analysis 3 (LRA3) is a classification model developed by Zhixing zhu et al. with good sensitivity, specificity and PPV for PKU screening [16]. The formula of this model is as follows:
Random forest classifier
RFC is a highly flexible supervised classification tool. The classification model trains and predicts samples with multiple decision trees (Breiman, 2001). It can avoid the phenomenon that a single decision tree is prone to over-fitting and improve prediction accuracy. The process of RFC is summarized as follows:
1) Among the n samples of the original training dataset, i samples are randomly sampled with replacement. All training samples of each classification tree form a new training dataset.
2) For each training dataset, a classification and regression tree algorithm is used to construct the classification tree without pruning leaves is generated separately. At each internal node of the tree, m features
3) There are n classification trees in the RFC model and each tree has a category determination result, the category with the most votes is designated as the final output.
The RFC model was built by fine-tuning its parameters in the training dataset, including the number of trees in the forest, the maximum depth of the tree, the minimum number of samples required to split the internal nodes, the minimum number of samples required for the leaf nodes and measuring the performance of the trained model in the validation dataset. Due to the imbalance of the data, we set category weights with low weights for large sample sizes and high weights for small sample sizes. To obtain the optimal model, “Grid Search” of Python library is used to fine-tune parameters. The ideal requirement in clinical is to detect all PKU patients with excellent PPV at the same time. When the new sample enters the RFC model, each decision tree of RFC gives its own disease status of PKU. By integrating the disease status of each decision tree and adopting a simple voting method of minority obeying the majority, the RFC model determine whether the sample has PKU.
Feature importance
Gini impurity is used to rank the relative importance of each feature. It is the probability of misclassification of randomly selected elements after randomly marking according to the class distribution in the dataset. In RFC, feature importance represents the sum of Gini impurity reduction of all nodes split on features. The smaller the Gini impurity, the smaller the probability that the selected samples in the dataset are misclassified, and the better the feature.
Performance evaluation
This study is a binary classification problem with random forest. The confusion matrix is used to view the correct and wrong recognition of each kind of samples (Table 1).
Pearson chi-square test is a hypothesis testing method based on the chi-square distribution, inferring whether two categorical variables are correlated or independent of each other according to the sample data. In this study, it is applied to test the independence of true value and predict value in the confusion matrix.
Then the performance evaluation indices calculated from the confusion matrix are as follows:
We also plotted precision recall (PR) curve and receiver operating characteristic (ROC) curve to evaluate our model, meanwhile calculated the average precision (AP) and the area under curve (AUC).
Results
Model selection
Two models including RF and LR can get the sensitivity of 100% in training, validation and two testing datasets, while other models including MLP, DT, SGD and KNN cannot. What’s more, all other evaluations including accuracy, specificity, PPV and AUC of RFC are all better in both models (Table 2). Overall, RFC is the optimal model for PKU screening.
TABLE 2. Results of multi-classification models of PKU. And, the bold values represent better results than other models.
Training and evaluation of the model
We constructed a RFC model to classify PKU patients and Non-PKU newborns. The final optimal RFC model used 72 trees in the forest, max depth 18, and min samples leaf 14. AP of the PR curve by RFC reaches 0.911 (Figure 2A), and AUC of the ROC curve reaches 0.999 (Figure 2B) in the validation dataset. These results show that the RFC is a reliable diagnostic tool for PKU screening.
FIGURE 2. Two curves for PKU screening using RFC in the validation dataset: (A) PR curve; (B) ROC curve.
Three of the top-ranked features including PHE/TYR, MET/PHE and PHE play the most important roles for RFC model. All the 43 features importance for the model construction of PKU screening can be seen in Figure 3.
Validation of the model
In the validation dataset, PPV obtained for PKU screening by the traditional medical method (PHE>120
Comparison with the logistic regression model
In both of the testing datasets, we compared RFC with LRA3. RFC detected all patients, while LRA3 missed one PKU patient in the GPMCH_2021 (Table 4) and three in the NWCH dataset (Table 5). At the same time, Specificity and PPV also achieve good performance.
Discussion
Our model can both reduce the number of false positive cases and detect all the PKU patients during PKU screening. Sensitivity is 100% in two testing datasets, which means that none of PKU cases will be missed. In machine learning, there are many common classification models, such as MLP, DT, SGD, LR, KNN and RFC. Various indicators of the classification models are calculated, including accuracy, sensitivity, specificity, PPV and AUC. Comparing with these classification models, RFC showed clear advantages. In two testing datasets, PPV increased significantly compared with the traditional medical method. In the clinical setting, it is necessary to ensure that all PKU patients can be detected which means the sensitivity should be 100%. According to this rule, MLP and KNN methods show good results in the training dataset, but perform poorly in the validation and two testing datasets, where there is severe over-fitting. The DT method also shows excellent performance in the training dataset, but suffers from false negatives in the testing dataset and NWCH (Alexander, 2022). Some false negatives are also existed by LRA3, resulting in some PKU cases being predicted as negative. It is just an acceptable result in machine learning, but not to clinically acceptable.
In addition, Breiman (Breiman, 2001) pointed out that in the extremely imbalanced data, trees in random forest may contain few or none minority classes after bootstrapping, resulting in poor prediction performance for the minority classes. In our model, we set class weights for the extremely imbalanced data due to the large difference in the amount of data between positive and negative samples. In the tree induction procedure, class weights are used to weight the Gini impurity for finding the split (Chen et al., 2004), which is very important to the accuracy of the model.
Our study also has some shortcomings. Firstly, the number of positive samples in the testing dataset is not large enough for the very low incidence in southern China. For further development, it is necessary to increase negative and positive samples in the testing dataset to validate the model. Secondly, we found that the PPV of the NWCH dataset was lower than that of the GPMCH_2021 dataset, which may be related to the difference in the incidence rate between the north and the south. Since the incidence rate in the south is lower than that in the north and the penalty weight is calculated according to the proportion of positive and negative samples, the penalty weight of negative samples in the NWCH dataset is much greater than that of negative samples in the GPMCH_2021 dataset. We used the data of Gansu Province to train the model, there were more false positives and lower PPV when the NWCH dataset was the testing dataset. Finally, in low birth weight and premature newborns, the meaning of the measured value is often unclear, and there is no definite reference value so far, which is bound to have an impact on the prediction results.
In conclusion, machine learning-based random forest classifier can improve PKU screening performance with excellent sensitivity, FPR and PPV in two Chinese large populations. RFC is promising to be applied to neonatal PKU screening.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding authors.
Ethics statement
Written informed consent was obtained from the minor(s)' legal guardian/next of kin for the publication of any potentially identifiable images or data included in this article.
Author contributions
The work presented here was carried out in collaboration among all the authors. ZY, YS, CZ, and ZC designed this study. CZ, SH, HL, SW, XY, QL, and DZ provided the data. ZY, YS, CZ, and XZ processed the data. ZY and YS conducted the statistical modeling and performed the data analysis. ZY, YS, CZ, ZC, and XM wrote and reviewed the manuscript. All authors read and approved the final manuscript.
Funding
This work was supported by the National Key Research and Development Program of China (2016YFC1000307); National Population and Reproductive Health Science Data Center (2005DKA32408).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2022.986556/full#supplementary-material
References
American College of Medical Genetics Newborn Screening Expert Group (2006). Newborn screening: Toward a uniform screening panel and system-executive summary. Pediatrics 117, S296–S307. doi:10.1542/peds.2005-2633I
Baumgartner, C., Böhm, C., Baumgartner, D., Marini, G., Weinberger, K., Olgemöller, B., et al. (2004). Supervised machine learning techniques for the classification of metabolic disorders in newborns. Bioinformatics 20 (17), 2985–2996. doi:10.1093/bioinformatics/bth343
Blau, N., Shen, N., and Carducci, C. (2014). Molecular genetics and diagnosis of phenylketonuria: State of the art. Expert Rev. Mol. diagn. 14, 655–671. doi:10.1586/14737159.2014.923760
Chen, W. H., Hsieh, S. L., Hsu, K. P., Chen, H. P., Su, X. Y., Tseng, Y. J., et al. (2013). Web-based newborn screening system for metabolic diseases: Machine learning versus clinicians. J. Med. Internet Res. 15 (5), e98. doi:10.2196/jmir.2495
Deo, R. C. (2015). Machine learning in medicine. Circulation 132, 1920–1930. doi:10.1161/circulationaha.115.001593
Guthrie, R., and Susi, A. (1963). A simple phenylalanine method for detecting phenylketonuria in large populations of newborn infants. Pediatrics 32, 338–343. doi:10.1542/peds.32.3.338
Huang, C. D., Zhao, Z. D., Liu, X. L., Wen, Y. M., Zhu, X. M., and Yang, C., (2021). Screening results and genetic analysis of neonatal tetrahydrobiopterin deficiency in Hainan Province from 2007 to 2019. Zhonghua Yi Xue Za Zhi 101, 3161–3163. doi:10.3760/cma.j.cn112137-20210121-00200
Lindner, M., Gramer, G., Haege, G., Fang-Hoffmann, J., Schwab, K. O., Tacke, U., et al. (2011). Efficacy and outcome of expanded newborn screening for metabolic diseases - report of 10 years from South-West Germany *. Orphanet J. Rare Dis. 6, 44. doi:10.1186/1750-1172-6-44
Meropol, N. J., Donegan, J., and Rich, A. S. (2021). Progress in the application of machine learning algorithms to cancer research and care. JAMA Netw. Open 4 (7), e2116063. doi:10.1001/jamanetworkopen.2021.16063
Moretti, F., Birarelli, M., Carducci, C., Pontecoryvi, A., Antonozzi, I., and Pontecorvi, A. (1990). Simultaneous high-performance liquid chromatographic determination of amino acids in a dried blood spot as a neonatal screening test. J. Chromatogr. 511, 131–136. doi:10.1016/s0021-9673(01)93278-9
Muiños, F., Martinez-Jimenez, F., Pich, O., Gonzalez-Perez, A., and Lopez-Bigas, N. (2021). In silico saturation mutagenesis of cancer genes. Nat. N. 596, 428–432. doi:10.1038/s41586-021-03771-1
Peng, G., Tang, Y., Cowan, T. M., Enns, G. M., and Scharfe, C. (2020). Reducing false-positive results in newborn screening using machine learning. Int. J. Neonatal Screen. 6, 16. doi:10.3390/ijns6010016
Savarraj, J. P., Hergenroeder, G. W., Zhu, L., Chang, T., Park, H. A., and Megjhani, M. (2021). Machine learning to predict delayed cerebral ischemia and outcomes in subarachnoid hemorrhage. Neurology 96 (4), e553–e562. doi:10.1212/wnl.0000000000011211
Wang, C. M., Wang, H. Q., and Zhang, H. (2015). Analysis on the results of neonatal screening in the south region of Xinjiang in 2009-2013. Prac. Prev. Med. 22, 72–74. doi:10.21203/rs.3.rs-1324180/v1
Wang, X., Hao, S. J., Chen, P. L., Feng, X., and Yan, Y. S. (2019). Analysis on screening results of phenylketonuria among 567 691 neonates in Gansu Province. Int. J. Lab. Med. 24, 3588–3590. doi:10.3969/j.issn.1673-4130.2015.24.034
Yan, Y., Zhang, C., Jin, X., Zhang, Q., Zheng, L., Feng, X., et al. (2019). Mutation spectrum of PAH gene in phenylketonuria patients in northwest China: Identification of twenty novel variants. Metab. Brain Dis. 34, 733–745. doi:10.1007/s11011-019-0387-7
Yang, R. L., Yang, Y. L., Wang, T., Xu, W. Z., Shu, Q., and Yang, J. B., (2021). Establishment of an auxiliary diagnosis system of newborn screening for inherited metabolic diseases based on artificial intelligence technology and a clinical trial. Chin. J. Ped. 59, 286–293. doi:10.3760/cma.j.cn112140-20201209-01089
Keywords: newborn screening, MRM, machine learning, phenylketonuria, random forest classifier
Citation: Song Y, Yin Z, Zhang C, Hao S, Li H, Wang S, Yang X, Li Q, Zhuang D, Zhang X, Cao Z and Ma X (2022) Random forest classifier improving phenylketonuria screening performance in two Chinese populations. Front. Mol. Biosci. 9:986556. doi: 10.3389/fmolb.2022.986556
Received: 05 July 2022; Accepted: 26 September 2022;
Published: 11 October 2022.
Edited by:
Luciana Hannibal, University of Freiburg Medical Center, GermanyReviewed by:
Alex Jung, Aalto University, FinlandCristian F. Pasluosta, University of Freiburg, Germany
Copyright © 2022 Song, Yin, Zhang, Hao, Li, Wang, Yang, Li, Zhuang, Zhang, Cao and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zongfu Cao, zongfu_cao@163.com; Xu Ma, maxubioinfo@163.com
†These authors have contributed equally to this work