- 1 Department of Mathematical Sciences, Florida Institute of Technology, Melbourne, FL, United States
- 2 Department of Biomedical and Chemical Engineering and Sciences, Melbourne, FL, United States
- 3 Department of Biology, University of Florida, Gainesville, FL, United States
- 4 Department of Mathematics, SUNY Potsdam, Potsdam, NY, United States
- 5 Department of Computer Engineering and Sciences, Florida Institute of Technology, Melbourne, FL, United States
- 6 Department of Medicine, Georgetown University Medical Center, Washington, DC, United States
We apply a pattern-based classification method to identify clinical and genomic features associated with the progression of chronic kidney disease (CKD). We analyze the African American Study of Kidney Disease and Hypertension (AASK) dataset and construct a decision-tree classification model consisting of 15 combinatorial patterns of clinical features and single nucleotide polymorphisms (SNPs). Seven of these patterns are associated with slow progression and eight with rapid progression of renal disease among AASK patients. We identify four clinical features and two SNPs that can accurately predict CKD progression. The clinical and genomic features identified in our experiments may be used in future studies to develop new therapeutic interventions for CKD patients.
1 Introduction
The main function of the kidneys is to remove excess water and waste products from the blood. They also help regulate blood levels of minerals such as sodium, calcium, and potassium. Chronic kidney disease (CKD), also known as renal disease, occurs when the kidneys lose function gradually and usually permanently. CKD, defined by reduced glomerular filtration rate (GFR), proteinuria, or structural kidney disease, is a growing public health problem worldwide¹. Many patients with renal disease of most etiologies progress to severe renal failure and/or end-stage renal disease (ESRD), which requires renal replacement therapy in the form of dialysis or kidney transplantation (Lewis et al., 1993; Klahr et al., 1994; DCCT, 1995; Brenner et al., 2001; Lewis et al., 2001; Wright et al., 2002; Niki et al., 2015). However, the rate of CKD progression is highly heterogeneous (Lindeman et al., 1985; Lindeman, 1990; Hallan et al., 2006). Although a few predictive factors for progression, such as proteinuria, have been identified, determining which patients are at risk of progression remains a significant problem. Several therapies are also known to slow the progression of renal disease, including ACE inhibitors, blood pressure control, tight diabetes control, and perhaps low-protein diets; however, trials of these therapies show that a substantial risk of progression remains even in subjects receiving optimal therapy (Lewis et al., 1993; Klahr et al., 1994; DCCT, 1995; Brenner et al., 2001; Lewis et al., 2001; Wright et al., 2002; Niki et al., 2015).
The African American Study of Kidney Disease and Hypertension (AASK) was motivated by the high rate of hypertension-related chronic kidney disease in the African-American population and the scarcity of effective therapies. It was a 21-center, randomized, double-blind treatment trial of 1,094 African-American patients with hypertension, aged 18 to 70 years. Patients had renal insufficiency with GFR between 20 and 65 ml/min/1.73 m². Patients were randomized to the angiotensin-converting enzyme inhibitor (ACEi) ramipril, the β-blocker (BB) metoprolol, or the dihydropyridine calcium channel blocker (CCB) amlodipine, and to a usual (mean arterial pressure (MAP) 102–107 mm Hg) or low (MAP
The initial AASK results were not conclusive (Wright et al., 2002). Although the adopted therapy was shown to slow the progression of renal disease, the rate of progression to renal failure remained high. The CCB arm of the study was stopped early when an interim analysis indicated that CCB was inferior to both BB and ACEi in patients with
Several possible interventions, such as blood pressure control (Wright et al., 2002), diabetes treatment (DCCT, 1995), restriction of dietary protein intake (Klahr et al., 1994), and medications with possible renoprotective effects (Ruggenenti et al., 1999; Agodoa et al., 2001; Wright et al., 2002), have been tested in clinical trials. In all cases, the residual rate of progression of chronic kidney disease has remained significant. To date, there are few prediction models that identify which patients are likely to progress significantly. Subasi et al. (2017) identified serum proteomic patterns that can accurately distinguish rapid from slow progression among AASK patients. Parsa et al. (2013) examined the effects of variants in the gene encoding apolipoprotein L1 (APOL1) on disease progression and observed that APOL1 renal risk variants were associated with higher rates of ESRD and of CKD progression in African-American patients compared with white patients. Other recent studies include Rahman et al. (2013), who determined the effects of two antihypertensive dosing schedules (PM dosing and add-on dosing) on nocturnal blood pressure compared with usual therapy (AM dosing) in former participants, and Chen et al. (2016), who studied longitudinal changes in hematocrit in hypertensive renal disease.
The goal of the current study is to apply a pattern-based classification method to identify clinical and genomic features that may serve as prognostic markers for the progression of renal disease among AASK patients. The clinical and genomic features identified in our analysis will be used in a future study to compare disease progression in white and African-American patients, both with and without apolipoprotein L1 (APOL1) high-risk variants. The ultimate goal of our AASK data analysis, begun in Subasi et al. (2017) and continued in this work, is to identify new targets and provide a basis for new therapeutic interventions for chronic kidney disease.
2 Study Subjects
Closer inspection of the data highlights the current dilemma: although there is a
Figure 3 illustrates the substantial heterogeneity in the rate of renal disease progression in the AASK trial. The rate of decline of GFR after the first 6 months of the trial (chronic GFR slope) is shown in blue for each patient, ordered from the most rapid decline (most negative slope) on the left to the least rapid decline (positive slope) on the right. The expected rate of decline of GFR with aging is generally assumed to be
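The chronic GFR slope plotted in Figure 3 is a per-patient rate of change. A minimal sketch of how such a slope could be computed, assuming an ordinary least-squares line fitted to each patient's GFR values from month 6 onward and expressed per year, is shown below. The trial's own slopes may have been estimated with a different statistical model, and `chronic_gfr_slope` is a hypothetical helper, not code from the study.

```python
import numpy as np


def chronic_gfr_slope(months, gfr, start_month=6.0):
    """Hypothetical helper: least-squares GFR slope (ml/min/1.73 m^2 per year)
    using only visits at or after `start_month` (the 'chronic' phase)."""
    t = np.asarray(months, dtype=float)
    y = np.asarray(gfr, dtype=float)
    chronic = t >= start_month              # ignore visits before month 6
    if np.count_nonzero(chronic) < 2:
        raise ValueError("need at least two GFR values at or after start_month")
    years = t[chronic] / 12.0               # express time in years
    slope, _intercept = np.polyfit(years, y[chronic], deg=1)
    return float(slope)


# GFR drops 3 units every 6 months after month 6, i.e. roughly -6 per year.
print(chronic_gfr_slope([0, 3, 6, 12, 18, 24], [50, 48, 47, 44, 41, 38]))
```

A negative value indicates declining kidney function; ordering patients by this value reproduces the left-to-right layout described for Figure 3.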
2.1 Pre-processing of AASK Data to Predict Progression of Renal Disease
One avenue that has not been carefully explored is a data mining approach to detect the combinations of clinical features and/or single nucleotide polymorphisms (SNPs) that better identify the population at risk of CKD progression. The goal of this section is to identify combinatorial patterns of clinical features and SNPs that can accurately predict progression of renal disease among AASK patients. To this end, we study a subset of subjects from the AASK clinical trial, selected on the basis of the glomerular filtration rate (GFR) slope of all AASK patients presented in Figure 3. The original AASK data contain 1,094 African-American patients with 88 clinical features and 130 SNPs. Before we start our analysis, we remove features with more than
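The two pre-processing steps described above can be sketched as follows: dropping features with too many missing values, and labelling patients as rapid or slow progressors from their GFR slope. The missing-value threshold and the slope cut-offs are not given in this text, so `max_missing_fraction`, `rapid_cutoff` and `slow_cutoff` are hypothetical parameters. The rule that patients between the two cut-offs are excluded is likewise an illustrative assumption.

```python
import pandas as pd


def drop_sparse_features(df: pd.DataFrame, max_missing_fraction: float) -> pd.DataFrame:
    """Keep only columns whose fraction of missing values is <= max_missing_fraction."""
    missing = df.isna().mean()                       # per-column fraction of NaN
    keep = missing[missing <= max_missing_fraction].index
    return df.loc[:, keep]


def label_progression(slope: float, rapid_cutoff: float, slow_cutoff: float):
    """Hypothetical labelling rule: 'rapid' at or below rapid_cutoff,
    'slow' at or above slow_cutoff, None (patient excluded) in between."""
    if slope <= rapid_cutoff:
        return "rapid"
    if slope >= slow_cutoff:
        return "slow"
    return None


features = pd.DataFrame({"a": [1, None, 3, 4], "b": [None, None, None, 1], "c": [1, 2, 3, 4]})
print(list(drop_sparse_features(features, max_missing_fraction=0.5).columns))  # ['a', 'c']
```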
Figure 5 shows the principal component analysis (PCA) plot of the AASK patients in the reduced dataset. Table 2 describes the patient population for this study. As the table shows, proteinuria differs markedly between the two progression groups. This is consistent with previous studies showing that proteinuria is the strongest predictor of GFR slope progression in AASK (Wang et al., 2006).
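A PCA plot such as Figure 5 places each patient at their coordinates on the first two principal components. A minimal NumPy sketch of that projection is shown below. It assumes a complete numeric feature matrix with no missing values, and it standardizes each feature before the decomposition. Neither choice is specified in the text, and `pca_scores` is a hypothetical helper.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardized rows of X onto the leading principal components.

    Returns (scores, explained_variance_ratio) for the first n_components."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                          # avoid dividing constant features by zero
    Z = (X - X.mean(axis=0)) / sd              # center and scale each feature
    _U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T           # coordinates used for the 2-D plot
    explained = S**2 / np.sum(S**2)            # share of total variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
scores, ratio = pca_scores(rng.normal(size=(20, 5)))
print(scores.shape, ratio)
```

Coloring the two score columns by progression group (rapid or slow) would give a plot of the kind described for Figure 5.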
2.2 Identification of Significant Clinical and Genomic Features
The resulting AASK dataset, consisting of 138 rapid progressors, 75 slow progressors, 77 clinical features, and 113 SNPs, is further investigated to remove any features irrelevant to distinguishing rapid progressors from slow progressors. To obtain a classification model effectively and efficiently, we first apply a correlation-based feature selection procedure (Hall and Smith, 1998). This procedure retains only those features that successfully distinguish rapid progressors from slow progressors in the AASK data. The correlation-based feature selection method evaluates the worth of a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy among them. Subsets of features that are highly correlated with the outcome (rapid/slow progression) but have low intercorrelation are preferred. The AASK data are randomly partitioned into ten approximately equal parts, and one of these parts is designated as the “test set”. Correlation-based feature selection is performed on the remaining nine parts, which form the “training dataset”, and the result is then evaluated on the cases in the test set. This procedure is repeated ten times, each time using a different one of the ten parts as the test set. The patients are then re-randomized into ten new parts and the whole procedure is repeated nine more times, for a total of 100 tests.
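Hall's correlation-based criterion scores a subset S of k features by its merit: Merit(S) = k * mean_cf / sqrt(k + k(k-1) * mean_ff). Here mean_cf is the average feature-to-class correlation and mean_ff is the average feature-to-feature intercorrelation. The sketch below implements this merit with a greedy forward search. It is a simplification, not WEKA's CfsSubsetEval, which pairs the same merit with its own correlation measures and search strategies. The sketch assumes absolute Pearson correlation with a 0/1 class label and no constant columns. The function names are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Correlation-based merit of a feature subset (Hall's CFS criterion).

    Uses |Pearson r| as the correlation measure; assumes no constant columns in X."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs]) if pairs else 0.0
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def cfs_forward_select(X, y):
    """Greedy forward search: repeatedly add the feature that most increases
    the merit, and stop as soon as no addition improves it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)                        # 0 = slow, 1 = rapid (illustrative)
X = np.column_stack([y + 0.1 * rng.normal(size=200),   # informative feature
                     rng.normal(size=200),             # noise
                     rng.normal(size=200)])            # noise
print(cfs_forward_select(X, y))
```

The denominator penalizes redundancy. Adding a feature that is correlated with one already selected raises mean_ff, so the merit can fall even when the new feature predicts the outcome well on its own.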
Table 3 shows the features selected in ten repetitions of 10-fold cross-validation of the correlation-based feature subset selection procedure in WEKA, a widely used open-source data mining software (Hall et al., 2009). There are two reasons for using a small number of features. First, it makes the data needed for prediction easier to collect for patients from different sources (health systems). Second, a small number of novel predictors may help inform studies of the mechanisms and treatment of CKD progression if they suggest new and unexplored pathways. The selected SNPs, together with the finding that use of alpha-2 agonist antihypertensive medication is a predictor, may help in this respect.
3 Pattern-Based Classification Model to Predict Progression of Renal Disease
3.1 Identification of Combinatorial Patterns of Significant Clinical Features and SNPs
The analysis in Study Subjects provides a reduced AASK dataset containing 138 rapid progressors and 75 slow progressors, described by:
• four clinical features: α-agonist (peripherol base), proteinuria, urine protein/urine creatinine ratio, and GFR value at the G1 visit, where α-agonist represents the use of peripheral alpha-2 agonist blood pressure medication;
• two SNPs: CHGB-1 and PLCG2 rs4399527.
These six features were validated using 10 × 10-fold cross-validation experiments on seven commonly used classification methods: Random Forest, Decision Trees, Nearest Neighbor, Support Vector Machines, Neural Networks, Logistic Regression, and Naïve Bayes (Hall et al., 2009). In this step, the AASK data are randomly partitioned into ten approximately equal parts, and one part is designated as the “test set”. A model is built on the remaining nine parts, which form the “training dataset”, and is then tested by predicting the classes of the patients in the test set. This procedure is repeated ten times, each time using a different one of the ten parts as the test set. The patients are then re-randomized into ten new parts and the procedure is repeated nine more times, for a total of 100 tests for each of the seven classification methods. Table 4 shows the average accuracy, sensitivity (proportion of rapid progressors correctly classified) and specificity (proportion of slow progressors correctly classified). It also shows the average precision, recall, F-measure, and area under the receiver operating characteristic (ROC) curve.
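The 10 × 10 cross-validation protocol and the per-class metrics in Table 4 can be sketched as below. This is a minimal illustration, not the WEKA code used in the study, and the function names are hypothetical. Folds are formed by plain random permutation, as the text describes, without stratifying by class. The metrics treat "rapid" as the positive class, whereas WEKA summaries also report class-weighted averages of precision, recall and F-measure. ROC area is omitted because it requires predicted scores rather than predicted labels.

```python
import numpy as np


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent random k-fold partitions."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)   # re-randomize patients each repeat
        for i in range(k):
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, folds[i]


def _ratio(num, den):
    return float(num) / float(den) if den else float("nan")


def binary_metrics(y_true, y_pred, positive="rapid"):
    """Accuracy, sensitivity (= recall), specificity, precision, and F-measure."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    sens, prec = _ratio(tp, tp + fn), _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": sens,                  # share of rapid progressors correctly classified
        "specificity": _ratio(tn, tn + fp),   # share of slow progressors correctly classified
        "precision": prec,
        "f_measure": _ratio(2 * prec * sens, prec + sens),
    }


splits = list(repeated_kfold(213))            # 138 rapid + 75 slow patients
print(len(splits))                            # 100 train/test splits
print(binary_metrics(["rapid", "rapid", "rapid", "slow"], ["rapid", "rapid", "slow", "slow"]))
```

In a full experiment, a classifier would be trained on each `train` index set and scored on the matching `test` set. The 100 metric dictionaries would then be averaged to give figures of the kind reported in Table 4.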
As Table 4 shows, Random Forest gives the highest accuracy, while the C4.5 decision tree (Quinlan, 1993), a non-parametric supervised learning method for classification, gives the best sensitivity and specificity, i.e., the best prediction of both rapid and slow progression. The C4.5 classification model consists of seven patterns (S1–S7) for slow progressors and eight patterns (R1–R8) for rapid progressors. It is presented in Table 5 as combinatorial patterns of clinical features and SNPs associated with slow and rapid progression in the AASK dataset. Figures 6 and 8 show the C4.5 decision tree and the heatmap, respectively, corresponding to the combinatorial patterns presented in Table 5.
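C4.5 grows its tree by choosing, at each node, a split that separates the classes well according to the gain ratio, which is the information gain divided by the split information. A simplified sketch for one continuous feature and a binary threshold follows. It tries midpoints between consecutive distinct values and keeps the one with the highest gain ratio. Quinlan's C4.5, and WEKA's J48 implementation of it, add further rules for choosing thresholds and attributes, pruning and handling missing values, so this sketch illustrates only the split criterion. The function names are hypothetical.

```python
from collections import Counter

import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def gain_ratio(x, y, threshold):
    """Gain ratio of the binary split x <= threshold vs. x > threshold."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    w = np.array([len(left), len(right)], dtype=float) / len(y)   # branch weights
    info_gain = entropy(y) - (w[0] * entropy(left) + w[1] * entropy(right))
    split_info = float(-np.sum(w * np.log2(w)))
    return float(info_gain / split_info)


def best_threshold(x, y):
    """Try midpoints between consecutive distinct values; keep the best gain ratio."""
    values = np.unique(np.asarray(x, dtype=float))
    if len(values) < 2:
        return None, 0.0
    candidates = (values[:-1] + values[1:]) / 2
    ratios = [gain_ratio(x, y, t) for t in candidates]
    best = int(np.argmax(ratios))
    return float(candidates[best]), ratios[best]


# Toy data: a feature whose values separate slow ("S") from rapid ("R") progressors.
print(best_threshold([1.0, 2.0, 3.0, 4.0], ["S", "S", "R", "R"]))  # (2.5, 1.0)
```

Each threshold chosen this way becomes one condition in a root-to-leaf path. This is how the decision tree yields explicit patterns such as those in Table 5.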
The characteristics of the patterns in the C4.5 classification model are given in Table 6. They include:
• rapid prevalence: the proportion of all rapid progressors that are covered by the pattern,
• slow prevalence: the proportion of all slow progressors that are covered by the pattern,
• rapid homogeneity: the proportion of rapid progressors among the patients covered by the pattern,
• slow homogeneity: the proportion of slow progressors among the patients covered by the pattern,
• degree: the number of conditions in the description of the pattern.
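To make these definitions concrete, the sketch below represents a pattern as a conjunction of threshold conditions and computes its prevalence, homogeneity and degree. The feature names, thresholds and patients are made up for illustration. The actual conditions of patterns S1–S7 and R1–R8 are those listed in Table 5.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}


def covers(pattern, patient):
    """A pattern covers a patient if every (feature, op, threshold) condition holds."""
    return all(OPS[op](patient[feature], threshold) for feature, op, threshold in pattern)


def pattern_characteristics(pattern, patients, labels):
    covered = [label for patient, label in zip(patients, labels) if covers(pattern, patient)]
    n_cov = len(covered)
    c_rapid, c_slow = covered.count("rapid"), covered.count("slow")
    return {
        "rapid_prevalence": c_rapid / labels.count("rapid"),   # share of all rapid progressors covered
        "slow_prevalence": c_slow / labels.count("slow"),      # share of all slow progressors covered
        "rapid_homogeneity": c_rapid / n_cov if n_cov else float("nan"),  # share of covered that are rapid
        "slow_homogeneity": c_slow / n_cov if n_cov else float("nan"),    # share of covered that are slow
        "degree": len(pattern),                                # number of conditions
    }


# Hypothetical two-condition pattern and toy patients (values are illustrative only).
pattern = [("proteinuria", ">", 1.0), ("gfr_g1", "<=", 45)]
patients = [
    {"proteinuria": 0.1, "gfr_g1": 50},
    {"proteinuria": 2.0, "gfr_g1": 30},
    {"proteinuria": 1.5, "gfr_g1": 40},
    {"proteinuria": 0.2, "gfr_g1": 35},
    {"proteinuria": 1.2, "gfr_g1": 44},
]
labels = ["slow", "rapid", "rapid", "slow", "slow"]
print(pattern_characteristics(pattern, patients, labels))
```

In this toy example the pattern covers both rapid progressors but also one slow progressor. It therefore has full rapid prevalence but a rapid homogeneity of only two thirds.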
3.2 Validation of Combinatorial Patterns
We remark that the C4.5 classification model given in Table 5 consists of explicit patterns in which the four clinical features and two SNPs selected in Identification of Significant Clinical and Genomic Features are assigned threshold values. Patterns S1–S7 exhibit high homogeneity for the slow progressors, and patterns R1–R8 exhibit high homogeneity for the rapid progressors in the AASK data. For example, patterns S2, S3, S5, S7 have
As for the prevalence, patterns S4 and R8 are significant patterns, S4 covering
Based on the 10 × 10-folding cross-validation experiments, the classification model correctly classifies
Thus, we conclude that the combinatorial patterns forming the classification model in Table 5 are high-quality decision rules that medical experts can easily interpret. They allow these experts to target the clinical features and SNPs associated with progression of renal disease when developing new therapies.
Data Availability Statement
The datasets generated for this study can be found in the African American Study of Kidney Disease and Hypertension Study (Clinical Trial) (AASK Trial) https://repository.niddk.nih.gov/studies/aask-trial/.
Author Contributions
ES, ML, and MMS are senior co-authors who designed and supervised the entire project and participated in writing the manuscript. MMM, TB, and MSM participated in the study design and performed the combinatorial analysis and participated in writing the manuscript. KC, EC, ZA, and RP were involved in various steps of the combinatorial analysis.
Funding
ML, ES, and MMS’s work was supported by the National Institutes of Health (grant number 5R21DK67468). KC and EC’s work was supported by a National Science Foundation (NSF) Research Experiences for Undergraduates (REU) grant (award number 1359341).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
Special thanks to the AASK Investigators.
Footnotes
1 Chronic Kidney Disease Surveillance Project, Centers for Disease Control and Prevention: http://nccd.cdc.gov/ckd/
References
Agodoa, L. Y., Appel, L., Bakris, G. L., Beck, G., Bourgoignie, J., Briggs, J. P., et al. (2001). Effect of ramipril vs amlodipine on renal outcomes in hypertensive nephrosclerosis: a randomized controlled trial. JAMA 285, 2719–2728. doi:10.1001/jama.285.21.2719
Bakris, G. L., Weir, M. R., Shanifar, S., Zhang, Z., Douglas, J., van Dijk, D. J., et al. (2003). Effects of blood pressure level on progression of diabetic nephropathy: results from the RENAAL study. Arch. Intern. Med. 163, 1555–1565. doi:10.1001/archinte.163.13.1555
Berg, U. (2006). Differences in decline in GFR with age between males and females. Reference data on clearances of inulin and PAH in potential kidney donors. Nephrol. Dial. Transplant. 21, 2577–2582. doi:10.1093/ndt/gfl227
Brenner, B. M., Cooper, M. E., de Zeeuw, D., Keane, W. F., Mitch, W. E., Parving, H. H., et al. (2001). Effects of losartan on renal and cardiovascular outcomes in patients with type 2 diabetes and nephropathy. N. Engl. J. Med. 345, 861–869. doi:10.1056/NEJMoa011161
Chen, E., Miller, G. E., Yu, T., and Brody, G. H. (2016). The Great Recession and health risks in African American youth. Brain Behav. Immun. 53, 234–241. doi:10.1016/j.bbi.2015.12.015
Contreras, G., Greene, T., Agodoa, L. Y., Cheek, D., Junco, G., Dowie, D., et al. (2005). Blood pressure control, drug therapy, and kidney disease. Hypertension. 46, 44–50. doi:10.1161/01.HYP.0000166746.04472.60
DCCT (1995). Effect of intensive therapy on the development and progression of diabetic nephropathy in the diabetes control and complications trial. The Diabetes Control and Complications (DCCT) Research Group. Kidney Int. 47, 1703–1720.
Fogarty, D. G., Hanna, L. S., Wantman, M., Warram, J. H., Krolewski, A. S., and Rich, S. S. (2000). Segregation analysis of urinary albumin excretion in families with type 2 diabetes. Diabetes 49, 1057–1063. doi:10.2337/diabetes.49.6.1057
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. (2009). The WEKA data mining software: an update. SIGKDD Explorations 11 (1), 10–18. doi:10.1145/1656274.1656278
Hall, M. A., and Smith, L. A. (1998). Practical feature subset selection for machine learning. Springer.
Hallan, M. (1998). Calcium antagonists and renal disease. Kidney Int. 54, 1771–1784. doi:10.1046/j.1523-1755.1998.00168.x
Hallan, S. I., Coresh, J., Astor, B. C., Asberg, A., Powe, N. R., Romundstad, S., et al. (2006). International comparison of the relationship of chronic kidney disease prevalence and ESRD risk. J. Am. Soc. Nephrol. 17, 2275–2284. doi:10.1681/ASN.2005121273
Hansson, L., Zanchetti, A., Carruthers, S. G., Dahlöf, B., Elmfeldt, D., Julius, S., et al. (1998). Effects of intensive blood-pressure lowering and low-dose aspirin in patients with hypertension: principal results of the Hypertension Optimal Treatment (HOT) randomised trial. HOT Study Group. Lancet 351, 1755–1762. doi:10.1016/s0140-6736(98)04311-6
Hebert, L. A., Kusek, J. W., Greene, T., Agodoa, L. Y., Jones, C. A., Levey, A. S., et al. (1997). Effects of blood pressure control on progressive renal disease in blacks and whites. Modification of Diet in Renal Disease Study Group. Hypertension 30, 428–435. doi:10.1161/01.hyp.30.3.428
Klag, M. J., Whelton, P. K., Randall, B. L., Neaton, J. D., Brancati, F. L., and Stamler, J. (1997). End-stage renal disease in African-American and white men. 16-year MRFIT findings. JAMA 277, 1293–1298.
Klahr, S., Levey, A. S., Beck, G. J., Caggiula, A. W., Hunsicker, L., Kusek, J. W., et al. (1994). The effects of dietary protein restriction and blood-pressure control on the progression of chronic renal disease. Modification of Diet in Renal Disease Study Group. N. Engl. J. Med. 330, 877–884. doi:10.1056/NEJM199403313301301
Krolewski, A. S., Poznik, G. D., Placha, G., Canani, L., Dunn, J., Walker, W., et al. (2006). A genome-wide linkage scan for genes controlling variation in urinary albumin excretion in type II diabetes. Kidney Int. 69, 129–136. doi:10.1038/sj.ki.5000023
Lewis, E. J., Hunsicker, L. G., Bain, R. P., and Rohde, R. D. (1993). The effect of angiotensin-converting-enzyme inhibition on diabetic nephropathy. The Collaborative Study Group. N. Engl. J. Med. 329, 1456–1462. doi:10.1056/NEJM199311113292004
Lewis, E. J., Hunsicker, L. G., Clarke, W. R., Berl, T., Pohl, M. A., Lewis, J. B., et al. (2001). Renoprotective effect of the angiotensin-receptor antagonist irbesartan in patients with nephropathy due to type 2 diabetes. N. Engl. J. Med. 345, 851–860. doi:10.1056/NEJMoa011303
Lindeman, R. D., Tobin, J., and Shock, N. W. (1985). Longitudinal studies on the rate of decline in renal function with age. J. Am. Geriatr. Soc. 33, 278–285. doi:10.1111/j.1532-5415.1985.tb07117.x
Lindeman, R. (1990). Overview: renal physiology and pathophysiology of aging. Am. J. Kidney Dis. 16, 275–282. doi:10.1016/s0272-6386(12)80002-3
Murussi, M., Gross, J. L., and Silveiro, S. P. (2006). Glomerular filtration rate changes in normoalbuminuric and microalbuminuric Type 2 diabetic patients and normal individuals A 10-year follow-up. J. Diabetes Complicat. 20, 210–215. doi:10.1016/j.jdiacomp.2005.07.002
Niki, P., Panos, K., and Christos, C. (2015). New targets for end-stage chronic kidney disease therapy. J. Crit. Care Med. 1, 92–95. doi:10.1515/jccm-2015-0015
Parsa, A., Kao, W. H., Xie, D., Astor, B. C., Li, M., Hsu, C. Y., et al. (2013). APOL1 risk variants, race, and progression of chronic kidney disease. N. Engl. J. Med. 369, 2183–2196. doi:10.1056/NEJMoa1310345
Pohl, M. A., Blumenthal, S., Cordonnier, D. J., De Alvaro, F., Deferrari, G., Eisner, G., et al. (2005). Independent and additive impact of blood pressure control and angiotensin II receptor blockade on renal outcomes in the irbesartan diabetic nephropathy trial: clinical implications and limitations. J. Am. Soc. Nephrol. 16, 3027–3037. doi:10.1681/ASN.2004110919
Rahman, M., Greene, T., Phillips, R. A., Agodoa, L. Y., Bakris, G. L., Charleston, J., et al. (2013). A trial of 2 strategies to reduce nocturnal blood pressure in blacks with chronic kidney disease. Hypertension 61, 82–88. doi:10.1161/HYPERTENSIONAHA.112.200477
Ruggenenti, P., Perna, A., Gherardi, G., Garini, G., Zoccali, C., Salvadori, M., et al. (1999). Renoprotective properties of ACE-inhibition in non-diabetic nephropathies with non-nephrotic proteinuria. Lancet 354, 359–364. doi:10.1016/S0140-6736(98)10363-X
Subasi, E., Subasi, M. M., Hammer, P. L., Roboz, J., Anbalagan, V., and Lipkowitz, M. S. (2017). A classification model to predict the rate of decline of kidney function. Front. Med. 4, 97. doi:10.3389/fmed.2017.00097
Wang, X., Lewis, J., Appel, L., Cheek, D., Contreras, G., Faulkner, M., et al. (2006). Validation of creatinine-based estimates of GFR when evaluating risk factors in longitudinal studies of kidney disease. J. Am. Soc. Nephrol. 17, 2900–2909. doi:10.1681/ASN.2005101106
Keywords: classification, genomic analysis, AASK, chronic kidney disease, decision trees
Citation: Moreno MM, Bain TC, Moreno MS, Carroll KC, Cunningham ER, Ashton Z, Poteau R, Subasi E, Lipkowitz M and Subasi MM (2021) Identifying Clinical and Genomic Features Associated With Chronic Kidney Disease. Front. Big Data 3:528828. doi: 10.3389/fdata.2020.528828
Received: 22 January 2020; Accepted: 30 October 2020;
Published: 14 January 2021.
Edited by:
Tuan D. Pham, Prince Mohammad bin Fahd University, Saudi Arabia
Reviewed by:
Dinh Tuan Phan Le, New York City Health and Hospitals Corporation, United States
Lin Liu, Tsinghua University, China
Copyright © 2021 Moreno, Bain, Moreno, Carroll, Cunningham, Ashton, Poteau, Subasi, Lipkowitz and Subasi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Munevver Mine Subasi, msubasi@fit.edu