Developing and testing a prediction model for periodontal disease using machine learning and big electronic dental record data

Patel, Jay S.; Su, Chang; Tellez, Marisol; Albandar, Jasim M.; Rao, Rishi; Iyer, Vishnu; Shi, Evan; Wu, Huanmei

doi:10.3389/frai.2022.979525

ORIGINAL RESEARCH article

Front. Artif. Intell., 13 October 2022

Sec. Medicine and Public Health

Volume 5 - 2022 | https://doi.org/10.3389/frai.2022.979525

This article is part of the Research Topic Insights in AI: Medicine and Public Health 2022

Jay S. Patel¹,²*†

Chang Su¹†

Marisol Tellez²

Jasim M. Albandar³

Rishi Rao¹

Vishnu Iyer¹

Evan Shi¹

Huanmei Wu¹

¹Health Informatics, Department of Health Services Administrations and Policy, College of Public Health, Temple University, Philadelphia, PA, United States
²Department of Oral Health Sciences, Kornberg School of Dentistry, Temple University, Philadelphia, PA, United States
³Department of Periodontology and Oral Implantology, Kornberg School of Dentistry, Temple University, Philadelphia, PA, United States

Despite advances in periodontal disease (PD) research and periodontal treatments, 42% of the US population suffers from periodontitis. PD can be prevented if high-risk patients are identified early and given preventive care. Prediction models can help assess the risk of PD before its initiation and progression; nevertheless, existing PD prediction models are seldom used because of their suboptimal performance. This study aimed to develop and test a PD prediction model using machine learning (ML) and electronic dental record (EDR) data, which can provide large sample sizes and up-to-date information. A cohort of 27,138 dental patients was generated, and PD diagnoses were grouped into healthy control, mild PD, and severe PD. The ML model (XGBoost) was trained (80% of the data) and tested (20% of the data) with a total of 74 features extracted from the EDR. We used a five-fold cross-validation strategy to identify the optimal hyperparameters of the model for this one-vs.-all multi-class classification task. Our prediction model differentiated healthy patients vs. mild PD cases and mild PD vs. severe PD cases with an average area under the curve of 0.72. New associations and features compared to existing models were identified, including patient-level factors such as patient anxiety, chewing problems, speaking trouble, teeth grinding, alcohol consumption, injury to teeth, presence of removable partial dentures, self-image, recreational drugs (heroin and marijuana), medications affecting the periodontium, and medical conditions such as osteoporosis, cancer, neurological conditions, infectious diseases, endocrine conditions, cardiovascular diseases, and gastroenterology conditions. This pilot study demonstrated promising results in predicting the risk of PD using ML and EDR data. The model may give clinicians new information about PD risks and the factors responsible for disease progression so that they can take preventive approaches. Further studies are warranted to evaluate the prediction model's performance on an external dataset and determine its usability in clinical settings.

Introduction

Despite advances in periodontal disease (PD) research and periodontal treatments, 42% of the US population has PD, which has led to tooth loss, poor quality of life, and increased healthcare costs (Eke et al., 2016, 2018). To date, few studies have shown the effectiveness of current periodontal treatments in preventing disease progression and tooth loss based on patient characteristics (Pihlstrom et al., 1983; Farooqi et al., 2015). A major barrier is the difficulty of conducting randomized controlled trials with adequate numbers of patients over a long time, for reasons that include ethical concerns, expense, and difficulty enrolling and retaining patients over long periods (Song et al., 2013; Thyvalikakath et al., 2020). Moreover, it is well established that PD can be prevented if the risk factors responsible for its progression are controlled by assessing patients' disease risk (Grossi et al., 1995; Genco and Borgnakke, 2013; Garcia et al., 2016). As a result, prediction models to assess patient-specific disease risk have been developed (Lang and Tonetti, 2003; Page et al., 2004; Chandra, 2007; Trombelli et al., 2009, 2017; Koshi et al., 2012; Meyer-Bäumer et al., 2012; Lang et al., 2015; Shimpi et al., 2019). However, the use of these tools in dental clinics is limited (Sai Sujai et al., 2015; Thyvalikakath et al., 2018) because of their suboptimal performance.

Researchers worldwide have developed risk assessment and prediction models to assess the risk of periodontitis (Lang and Tonetti, 2003; Persson et al., 2003; Page et al., 2004; Chandra, 2007; Trombelli et al., 2009; Meyer-Bäumer et al., 2012; Lang et al., 2015; Mullins et al., 2016; Shimpi et al., 2019). Typically, these risk assessment tools classify a patient's disease risk as low, moderate, or high. Studies have demonstrated that these models have helped improve the documentation of patient-specific periodontal information. Moreover, some risk assessment models also provide evidence-based treatment recommendations, which eliminates the need for paper-based treatment recommendation guidelines. Despite these advances and the advantages of using these tools for patient care, they are seldom used in clinics, for several reasons (Meyer-Bäumer et al., 2012; Alonso et al., 2013; Dhulipalla et al., 2015; Sai Sujai et al., 2015; Petersson et al., 2016). The existing tools (1) do not give clinicians information that they do not already know, (2) do not provide a reliable patient-specific disease risk, because the same patient's risk scores differed significantly between tools, and (3) were developed from evidence generated years or decades ago that may not represent the current patient population.

The increased availability of longitudinal patient care data in electronic form through the electronic dental record (EDR) offers an opportunity to characterize the present patient population's demographics and disease profiles and to develop prediction models with up-to-date information (Song et al., 2013; Patel et al., 2018; Thyvalikakath et al., 2020). Moreover, advanced machine learning (ML) methods provide an opportunity to develop sophisticated data-driven models for PD. In medicine, ample studies have demonstrated the use of big electronic health record (EHR) data and advanced ML methods to predict the risk of diseases. For instance, Simon et al. (2018) developed a prediction model for suicide attempts and suicide deaths using EHR data. The authors reported C-statistics of 0.851 for suicide attempt and 0.861 for suicide death (Simon et al., 2018). Similarly, Tomašev et al. (2019) developed a model to predict future acute kidney injury and found promising results. Their model predicted 55.8% of all inpatient episodes of acute kidney injury and 90.2% of all acute kidney injuries requiring subsequent dialysis (Tomašev et al., 2019).

Nevertheless, in dentistry, very few studies have utilized EDR data to develop a model that predicts the risk of periodontitis. Thyvalikakath Thankam et al. (2015), Hegde et al. (2019), and Shimpi et al. (2019) have done so. These studies provided promising results; however, because they did not include patients' social determinants of health and systemic health risk factors, the use of their models was restricted. Moreover, periodontal findings such as bone loss and deeper periodontal pockets are apparent risk factors that clinicians already know. Clinicians would instead be interested in knowing the driving risk factors for periodontitis so that they can take preventive approaches.

Therefore, the objective of this study was to develop a data-driven prediction model for PD using an advanced ML model, XGBoost, and to identify novel risk factors for prediction. In this data-driven approach, the ML model determined feature importance, rather than risk factors being pre-assigned based on expert opinion or the literature as in other studies. The results of this study would (1) determine the feasibility of using big EDR data and an ML model to predict the risk of PD, (2) find new associations between systemic diseases, social and behavioral factors, and PD, and (3) provide pilot data to build a clinical decision support system to predict the risk of PD in a clinical setting.

Methods

This study was reviewed and approved by our institutional review board (Temple University Institutional Review Board Number: 28321). In this retrospective study, the EDR data of patients who received at least one comprehensive oral examination (COE) in the Temple University Kornberg School of Dentistry (TUKSoD) predoctoral clinics were used. A cohort of 27,138 dental patients was generated, and their demographics, medical history, periodontal findings, treatment information, and 70 other variables were retrieved from the big EDR data. Automated approaches were developed to phenotype PD diagnoses and to retrieve patients' medical histories, social determinants of health, and behavioral habits from the free-text EDR data. Last, an ML model was applied to predict the risk of periodontitis using more than 74 variables. The overall workflow of our methods is presented in Figure 1.

FIGURE 1

Figure 1. Pipeline for building the predictive model. The whole dataset was first split into an 80% training set and a 20% testing set. The training set was used for model training, i.e., finding the optimal hyperparameters that achieve the best prediction performance. We used a five-fold cross-validation strategy. Specifically, the training set was split into five folds of equal size. Each time, one fold was used as the hold-out validation set to calculate model performance, while the remaining four folds were used to train the model. The procedure was repeated five times so that each fold was used as the hold-out validation set once. After identifying the optimal hyperparameters with which the model achieved the best prediction performance, we retrained the predictive model using the whole training set. Finally, model performance was evaluated on the testing set.

Study cohort

EDR (axiUm®, Exan software, Las Vegas, Nevada, USA) data of patients who received at least one COE at TUKSoD between January 1, 2017 and August 31, 2021 were used (n = 27,138 patients). Patients were excluded if (1) they had no PD assessment information provided by clinicians or (2) more than 65% of their candidate predictors of interest were missing (Figure 2). After handling the missing values and diagnoses, the final dataset consisted of 18,553 unique patients (see Table 1). Patients with PD diagnoses were grouped into three categories: (1) healthy control (HC), (2) mild PD, and (3) severe PD (see Table 2). These patient data were split into 80% training (n = 14,842) and 20% testing (n = 3,711) sets. The training set was used for model training, and the testing set was used for model evaluation.
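The exclusion rule and the 80/20 split described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the record layout (dictionaries with a hypothetical pd_label key and None for missing values), the function names, and the choice to round the test-set size up are all assumptions. Under that rounding assumption, the reported split sizes follow from the final cohort of 18,553 patients.

```python
import math
import random


def missing_rate(record, predictors):
    """Fraction of candidate predictors that are missing (None) for one patient."""
    missing = sum(1 for name in predictors if record.get(name) is None)
    return missing / len(predictors)


def apply_exclusions(records, predictors, max_missing=0.65):
    """Keep patients who have a PD assessment and at most 65% missing predictors."""
    return [
        r for r in records
        if r.get("pd_label") is not None            # hypothetical label field
        and missing_rate(r, predictors) <= max_missing
    ]


def split_train_test(records, test_fraction=0.20, seed=0):
    """Shuffle, then hold out ceil(test_fraction * n) records for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_fraction * len(shuffled))  # rounding rule is an assumption
    return shuffled[n_test:], shuffled[:n_test]


# Size check against the counts reported in the text.
n_total = 18553
n_test = math.ceil(0.20 * n_total)
n_train = n_total - n_test
print(n_train, n_test)  # 14842 3711
```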

FIGURE 2

Figure 2. Model performance. (A) Receiver operating characteristic (ROC) curves of the three “one-vs.-rest” base predictive models and the average ROC curve of the combined multi-label predictive model. (B) Confusion matrix of the predictive model. Color density indicates rate.

TABLE 1

Table 1. Characteristics of the studied cohort.

TABLE 2

Table 2. Class label definition.

Developing and testing automated computer applications to phenotype PD diagnoses and medical history information

As EDR is intended to support patient care and not research, patients' clinical information may not be readily available in an analyzable format. For example, patients' PD diagnoses may not be reported using diagnosis codes, as dentists get reimbursed through procedures and not diagnoses. Therefore, two computational approaches were developed that automatically provide PD diagnoses from periodontal charting data and clinical notes per the 2017 American Academy of Periodontology (AAP) classifications. Similarly, patients' medical histories and behavior factor information were extracted from free-text format using natural language processing (NLP) algorithms. Details on developing and testing these computer applications are described elsewhere (Patel et al., 2022).

Missing value imputation

One of the challenges of using EDR data for research is missing data, as EDR data are not collected for research. The missingness of each variable is reported in Table 3. Variables with a high proportion of missing values were excluded as candidate predictors, and the remaining missing values were imputed according to value type. Specifically, missing values of categorical candidate predictors, such as medical histories, were imputed using the most frequent value. For continuous candidate predictors, such as age and number of teeth, missing values were imputed using the median to avoid skewing the distribution of the imputed data. Of note, to avoid information leakage, missing values in the testing set were imputed using the imputation values derived from the training set.
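A minimal sketch of this imputation scheme is shown below. It is an illustration under stated assumptions rather than the authors' implementation: rows are plain dictionaries, missing values are encoded as None, the predictor names (diabetes, age) are hypothetical, and every predictor is assumed to have at least one observed value in the training set. The key point it demonstrates is that the fill values are learned from the training rows only and then reused unchanged on the testing rows.

```python
from collections import Counter
from statistics import median


def fit_imputer(train_rows, categorical, continuous):
    """Learn one fill value per predictor from the training rows only."""
    fill = {}
    for name in categorical:
        observed = [r[name] for r in train_rows if r.get(name) is not None]
        fill[name] = Counter(observed).most_common(1)[0][0]  # most frequent value
    for name in continuous:
        observed = [r[name] for r in train_rows if r.get(name) is not None]
        fill[name] = median(observed)                         # robust to skew
    return fill


def apply_imputer(rows, fill):
    """Return copies of the rows with missing (None) values replaced."""
    return [
        {**r, **{k: v for k, v in fill.items() if r.get(k) is None}}
        for r in rows
    ]


# Hypothetical toy data: fill values come from the training rows ...
train = [
    {"diabetes": "no", "age": 40},
    {"diabetes": "no", "age": 60},
    {"diabetes": "yes", "age": None},
    {"diabetes": None, "age": 50},
]
fill = fit_imputer(train, categorical=["diabetes"], continuous=["age"])

# ... and are applied unchanged to the testing rows, so nothing leaks back.
test = [{"diabetes": None, "age": None}]
print(apply_imputer(test, fill))  # [{'diabetes': 'no', 'age': 50}]
```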

TABLE 3

Table 3. Missingness of the studied variables.

Model training

The predictive model was based on the XGBoost algorithm (Chen and Guestrin, 2016). XGBoost is a powerful ML model that combines multiple decision trees to construct a strong model. XGBoost has been used in health data analysis tasks and has achieved remarkable performance (Khera et al., 2021; Pieszko and Slomka, 2021; Fang et al., 2022). As this is a multi-class classification task (with three classes to predict: healthy control, mild PD, and severe PD), a one-vs.-all strategy was used. Specifically, the predictive model was composed of three base models: one predicting HCs from all PDs, one predicting mild PDs from HCs and severe PDs, and one predicting severe PDs from HCs and mild PDs. The rationale for classifying patients into healthy control, mild, and severe was to group the diagnostic categories at a higher level. For example, our automated phenotyping algorithm (Patel et al., 2022) classifies patients into healthy periodontium, healthy but reduced periodontium, gingivitis classes, and Stage I–Stage IV periodontitis classes with Grade A–C. The algorithm also determines the extent of periodontitis, such as localized and generalized. This yields a total of 24 distinct PD diagnosis categories, as suggested by the 2017 AAP classification system. However, asking an ML model to distinguish all 24 categories and predict the risk of each would yield very low accuracy because of “too much variability” in the risk profiles. More importantly, predicting the risk for each of the 24 categories may not be clinically relevant at the chairside, because many of the categories, and their treatment planning, closely resemble one another. For instance, treatments for localized gingivitis and generalized gingivitis would be fairly similar, so those categories could be consolidated into one category to reduce “variability” and allow the ML model to provide optimal performance.
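The one-vs.-all decomposition can be illustrated with a short sketch. Each base model sees a binary target (one class against the other two). How the three base-model scores are merged into a single prediction is not specified in the text, so the arg-max rule below is an assumption, as are the function names and the toy scores.

```python
CLASSES = ["HC", "mild PD", "severe PD"]


def one_vs_rest_targets(labels, positive_class):
    """Binary target for one base model: 1 for the chosen class, 0 for the rest."""
    return [1 if y == positive_class else 0 for y in labels]


def combine_base_scores(scores_by_class):
    """Assumed combination rule: per patient, pick the class with the highest score."""
    n = len(next(iter(scores_by_class.values())))
    return [
        max(CLASSES, key=lambda c: scores_by_class[c][i])
        for i in range(n)
    ]


labels = ["HC", "mild PD", "severe PD", "HC"]
for cls in CLASSES:
    # One binary problem per class, e.g. HC vs. (mild PD + severe PD).
    print(cls, one_vs_rest_targets(labels, cls))

# Hypothetical scores from three already-trained base models for two patients.
scores = {
    "HC": [0.7, 0.2],
    "mild PD": [0.2, 0.5],
    "severe PD": [0.1, 0.3],
}
print(combine_base_scores(scores))  # ['HC', 'mild PD']
```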

The predictive model was trained using the training set of 14,842 patients (see Figures 1, 2). To determine the optimal hyperparameters of XGBoost, a five-fold cross-validation strategy was used (see Figure 2). Specifically, the training set was split into five folds of equal size. Each time, one fold was used as the hold-out validation set to calculate model performance, while the remaining four folds were used to train the model. The procedure was repeated five times so that each fold was used as the hold-out validation set once. The optimal hyperparameters were those with the best average prediction performance across the five validation folds within the training set. Table 4 lists the optimal hyperparameters obtained after model training. After that, the model was retrained using the whole training set (see Figure 2).
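The cross-validation loop described above can be sketched as follows. This is a generic illustration, not the authors' training code: fit_and_score is a hypothetical callback that would train a model on the training indices and return a validation score (for example, AUC), and the max_depth grid is a made-up example rather than the search space actually used. Because 14,842 is not divisible by five, the folds here differ in size by at most one patient.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n_samples, candidates, fit_and_score, k=5, seed=0):
    """Return the candidate with the best mean validation score and all mean scores."""
    folds = k_fold_indices(n_samples, k, seed)
    mean_scores = []
    for params in candidates:
        scores = []
        for held_out in range(k):
            val_idx = folds[held_out]
            # The other k-1 folds together form the training portion.
            train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            scores.append(fit_and_score(params, train_idx, val_idx))
        mean_scores.append(mean(scores))
    best = max(range(len(candidates)), key=mean_scores.__getitem__)
    return candidates[best], mean_scores


# Toy stand-in for "train on train_idx, score on val_idx" (hypothetical scorer).
def toy_scorer(params, train_idx, val_idx):
    return -abs(params["max_depth"] - 4)


grid = [{"max_depth": 2}, {"max_depth": 4}, {"max_depth": 8}]  # hypothetical grid
best_params, scores = cross_validate(14842, grid, toy_scorer)
print(best_params)  # {'max_depth': 4}
```

After the best candidate is selected, the model would be refit once on all training indices before being evaluated on the held-out testing set.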

TABLE 4

Table 4. Optimal hyperparameters of the trained predictive model (XGBoost).

Model evaluation

Model evaluation was performed on the testing set. Prediction performance was measured by the area under the receiver operating characteristic curve (AUC) and by the confusion matrix, from which prediction accuracy, precision, recall, and F1 score were computed. The training and testing sets were split at random; therefore, the cohort statistics of the training and testing data sets do not differ.
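For readers less familiar with these metrics, the sketch below computes them from scratch. It is illustrative only and uses toy labels and scores, not study data. AUC is computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which is quadratic in sample size and meant for clarity rather than speed. Precision, recall, and F1 are derived per class from a confusion matrix whose rows are true classes and whose columns are predicted classes.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def per_class_metrics(matrix, k):
    """Precision, recall, and F1 for class k treated as the positive class."""
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp   # predicted k, truly another class
    fn = sum(matrix[k]) - tp                  # truly k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Toy example (not study data).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
classes = ["HC", "mild PD", "severe PD"]
cm = confusion_matrix(
    ["HC", "HC", "mild PD", "severe PD"],
    ["HC", "mild PD", "mild PD", "severe PD"],
    classes,
)
print(cm, accuracy(cm), per_class_metrics(cm, 1))
```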

Predictor contribution interpretation

To enhance model interpretability, SHapley Additive exPlanations (SHAP) was utilized, a well-designed tool that can interpret the output (e.g., decision) of any ML model (Lundberg et al., 2017). The SHAP values of each predictor were calculated to assess its contribution to distinguishing each class from the others.
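The idea behind SHAP can be illustrated with an exact Shapley-value computation on a toy model. This sketch is not the SHAP library and not the procedure used for tree models in practice: it enumerates every feature coalition (feasible only for a handful of features), replaces features outside a coalition with a single baseline value (an assumption made here for simplicity), and uses a made-up linear model. It also shows how a global importance score can be summarized as the mean absolute attribution per feature across samples.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one sample by enumerating all coalitions."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(phi_rows):
    """Global importance: mean absolute attribution of each feature over samples."""
    n = len(phi_rows[0])
    return [sum(abs(row[j]) for row in phi_rows) / len(phi_rows) for j in range(n)]


# Toy model (hypothetical): the third feature has no effect on the output.
def toy_model(v):
    return 2 * v[0] - v[1]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 3.0, 5.0], [-1.0, -1.0, 2.0]]
phi_rows = [shapley_values(toy_model, x, baseline) for x in samples]
print(phi_rows)
print(mean_abs_importance(phi_rows))
```

For each sample, the attributions sum to the difference between the model output for that sample and the output at the baseline, which is the additivity property that gives SHAP its name.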

Results

Patient demographics and medical history of dental patients

Our sample consisted of 27,138 unique dental patients who received at least one COE between January 1, 2017 and August 31, 2021. The most common age group was 58–67 years (19%), followed by 48–57 years (18%). African American was the most frequent race (28%), followed by White (12%). The majority of our patients were female (57%). Thirty-seven percent (n = 10,148) of our dental patients had at least one cardiovascular disease condition. Five percent (n = 1,476) of our patients had at least one nephrology condition, 19% (n = 5,135) had at least one neurology condition, 9% (n = 2,400) had hematology and oncology conditions, and 18% (n = 4,973) had rheumatology conditions (see Table 1). This summary was generated before excluding patients with extremely high missing data.

Social and behavioral habits, DMFT, DMFS, teeth and plaque index scores

Nineteen percent (n = 5,022) of our patients smoked cigarettes and 18% (n = 4,979) drank alcohol. Eleven percent (n = 2,901) of patients used at least one recreational drug, such as cocaine, marijuana, or methamphetamine. The mean Decayed, Missing, and Filled Teeth (DMFT) index was 6.06. The mean Decayed, Missing, and Filled Surfaces (DMFS) index was 16.74. Our patients had an average of 22 teeth present (SD = 8.82, CI = 0.10). The average plaque index score was 73.36. Twenty-three percent (n = 6,357) of patients did not comply with regular brushing, flossing, and mouthwash use. Twenty percent (n = 5,417) of patients mentioned inadequate home plaque control, 16% (n = 4,272) had tooth mobility, and 10% (n = 2,639) had defective restorations. Six percent (n = 1,618) of our patients reported having high stress levels, 5% (n = 1,381) had bruxism, 4% (n = 1,120) took medications that affected their periodontium, and 3% (n = 866) had tooth crowding (see Table 1).

Performance of the prediction model

As described in the Methods section, the prediction model was built on the final dataset of 18,553 patients (after excluding patients with high missing data). We achieved an AUC of 0.72 (weighted average of the three base models) in distinguishing HC, mild PD, and severe PD from each other (see Figure 3; Table 5). Among the “one-vs.-rest” base models, the model distinguishing HCs from PDs achieved an AUC of 0.69 and an F1-score of 0.66, and the model distinguishing severe PDs from HCs and mild PDs achieved an AUC of 0.71 and an F1-score of 0.30 (see Table 5).

FIGURE 3

Figure 3. Data extraction, cleaning, preprocessing, and machine learning model training and testing (overall workflow).

TABLE 5

Table 5. Prediction performance of the model.

Risk factors and features identified using a data-driven approach

New associations and novel features were identified, including social determinants of health, medical conditions, patients' oral health habits, and patients' overall health (see Figure 4; Table 6). We categorized these risk factors into modifiable and non-modifiable risk factors. The rationale for this grouping is that clinicians would mainly be interested in the modifiable risk factors, which they can act on through preventive approaches and patient education at the chairside. The non-modifiable risk factors include patients' age, gender, race, insurance status, pregnancy, number of teeth present, periodontal bone loss, periodontal attachment loss, and furcation involvement. For example, older patients are more likely to suffer from periodontitis than younger individuals. Similarly, Black patients and patients without dental insurance are more likely to suffer from periodontitis. Some periodontal findings, such as bone loss and furcation involvement, were purposefully placed in the non-modifiable category: even though we used these factors for prediction, the conditions are irreversible, and clinicians therefore have no control over them.

FIGURE 4

Figure 4. Predictor contributions. Contributions of the top 25 predictors in predicting HC from PDs (A), predicting mild PD from others (HC and severe PD) (B), and predicting severe PD from others (HC and mild PD) (C). In the left column, each dot represents a sample, the color density of the dot indicates the normalized value of the given feature for that sample, and the horizontal axis indicates SHAP values. A high positive SHAP value indicates that the feature value makes a high positive contribution to the prediction, and vice versa. For instance, smaller numbers of teeth present are a strong indicator of severe PD. In the right column, the importance of each predictor was calculated as the mean absolute SHAP value of the predictor.

TABLE 6

Table 6. Details of predictors of each base one-vs.-rest model.

The modifiable risk factors were further classified into four subcategories: (1) social determinants of health, (2) social habits, (3) oral health habits, and (4) systemic diseases. Patients in the following groups had a higher incidence of periodontitis than the HC group. Patients who lived farther from the dental school had a higher risk of PD. Similarly, patients who smoked, drank alcohol, or were not compliant with brushing and flossing had a higher incidence of PD. Patients with certain dental conditions, such as crowded teeth, teeth grinding, dental anxiety, speaking trouble, self-image issues, defective restorations, abnormal tooth anatomy, presence of removable partial dentures, higher DMFT and DMFS indices, a higher number of carious teeth, and a history of periodontal treatments, had higher rates of PD. Finally, patients who had multiple systemic conditions were also at higher risk of PD.

Discussion

This study demonstrated the successful creation of a prediction model for PD with moderate to high discriminative performance (average AUC of 0.72). This is the first study that used 74 features to develop and test PD prediction models. Novel features, including social determinants of health, social habits, oral health habits, and systemic diseases, were discovered. This new information would help clinicians identify high-risk PD patients and take preventive approaches, with the hope of reducing disease prevalence, reducing healthcare costs, and improving quality of life.

A few studies in dentistry have attempted to develop prediction models for PD using EDR data. Shimpi et al. (2019) developed a prediction model that could be applicable at the point of care using supervised ML methods. The authors compared the performance of several ML models and found that Decision Tree and Artificial Neural Network provided higher accuracy (87%) in classifying patients as having high or low PD risk. Similarly, Thyvalikakath Thankam et al. (2015) and Hegde et al. (2019) utilized EDR data to predict the risk of periodontitis. The accuracy with automated data was 92%, the recall was 73%, and the precision was 93%. Compared to these studies, our model performed slightly worse [precision (0.50), recall (0.53)]; however, these studies were restricted by the number of risk factors. We studied a wider range of EDR variables that were collected in daily patient care. Moreover, instead of predicting PDs and HCs, this study proposed to predict HCs, mild PDs, and severe PDs, which made our task more difficult but more useful for clinicians. In this study, unlike in expert-driven models, the data determined which risk factors have a stronger influence on PD than the traditional risk factors, which is especially relevant given the significant shift in the prevalence of risk factors.

As demonstrated in section “Risk factors and features identified using a data-driven approach,” new features and modifiable risk factors were discovered that could drive the periodontitis risk. For example, patients who reported using recreational drugs, smoking, or drinking alcohol are at high risk of PD, so dental clinicians can refer these patients to counseling programs. Many tobacco cessation programs are available for patients who smoke tobacco, and dental clinicians could refer patients to them (Myers Virtue et al., 2017). Next, it was discovered that patients who live farther away had a higher rate of periodontitis, which could be due to the lack of available, affordable dental care in remote areas. Most of our patients come from underrepresented and lower socioeconomic areas, so they may not be able to afford treatment at private dental clinics. This information would help us select zip code locations for dental camps and outreach dental programs that help these patients get regular dental and preventive treatments. Next, patients' oral health factors include inadequate brushing and flossing, crowded teeth, teeth grinding, dental anxiety, and defective restorations. Defective restorations should be replaced as early as possible upon identification, and teeth grinding can be managed by providing mouth guards to patients. Dental anxiety and other mental conditions can be managed by providing patient counseling or referring patients to specialists. In summary, our approach would allow dental clinicians to provide more holistic care than dental care alone. It is also important to note that some risk factors, such as the presence of removable partial dentures (RPDs), could result from high dental caries (the primary reason for losing teeth). A higher incidence of dental caries has been associated with a higher number of RPDs. However, in this study, we found that patients who had RPDs also had a higher incidence of severe periodontitis. This link could be due to the association between dental caries and periodontitis.

This study has some limitations. First, the prediction model was developed using only one institution's dataset; therefore, our results may not be generalizable. Next, only one ML algorithm (XGBoost) was tested, and other algorithms may have provided superior performance. Behavioral factors such as smoking, alcohol use, and the use of recreational drugs are self-reported. Self-reported information may not be reliable, which may affect the performance of our prediction model. Only simple imputation methods were used to impute missing data. Next, a retrospective electronic chart review such as this one provides information about possible associations between risk factors and PD; however, we did not identify possible causation. To understand the causal relations, longitudinal studies are warranted. In future studies, we will analyze longitudinal EDR data to identify the clinical course of periodontitis. We will also test the performance of other, more sophisticated algorithms and then test the performance of our prediction model on an external dataset. In addition, a usability study will be performed with dental clinicians to determine their attitudes, perceptions, and opinions about the prediction model and its use in real dental clinical settings. Last, a clinical decision support system that can be implemented in the EDR to predict the risk of periodontitis at TUKSoD clinics for patient care will be developed. Other data sources, such as geographic information systems, socioeconomic factors, and criminal history, will also be utilized to add more risk factors to this prediction model.

Conclusion

Our pilot study demonstrated promising results in utilizing EDR data and an ML model to predict the risk of PD. This model would give clinicians new information about PD risks and the factors responsible for disease progression so that they can take preventive approaches. Further studies are warranted to evaluate the performance of our prediction model on an external dataset and determine its usability in clinical settings.

Data availability statement

The datasets presented in this article are not readily available because they may contain patient-identifiable information. Researchers who wish to collaborate and use this dataset must complete a data use agreement and Institutional Review Board processes. Requests to access the datasets should be directed to JP at patel.jay@temple.edu.

Ethics statement

The studies involving human participants were reviewed and approved by the Temple University Institutional Review Board (IRB#28321). The patients/participants provided their written informed consent to participate in this study.

Author contributions

JP contributed to the conception, design, data acquisition, analysis, and interpretation, drafted and critically revised the manuscript, gave full final approval, and agreed to be accountable for all aspects of the work. CS contributed to data analysis and interpretation, revised the manuscript, gave full final approval, and agreed to be accountable for all aspects of the work. MT and JA contributed to conception and interpretation, drafted and critically revised the manuscript, gave full final approval, and agreed to be accountable for all aspects of the work. HW contributed to conception, design, analysis, and interpretation, critically revised the manuscript, gave full final approval, and agreed to be accountable for all aspects of the work. RR, VI, and ES, as student workers, helped significantly with data cleaning. All authors contributed to the article and approved the submitted version.

Funding

This study was funded through start-up funds of the senior author HW.

Acknowledgments

We would like to acknowledge Ryan Brandon for helping us obtain datasets from TUKSoD.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alonso, A., Krijthe, B. P., Aspelund, T., Stepas, K. A., Pencina, M. J., Moser, C. B., et al. (2013). Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J. Am. Heart Assoc. 2:e000102. doi: 10.1161/JAHA.112.000102
