AUTHOR=Patel Jay S. , Su Chang , Tellez Marisol , Albandar Jasim M. , Rao Rishi , Iyer Vishnu , Shi Evan , Wu Huanmei TITLE=Developing and testing a prediction model for periodontal disease using machine learning and big electronic dental record data JOURNAL=Frontiers in Artificial Intelligence VOLUME=5 YEAR=2022 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.979525 DOI=10.3389/frai.2022.979525 ISSN=2624-8212 ABSTRACT=

Despite advances in periodontal disease (PD) research and periodontal treatments, 42% of the US population suffer from periodontitis. PD can be prevented if high-risk patients are identified early to provide preventive care. Prediction models can help assess risk for PD before initiation and progression; nevertheless, utilization of existing PD prediction models is seldom because of their suboptimal performance. This study aims to develop and test the PD prediction model using machine learning (ML) and electronic dental record (EDR) data that could provide large sample sizes and up-to-date information. A cohort of 27,138 dental patients and grouped PD diagnoses into: healthy control, mild PD, and severe PD was generated. The ML model (XGBoost) was trained (80% training data) and tested (20% testing data) with a total of 74 features extracted from the EDR. We used a five-fold cross-validation strategy to identify the optimal hyperparameters of the model for this one-vs.-all multi-class classification task. Our prediction model differentiated healthy patients vs. mild PD cases and mild PD vs. severe PD cases with an average area under the curve of 0.72. New associations and features compared to existing models were identified that include patient-level factors such as patient anxiety, chewing problems, speaking trouble, teeth grinding, alcohol consumption, injury to teeth, presence of removable partial dentures, self-image, recreational drugs (Heroin and Marijuana), medications affecting periodontium, and medical conditions such as osteoporosis, cancer, neurological conditions, infectious diseases, endocrine conditions, cardiovascular diseases, and gastroenterology conditions. This pilot study demonstrated promising results in predicting the risk of PD using ML and EDR data. The model may provide new information to the clinicians about the PD risks and the factors responsible for the disease progression to take preventive approaches. Further studies are warned to evaluate the prediction model's performance on the external dataset and determine its usability in clinical settings.