AUTHOR=Shimpi Neel, Glurich Ingrid, Panny Aloksagar, Hegde Harshad, Scannapieco Frank A., Acharya Amit TITLE=Identifying oral disease variables associated with pneumonia emergence by application of machine learning to integrated medical and dental big data to inform eHealth approaches JOURNAL=Frontiers in Dental Medicine VOLUME=3 YEAR=2022 URL=https://www.frontiersin.org/journals/dental-medicine/articles/10.3389/fdmed.2022.1005140 DOI=10.3389/fdmed.2022.1005140 ISSN=2673-4915 ABSTRACT=Background

The objective of this study was to apply supervised machine learning (ML) to integrated medical and oral disease data to build models that identify the key risk variables contributing to the emergence of pneumonia overall and of specific pneumonia subtypes.

Methods

Retrospective medical and dental data were retrieved from the Marshfield Clinic Health System's data warehouse and the integrated electronic medical-dental health records (iEHR). Retrieved data were preprocessed prior to analysis; preprocessing included matching cases to controls by (a) race/ethnicity and (b) a 1:1 case:control ratio. Variables with >30% missing data were excluded from analysis. Datasets were divided into four subsets: (1) All Pneumonia (all cases and controls); (2) community-acquired (CAP)/healthcare-associated (HCAP) pneumonias; (3) ventilator-associated (VAP)/hospital-acquired (HAP) pneumonias; and (4) aspiration pneumonia (AP). The performance of five algorithms was compared across the four subsets: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Random Forests. Feature (input variable) selection and 10-fold cross-validation were performed on all datasets. An evaluation set (10%) was set aside from each subset for further validation. Model performance was evaluated in terms of total accuracy, sensitivity, specificity, F-measure, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC).
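As an illustration of the missing-data filter and the evaluation metrics listed above, the following minimal sketch computes them from scratch with the Python standard library. It is not the authors' pipeline: the abstract does not say which software was used, and every function name here (`drop_sparse_variables`, `count_confusion`, `binary_metrics`, `auc_mann_whitney`) is hypothetical. It assumes binary labels with 1 = pneumonia case and 0 = control, and represents a missing value as `None`.

```python
import math
from dataclasses import dataclass


def drop_sparse_variables(columns: dict, max_missing: float = 0.30) -> dict:
    """Keep variables whose fraction of missing (None) values is <= max_missing,
    i.e. exclude variables with more than 30% missing data (assumed encoding)."""
    kept = {}
    for name, values in columns.items():
        # An empty column is treated as entirely missing.
        frac_missing = sum(v is None for v in values) / len(values) if values else 1.0
        if frac_missing <= max_missing:
            kept[name] = values
    return kept


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (case = positive class)."""
    tp: int  # cases predicted as cases
    fp: int  # controls predicted as cases
    tn: int  # controls predicted as controls
    fn: int  # cases predicted as controls


def count_confusion(y_true, y_pred) -> ConfusionCounts:
    """Tally confusion counts from paired 0/1 labels (1 = pneumonia case)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Accuracy, sensitivity, specificity, F-measure and MCC from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = _safe_div(c.tp, c.tp + c.fn)  # true-positive rate (recall)
    specificity = _safe_div(c.tn, c.tn + c.fp)  # true-negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)
    f_measure = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = _safe_div(c.tp * c.tn - c.fp * c.fn, mcc_den)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def auc_mann_whitney(y_true, scores) -> float:
    """ROC AUC as the probability that a random case outscores a random control;
    tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 TP, 1 FN, 3 TN, 1 FP.
    counts = count_confusion([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
    print(binary_metrics(counts))
    print(auc_mann_whitney([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In the toy example every rate-based metric comes out to 0.75 and MCC to 0.5, which shows how MCC penalizes errors more sharply than accuracy. In practice, library functions such as scikit-learn's `sklearn.metrics.matthews_corrcoef` and `sklearn.metrics.roc_auc_score` would normally be used; the hand-written versions only make the definitions explicit. The pairwise AUC is quadratic in the number of records, which is acceptable for illustration but not for large datasets.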

Results

In total, 6,034 records (cases and controls) met the eligibility criteria for inclusion in the main dataset. After feature selection, the number of variables retained in each subset was: All Pneumonia (n = 29), CAP-HCAP (n = 26), VAP-HAP (n = 40), and AP (n = 37). Twenty-two of the retained variables were common to all four pneumonia subsets. Of these, the number of missing teeth, periodontal status, periodontal pocket depth greater than 5 mm, and number of restored teeth contributed to all subsets and were retained in the models. MLP outperformed the other predictive models for the All Pneumonia, CAP-HCAP, and AP subsets, while SVM outperformed the other models for the VAP-HAP subset.

Conclusion

This study validates previously described associations between poor oral health and pneumonia and highlights the benefits of an integrated medical-dental record and care delivery environment for modeling pneumonia risk. Based on these findings, developing a risk score could inform referrals and follow-up in integrated healthcare delivery environments and support coordinated patient management.