Background

AUTHOR=Kong Deming, Tao Ye, Xiao Haiyan, Xiong Huini, Wei Weizhong, Cai Miao

TITLE=Predicting preterm birth using auto-ML frameworks: a large observational study using electronic inpatient discharge data

JOURNAL=Frontiers in Pediatrics

VOLUME=12

YEAR=2024

URL=https://www.frontiersin.org/journals/pediatrics/articles/10.3389/fped.2024.1330420

DOI=10.3389/fped.2024.1330420

ISSN=2296-2360

ABSTRACT=<sec><title>Background</title><p>To develop and compare different AutoML frameworks and machine learning models for predicting preterm birth.</p></sec><sec><title>Methods</title><p>The study used a large electronic medical record database comprising 715,962 participants whose principal diagnosis code was childbirth. Three Automatic Machine Learning (AutoML) frameworks were used to construct machine learning models, including tree-based models, ensemble models, and deep neural networks, on the training sample (<italic>N</italic> = 536,971). The area under the curve (AUC) and training time were used to assess the performance of the prediction models, and feature importance was computed via permutation shuffling.</p></sec><sec><title>Results</title><p>The H2O AutoML framework had the highest median AUC of 0.846, followed by AutoGluon (median AUC: 0.840) and Auto-sklearn (median AUC: 0.820). H2O AutoML also had the lowest median training time (0.14 min), followed by AutoGluon (0.16 min) and Auto-sklearn (4.33 min). Among the different types of machine learning models, Gradient Boosting Machine (GBM) or Extreme Gradient Boosting (XGBoost), stacked ensemble, and random forest models had better predictive performance, with median AUCs of 0.846, 0.846, and 0.842, respectively. Important features related to preterm birth included premature rupture of membranes (PROM), incompetent cervix, occupation, and preeclampsia.</p></sec><sec><title>Conclusions</title><p>Our study highlights the potential of machine learning models to predict the risk of preterm birth using readily available electronic medical record data, which has significant implications for improving prenatal care and outcomes.</p></sec>
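The abstract states only that models were scored by AUC and that feature importance was computed "via permutation-shuffling"; it does not give the implementation. The following is a minimal, generic sketch of that technique using only the Python standard library, not the authors' code. It assumes a fitted model exposed as a hypothetical `predict` callable that returns one risk score per row. AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count as one half). The importance of each feature is then the average drop in AUC after shuffling that feature's column. The names `roc_auc`, `permutation_importance`, `n_repeats`, and `seed` are illustrative only.

```python
import random
from typing import Callable, List, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],  # hypothetical fitted-model scorer
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each feature column is randomly shuffled (assumed sketch)."""
    rng = random.Random(seed)
    baseline = roc_auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy usage: feature 0 fully separates the classes; feature 1 is ignored by the model.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
scores = permutation_importance(lambda rows: [r[0] for r in rows], X, y)
print(scores)
```

In the toy example, shuffling the ignored feature leaves the AUC unchanged, so its importance is exactly zero. Shuffling the informative feature lowers the AUC. The pairwise AUC loop is O(positives × negatives), so it suits illustration only. A dataset on the scale described above would need a rank-based AUC or a library implementation.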