Development and evaluation of a model for predicting the risk of healthcare-associated infections in patients admitted to intensive care units

Wang, Jin; Wang, Gan; Wang, Yujie; Wang, Yun

doi:10.3389/fpubh.2024.1444176

ORIGINAL RESEARCH article

Front. Public Health, 12 September 2024

Sec. Infectious Diseases: Epidemiology and Prevention

Volume 12 - 2024 | https://doi.org/10.3389/fpubh.2024.1444176

This article is part of the Research Topic Outbreak Investigations of Nosocomial Infections

Jin Wang¹†

Gan Wang²,³*†

Yujie Wang⁴

Yun Wang⁵

¹Department of Healthcare-Associated Infection Management, Qingdao Municipal Hospital, University of Health and Rehabilitation Sciences, Qingdao, China
²Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
³School of Public Health, Fudan University, Shanghai, China
⁴Department of Clinical Laboratory, Qingdao Municipal Hospital, University of Health and Rehabilitation Sciences, Qingdao, China
⁵Emergency Intensive Care Unit, Qingdao Municipal Hospital, University of Health and Rehabilitation Sciences, Qingdao, China

This retrospective study used 10 machine learning algorithms to predict the risk of healthcare-associated infections (HAIs) in patients admitted to intensive care units (ICUs). A total of 2,517 patients treated in the ICU of a tertiary hospital in China from January 2019 to December 2023 were included, of whom 455 (18.1%) developed an HAI. Data on 32 potential risk factors for infection were considered, of which 18 factors that were statistically significant on single-factor analysis were used to develop a machine learning prediction model using the synthetic minority oversampling technique (SMOTE). The main HAIs were respiratory tract infections (28.7%) and ventilator-associated pneumonia (25.0%), and were predominantly caused by gram-negative bacteria (78.8%). The CatBoost model showed good predictive performance (area under the curve: 0.944, and sensitivity 0.872). The 10 most important predictors of HAIs in this model were the Penetration Aspiration Scale score, Braden score, high total bilirubin level, female, high white blood cell count, Caprini Risk Score, Nutritional Risk Screening 2002 score, low eosinophil count, medium white blood cell count, and the Glasgow Coma Scale score. The CatBoost model accurately predicted the occurrence of HAIs and could be used in clinical practice.

1 Introduction

The prevalence of healthcare-associated infections (HAIs) in intensive care units (ICUs) remains high owing to the use of invasive procedures (1), prolonged antibiotic use, and critical patient conditions, with a negative impact on quality of life and healthcare costs (2–5). Early identification and quantification of the risk of HAIs, coupled with timely interventions, are critical for reducing the incidence of HAIs and improving patient outcomes (6). Machine learning (ML) technology has emerged as a useful tool among healthcare professionals for predicting prognosis and making treatment decisions. Use of ML applications in ICU research has shown a 22.2% improvement in predicting complications (7). Although existing research often relies on hospitalization data for infection prediction, early initiation of infection control measures is important for improving patient outcomes (8–10). This study aimed to apply ML technology to indicators collected within 24 h of admission to the ICU to develop infection prediction models with an earlier predictive window. Notably, the enhanced predictability of ML models, albeit at the expense of interpretability, makes them reliable tools for predicting specific medical conditions.

Previous studies have used ML techniques to refine risk prediction in the healthcare context. Within ICUs, the Simplified Acute Physiology Score (SAPS) II has shown moderate accuracy for predicting HAIs and 7-day mortality (11). However, combining SAPS II with additional patient characteristics using support vector machine algorithms increased the predictive accuracy for both outcomes (12). In the neonatal ICU, a generalized mixed-effects regression tree with random intercept (GMERT-RI) model identified key predictors of HAIs, highlighting the importance of healthcare-associated factors in addition to neonate-specific factors (13). These findings highlight the potential of ML models to refine risk prognostication in critical care environments and facilitate targeted interventions to improve patient outcomes.

Early identification of HAI risk enables early intervention; however, relatively few studies have focused on predicting HAIs using data collected within 24 h of ICU admission. This retrospective study used data from patients admitted to the ICU, focusing on the clinical characteristics of patients who developed infections during hospitalization. Multiple ML techniques were used to develop infection prediction models, leading to the development of a comprehensive model for training. These models identified significant predictors and key determinants that can help healthcare professionals identify infection risk early among patients admitted to ICUs, facilitating timely intervention.

2 Methods

2.1 Study participants

The study participants were patients admitted to the ICU of a tertiary grade A comprehensive hospital in China between January 2019 and December 2023, with a total of 3,173 patients. We retrospectively analyzed the patients’ demographic characteristics, blood test results, nursing scores, and invasive clinical procedures within 24 h of admission. The data on patient demographics and invasive clinical procedures were extracted from the infection information monitoring system of Xinglin Hospital, blood markers were extracted from the Ruimei laboratory management system, and nursing scores were extracted from the intensive care information system.

The inclusion criteria were: 1. ICU admission between January 2019 and December 2023; 2. Hospital stay exceeding 48 h; 3. Availability of data from within 24 h of ICU admission; 4. Availability of infection-related data, including occurrence, site, and pathogens, throughout the hospitalization period. According to the “HAIs Diagnosis Criteria (Trial)” (14) in China, infections without a clear incubation period are classified as HAIs if they manifest 48 h or more after admission. Consequently, the 48-h timeframe serves as a critical criterion for identifying hospital infections. This study focused on patients who were admitted to the ICU for more than 48 h, in adherence to this guideline.

The exclusion criteria were: 1. Colonization, community-acquired infection, or contamination rather than HAI in the infection diagnosis (N = 21); 2. Disputed infection diagnosis (N = 5); 3. Not discharged by December 31, 2023 (N = 25); 4. More than 5.0% of the data within 24 h of ICU admission missing (N = 23). Infection diagnosis adhered to the “HAIs Diagnosis Criteria (Trial)” (14) standard, considering the detection of the same pathogen at the same site in the same patient during hospitalization as one infection.

Following screening, the study encompassed a total of 2,517 cases, including 455 instances of HAIs.

2.2 Data collection

The data included 32 factors thought to be relevant to predicting the infection risk among ICU patients. The patient characteristics included gender, age, and diabetes status. Blood test results included white blood cell (WBC) count, neutrophil count, monocyte count, lymphocyte count, eosinophil count, basophil count, red blood cell (RBC) count, hemoglobin level, RBC distribution width, hematocrit, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, platelet count, C-reactive protein level, glucose level, albumin level, and total bilirubin level. The nursing scores consisted of the Braden score, Nutritional Risk Screening (NRS) 2002 score, Glasgow Coma Scale (GCS) score, Critical Care Pain Observation Tool (CPOT), Penetration Aspiration Scale (PAS), enteral feeding tolerance, Caprini Risk Score (CRS), and unplanned extubation assessment (2, 15, 16). Invasive procedures included mechanical ventilation, intravascular catheterization, and urinary catheterization. The value of each indicator was determined based on the first test or assessment conducted within 24 h of ICU admission. The primary outcome was the occurrence of HAIs among patients admitted to the ICU.

2.3 Ethical approval

To respect and protect the legitimate rights and interests of the subjects, this research has been approved by the Ethics Committee of Qingdao Municipal Hospital (approval no: 2024-KY-020). The individual patient consent requirement is waived because the project does not affect clinical treatment of patients, poses no greater than minimal risk, and all protected health-sensitive information has been removed from the limited dataset used in this study.

2.4 Identification of pathogens on culture

In compliance with the specifications outlined in the WS/T 640—2018 Clinical Microbiology Examination: Sample Collection and Transportation (17), clinical specimens were collected for culture from patients with suspected HAIs. Bacteria were identified using fully automated microbial identification and drug sensitivity analysis systems such as VITEK 2 (bioMérieux, Marcy-L’Étoile, France) or matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS; Bruker Daltonics, GmbH, Bremen, Germany). In patients with more than one bacterial strain isolated from various specimen types during their ICU admission, only the first pathogen identified was included in the analysis.

2.5 Statistical analysis

During the data processing phase, the researchers anonymized the data to ensure that evaluators could not access background information related to the sample results. Neutrophil count, glucose level, albumin level, total bilirubin level, and C-reactive protein level had some missing values; however, the proportion of missing data for each was below 5.0%. These missing values were filled in using the mean of the respective column. Chi-square tests and t-tests were used to assess the statistical significance of differences in categorical and continuous variables, respectively, between patients with and without HAIs, to identify predictors of infection (18, 19), mitigate the risk of overfitting, and enhance the interpretability of the model. As hypothesis testing using chi-square tests and t-tests only assessed the statistical significance of differences between groups, and did not necessarily identify the best predictors, variable selection algorithms and cross-validation techniques were also used for variable selection (20). Multiple methodologies were used to construct a prediction model that was both accurate and robust.
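The imputation and single-factor tests described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the column names (`hai`, `sex`, `albumin`) and the sample size are hypothetical, and SciPy is assumed to be the testing library.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data; column names are hypothetical, not the study's schema.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "hai": rng.integers(0, 2, size=n),             # 1 = developed an HAI
    "sex": rng.choice(["male", "female"], size=n),
    "albumin": rng.normal(35.0, 5.0, size=n),
})
df.loc[df.index[:5], "albumin"] = np.nan           # simulate <5% missing values

# Mean imputation of a continuous column, as described in the text.
df["albumin"] = df["albumin"].fillna(df["albumin"].mean())

# Chi-square test for a categorical variable against the outcome.
table = pd.crosstab(df["sex"], df["hai"])
chi2, p_cat, dof, expected = stats.chi2_contingency(table)

# Two-sample t-test for a continuous variable between the outcome groups.
with_hai = df.loc[df["hai"] == 1, "albumin"]
without_hai = df.loc[df["hai"] == 0, "albumin"]
t_stat, p_cont = stats.ttest_ind(with_hai, without_hai)

print(f"chi-square p = {p_cat:.3f}, t-test p = {p_cont:.3f}")
```

Variables whose p-value falls below the chosen significance level would be carried forward as candidate predictors.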

2.6 Model building and evaluation

Data processing and analysis were performed using Python pandas, and Python Scikit-learn was used for model construction. OneHotEncoder was used to transform categorical variables into one-hot encoding (21), and StandardScaler was used to standardize numerical variables and enhance model performance and convergence speed (22). This study used ColumnTransformer to combine two distinct variable-processing methods in ML. The dataset is divided into training and testing sets with an 80–20 split. This means 80.0% of the data is used for training the model, and 20.0% is reserved for testing the model. This study used 10 commonly used machine learning models: Random Forest (23), eXtreme Gradient Boosting (XGBoost) (24), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost), Logistic Regression (25), Light Gradient Boosting Machine (LightGBM) (26), Decision Tree (27), Naive Bayes, Neural Networks (28), and Categorical Boosting (CatBoost) (29). The least absolute shrinkage and selection operator (LASSO) was used for variable selection (23). The predictive performance of each model was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, and Youden’s index (30, 31). In this study, the optimal thresholds for the best model were determined using Youden’s J statistic in the test dataset. Additionally, all model output results were generated by an automated system, which further minimized the potential for human intervention, thereby enhancing the reliability of the evaluation results. The code has been shared on GitHub, please visit https://github.com/aarontroy/HAIs-prediction-model.git.
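The preprocessing and splitting steps named above can be sketched with scikit-learn as follows. The data are synthetic and the column names are hypothetical. Stratifying the split and fitting the transformer on the training portion only are assumptions made for this sketch, not details confirmed by the text.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "sex": rng.choice(["male", "female"], size=n),
    "mechanical_ventilation": rng.choice(["yes", "no"], size=n),
    "wbc_count": rng.normal(10.0, 4.0, size=n),
    "braden_score": rng.integers(6, 24, size=n).astype(float),
})
y = rng.integers(0, 2, size=n)

categorical = ["sex", "mechanical_ventilation"]
numerical = ["wbc_count", "braden_score"]

# One-hot encode categorical columns and standardize numerical columns.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

# 80-20 split; stratifying on the outcome is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the transformer on the training set only, then apply it to the test set.
X_train_processed = preprocess.fit_transform(X_train)
X_test_processed = preprocess.transform(X_test)
print(X_train_processed.shape, X_test_processed.shape)
```

Any of the ten classifiers listed above could then be fitted on `X_train_processed` and evaluated on `X_test_processed`.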

3 Results

3.1 Patient characteristics

A total of 2,517 patients, including 1,544 male and 973 female patients with a mean age of 66.191 ± 19.067 years, met the eligibility criteria and were included in the analysis. Of the patients, 706 had been diagnosed with diabetes. In this cohort, 455 patients (18.1%, 309 male and 146 female patients; mean age 70.160 ± 18.403 years) developed HAIs (Table 1). Comparison of the characteristics of patients with and without HAIs using chi-square tests and t-tests identified 18 statistically significant variables: gender, WBC count, RBC count, lymphocyte count, hemoglobin level, total bilirubin level, RBC distribution width, hematocrit, mean corpuscular hemoglobin concentration, mean corpuscular volume, eosinophil count, age, Braden score, NRS 2002 score, GCS score, PAS, enteral feeding tolerance, and the CRS.

Table 1. Baseline characteristics of patients.

3.2 Characteristics of healthcare-associated infections

3.2.1 Infection sites

A total of 585 HAIs were identified in 455 patients, including 91 patients who experienced multiple episodes of HAI. HAIs were identified at 14 different sites. The five most common were lower respiratory tract infection (28.7%, 168 cases), ventilator-associated pneumonia (25.0%, 146 cases), bacteremia (19.3%, 113 cases), catheter-associated urinary tract infection (9.6%, 56 cases), and central line-associated bloodstream infection (5.5%, 32 cases) (Table 2).

Table 2. Distribution of infected sites in patients in the hospital.

3.2.2 Bacterial pathogens identified

Among the 455 patients with HAIs, 396 had pathogens grown on culture, a detection rate of 87.0%. A total of 640 strains were identified, comprising 504 (78.8%) gram-negative bacteria, 94 (14.7%) gram-positive bacteria, and 42 (6.6%) fungi. Of these patients, 217 had a single pathogen identified, and 179 were infected with multiple pathogens. The top five pathogens identified were Acinetobacter baumannii (25.5%, 163 cases), Klebsiella pneumoniae (17.0%, 109 cases), Pseudomonas aeruginosa (14.1%, 90 cases), Enterobacter cloacae (5.2%, 33 cases), and Escherichia coli (4.8%, 31 cases) (Table 3).

Table 3. Distribution of pathogenic microorganisms in infected patients.

3.3 Synthetic minority oversampling technique

The synthetic minority oversampling technique (SMOTE) algorithm was used to increase the number of HAI cases from 455 to 2,062, aligning the size of the HAI cohort with that of the non-HAI cohort and thereby mitigating class imbalance. Model predictions were then grouped according to whether oversampling had been applied, to allow a comparative evaluation of predictive performance.
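The core idea of SMOTE is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The following is a minimal NumPy sketch of that idea on toy data. The function `smote_sketch`, the neighbour count `k = 5`, and the class sizes are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a chosen sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)             # starting samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                               # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy example: 50 minority samples oversampled to match 200 majority samples.
rng = np.random.default_rng(1)
X_minority = rng.normal(size=(50, 3))
n_majority = 200
X_synth = smote_sketch(X_minority, n_new=n_majority - len(X_minority))
X_balanced_minority = np.vstack([X_minority, X_synth])
print(X_balanced_minority.shape)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays within the region already occupied by the observed cases.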

3.4 Variable selection and dimensionality reduction

The LASSO algorithm is a classic linear regression technique that is widely used for variable selection and dimensionality reduction. Compared with traditional linear regression methods, LASSO is capable of automatically selecting variables that have a marked effect on the target variable while preserving predictive accuracy (23). Consequently, this leads to a reduction in model complexity and an enhancement in generalization performance. In the analysis without SMOTE oversampling, the LASSO algorithm excluded the following variables: C-reactive protein, mean corpuscular hemoglobin concentration, GCS score, CPOT score, gastrointestinal tolerance to nutrition, and CRS. Conversely, in the analysis with SMOTE oversampling, the LASSO algorithm excluded gender, CRS, unplanned extubation assessment, CPOT score, diabetes, aspiration/asphyxia score, GCS score, and gastrointestinal tolerance to nutrition.
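How LASSO excludes variables can be illustrated with a short sketch: the L1 penalty shrinks some coefficients exactly to zero, and variables with zero coefficients are dropped. The data below are synthetic, and the penalty strength `alpha = 0.05` is an arbitrary illustrative choice, not a value taken from the study.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 candidate variables, of which only the first two
# actually influence the binary outcome.
rng = np.random.default_rng(0)
n, p = 400, 8
X = rng.normal(size=(n, p))
logit = 1.5 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Standardize so that the penalty treats all variables on the same scale.
X_std = StandardScaler().fit_transform(X)

# alpha is an illustrative assumption; in practice it is usually tuned
# by cross-validation.
lasso = Lasso(alpha=0.05).fit(X_std, y)

kept = np.flatnonzero(lasso.coef_ != 0)      # variables retained
dropped = np.flatnonzero(lasso.coef_ == 0)   # variables excluded
print("kept:", kept, "dropped:", dropped)
```

A larger `alpha` excludes more variables, which is one reason the set of excluded variables can differ between two versions of the same dataset.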

Typically, AUC is used as the primary metric for evaluating model performance owing to its comprehensive consideration of both true-positive and false-positive rates (Figure 1). According to the data provided, the Random Forest model demonstrated the best performance in terms of the AUC (0.958). However, given the 18.1% incidence of infection in the study hospital, sensitivity was considered the most important metric of model performance. Notably, the CatBoost model outperformed the others in terms of sensitivity (0.872) at detecting HAIs. In this context, the CatBoost model was considered the best choice (Table 4).
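The metrics discussed here are related through the ROC curve: each threshold gives one (sensitivity, specificity) pair, the AUC summarizes all thresholds, and Youden's index J = sensitivity + specificity - 1 identifies a single operating point. A minimal sketch on synthetic scores, which are not the study's predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labels and predicted probabilities; positives score higher on average.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)
y_score = np.clip(0.35 * y_true + rng.normal(0.4, 0.2, size=300), 0.0, 1.0)

auc = roc_auc_score(y_true, y_score)
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J at every candidate threshold: sensitivity + specificity - 1,
# which equals tpr - fpr.
youden_j = tpr - fpr
best = int(np.argmax(youden_j))
sensitivity = tpr[best]
specificity = 1.0 - fpr[best]

print(f"AUC = {auc:.3f}")
print(f"threshold = {thresholds[best]:.3f}, "
      f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Two models can therefore rank differently by AUC and by sensitivity at their respective Youden-optimal thresholds.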

Figure 1. (A) Receiver operating characteristic curves for the ten ML models without SMOTE oversampling. (B) Ten-fold cross-validation results of the CatBoost model without SMOTE oversampling. (C) Receiver operating characteristic curves for the ten ML models with SMOTE oversampling. (D) Ten-fold cross-validation results of the CatBoost model with SMOTE oversampling.

Table 4. Evaluation metrics of the machine learning models with and without SMOTE oversampling.

We conducted ten-fold cross-validation using the CatBoost classifier, training the model using preprocessed data (X_processed). In each fold of cross-validation, data splitting was performed using the StratifiedKFold method with 10 folds (n_splits = 10), ensuring data shuffling (shuffle = True) and maintaining consistency by setting a random seed (random_state = 42). Subsequently, the model was fitted to each training set (train) and the probability values were predicted using the corresponding test set (test). The mean AUC was 0.947 ± 0.008. These results highlight the excellent predictive performance and stability of the CatBoost model after SMOTE oversampling. In this study, the optimal thresholds for the CatBoost model were determined using Youden’s J statistic in the test dataset. For the NO_SMOTE group, the optimal threshold was calculated to be 0.558. This threshold allows the model to achieve the best balance between sensitivity and specificity. In the SMOTE group, which addresses class imbalance issues, the optimal threshold was found to be 0.581. This higher threshold helps to further enhance the model’s performance on imbalanced datasets.
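The cross-validation loop described above can be sketched as follows, using the same `StratifiedKFold` settings. Two things here are assumptions made to keep the sketch self-contained: the data are synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for the CatBoost classifier used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the preprocessed feature matrix (X_processed in the text).
X_processed, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)

# Same splitter settings as described: 10 folds, shuffled, fixed seed.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

fold_aucs = []
for train, test in cv.split(X_processed, y):
    # Stand-in classifier; the study used CatBoost.
    model = GradientBoostingClassifier(random_state=42)
    model.fit(X_processed[train], y[train])
    proba = model.predict_proba(X_processed[test])[:, 1]
    fold_aucs.append(roc_auc_score(y[test], proba))

print(f"mean AUC = {np.mean(fold_aucs):.3f} +/- {np.std(fold_aucs):.3f}")
```

Reporting the mean and standard deviation across folds, as in the text, gives a sense of both performance and stability.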

A model was developed to identify factors contributing to HAIs in ICU patients using the CatBoost method. Eighteen independent variables identified through single-factor analysis were used as inputs, and the occurrence of HAIs was the dependent (outcome) variable. SHapley Additive exPlanation (SHAP) values were used to elucidate the model predictions and to clarify the contribution of each variable. The SHAP values of each selected variable were visualized using violin plots, to reveal the distribution of the impact of each variable on the model output. The width of the violin shape denotes the distribution of the SHAP values for the variable; a broader shape signifies a wider range of effects and potentially more diverse effects of the selected predictor variables. In the CatBoost model, the top 10 predictors of HAIs and their SHAP values were PAS (0.470), Braden score (0.470), high total bilirubin level (0.465), female (0.414), high WBC count (0.387), CRS (0.345), NRS 2002 score (0.321), low eosinophil count (0.318), medium WBC count (0.284), and GCS score (0.269) (Figure 2).

Figure 2. SHapley Additive exPlanation summary plots.

4 Discussion

This study devised an algorithmic model for predicting the occurrence of HAIs in patients admitted to the ICU. CatBoost was used as the ML model and trained on the full dataset containing 32 variables and a dataset containing 18 selected variables. Subsequent enhancement of model performance was achieved by preprocessing the data using SMOTE oversampling. By using the SHAP method to interpret the findings, this study attained interpretability and transparency in the HAI outcomes predicted using ML at both the population and individual level. The use of ML to predict the occurrence of HAIs has potential application for early alerts and preemptive strategies, facilitating personalized treatment regimens and resource distribution, streamlining the clinical decision-making process and management protocols, and promoting advancements in the quality and safety of medical care.

Machine learning has gradually been applied to clinical sample analysis to handle variable data related to patient infection status (6). AI and ML technologies are expected to develop hospital infection monitoring algorithms to identify risk factors, improve patient risk stratification, track transmission pathways, and achieve real-time infection detection. Electronic health data plays a critical role in this process and is increasingly accessible. Advanced data management systems support real-time decision-making, aiding in automated hospital infection monitoring (32, 33). In the ICU, ML-supported clinical decision research focuses on monitoring, diagnosing, early identification of clinical events, outcome prediction, and prognosis assessment, to assist doctors, researchers, and policymakers in making treatment decisions. Currently, numerous ML models are used to predict risks of VAP, CLABSI, SSI, and hospital-acquired MDR pathogens, with particular emphasis on predicting sepsis and septic shock (6). In hospital infection risk prediction, clinical oversight is crucial to prevent applying machine learning systems to new data that differs from the training data (34), thus minimizing discrepancies between prediction results and actual clinical assessments (35).

The incidence rate of HAIs in ICUs is generally high, which may lead to a prolonged hospital stay and increased disease burden for patients. In this study, the incidence of infection in the ICU was 18.1%. Other studies have reported an incidence of HAIs in ICUs ranging from 8.0 to 60.0% (2, 36). The incidence of HAIs related to invasive procedures ranges from 25.0 to 50.0%, with patients in the ICU having a risk of HAIs 5 to 10 times higher than that in patients in general wards (37). Urinary tract infections related to catheterization in the ICU can increase the mortality rate by approximately 10.0% and prolong the length of hospital stay by an average of 10 days (38). Among the pathogens detected in this study, gram-negative bacteria were predominant, accounting for 78.8% of the cases. This finding is slightly higher than the 75.3% reported by Wang et al. (2) and substantially higher than the 67.4% reported by Li et al. (3) and the 57.4% reported by Cabrera-Tejada et al. (39). HAIs present a significant challenge in the management of ICU patients, often resulting in serious complications. HAIs can significantly affect patient outcomes, leading to increased complexity of treatment, prolonged hospital stays, an increased risk of bacterial resistance, increased mortality rates, an increased risk of disability, and higher healthcare costs (4, 6). In a study conducted by Cabrera-Tejada et al. (4) of patients admitted to the ICU and undergoing invasive mechanical ventilation for over 48 h, the mean length of hospital stay increased by 13.6 days, with an additional treatment cost of 20,965 Euros in patients with HAIs. These findings highlight the substantial burden that HAIs place on healthcare systems and patient outcomes, and the importance of effective prevention and management strategies in the ICU setting (4).

Cleaning unbalanced datasets can enhance the accuracy of classification models used in scientific research (40). That study revealed that, compared with random under-sampling and oversampling techniques, the clustered under-sampling method led to more precise predictions of mortality rates (40). There are three common approaches to addressing imbalanced ML data: data-level, algorithm-level, and hybrid approaches (41). In data-level methods, researchers modify the training dataset to suit the classifier algorithms, for example by oversampling or under-sampling. Algorithm-level methods involve adjusting existing learners to reduce the bias toward the majority class, for example by using cost-sensitive methods (41). This study aimed to predict HAIs by applying various balancing methods to the baseline data of ICU patients collected within 24 h of admission. In medical datasets, the records in the minority class are often more crucial than those in the control class. Therefore, addressing imbalanced data is essential to enhance the HAI identification rate. In addition to using SMOTE for oversampling, this study incorporated the LASSO operator for variable selection to reduce model complexity and enhance generalization capabilities.

The optimal model for predicting HAIs risk varies depending on the research context. Zhang et al. (18) found that the Naive Bayes model performed best in predicting surgical site infections after spinal surgery, with a mean AUC of 0.95, a sensitivity of 0.78, a specificity of 0.88, and an accuracy of 0.87. Cho et al. (42) have developed an ML model to monitor site infections during colon surgery. Their study revealed that neural networks using recursive variable elimination with 29 variables showed the best performance, achieving an AUC of 0.963. Previous studies have identified age, hemoglobin level, WBC count, and neutrophil count as predictors of catheter-related bloodstream infections using the XGBoost model (6), consistent with the findings of this study.

Owing to the increased risk of aspiration in ICU patients, inhaled oropharyngeal secretions or vomitus can easily contaminate the respiratory tract, leading to an increased risk of lower respiratory tract infections (43). Therefore, in patients with high PAS scores, the head of the bed should be elevated by 30° to 45° to reduce the risk of aspiration (43). In this study, the Braden score was identified as another predictor of HAI occurrence. Ding et al. (44) found that a low Braden score was an independent risk factor for stroke-associated pneumonia (SAP) after spontaneous intracerebral hemorrhage, showing moderate effectiveness in predicting SAP. In a study of chemotherapy-related bacterial infections, Jin et al. (45) found that an NRS 2002 score ≥ 3 was an independent risk factor; therefore, improving nutritional status could reduce the occurrence of bacterial infections. The findings of the studies by Ding et al. and Jin et al. were similar to the results of this study. It is crucial to monitor the scores of patients within 24 h of ICU admission and to intervene effectively if indicated.

This study has some limitations. The data were from a single center; therefore, the generalizability of the results may be limited. Multicenter studies should be conducted to enable a more thorough examination of variations according to region, demographics, and clinical practice. Studies of hospital infection monitoring using machine learning methods can include tens of thousands of patients, which makes training and testing AI algorithms more convenient (46). Moreover, using the predictive model in real-world clinical settings, accompanied by real-time performance monitoring, is essential. Furthermore, strategies need to be devised for seamlessly integrating the model into existing medical information systems to facilitate easy access to and use of the predicted outcomes by clinicians. Further research could include additional variables associated with HAIs, such as demographic information (e.g., height and weight), vital signs (e.g., pulse, respiration, blood pressure, and temperature), and laboratory measurements (e.g., arterial blood gases) (47). The only pre-existing medical condition considered in this study was a history of diabetes. Future research should explore additional pre-existing medical conditions as potential predictors of HAIs.

5 Conclusion

This study presents the development and analysis of an ML algorithm aimed at predicting the risk of HAIs using data on patients admitted to an ICU collected within 24 h of admission, as predictors. Through a meticulous examination of the ranking of predictors derived from the ML model, this study identified key risk factors associated with HAIs, facilitating the identification of at-risk patients and the formulation of personalized treatment strategies. Future studies should include additional potential predictor variables, multicenter data, and a larger sample size to enhance the accuracy of prediction outcomes.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Qingdao Municipal Hospital (approval no: 2024-KY-020). The studies were conducted in accordance with the local legislation and institutional requirements. The ethics committee/institutional review board waived the requirement of written informed consent for participation from the participants or the participants’ legal guardians/next of kin because this project does not affect clinical treatment of patients, poses no greater than minimal risk, and all protected health-sensitive information has been removed from the limited dataset used in this study.

Author contributions

JW: Investigation, Project administration, Writing – original draft, Writing – review & editing. GW: Conceptualization, Formal analysis, Methodology, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing. YujW: Resources, Supervision, Writing – review & editing. YunW: Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Acknowledgments

We would like to thank Editage (www.editage.cn) for English language editing.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2024.1444176/full#supplementary-material

References

1. Migliara, G, Di Paolo, C, Barbato, D, Baccolini, V, Salerno, C, Nardi, A, et al. Multimodal surveillance of healthcare associated infections in an intensive care unit of a large teaching hospital. Ann Ig. (2019) 31:399–413. doi: 10.7416/ai.2019.2302

2. Wang, Y, Ren, J, Yao, Z, Wang, W, Wang, S, Duan, J, et al. Clinical impact and risk factors of intensive care unit-acquired nosocomial infection: a propensity score-matching study from 2018 to 2020 in a teaching hospital in China. Infect Drug Resist. (2023) 16:569–79. doi: 10.2147/IDR.S394269

3. Li, RJ, Wu, YL, Huang, K, Hu, XQ, Zhang, JJ, Yang, LQ, et al. A prospective surveillance study of healthcare-associated infections in an intensive care unit from a tertiary care teaching hospital from 2012-2019. Medicine. (2023) 102:e34469. doi: 10.1097/MD.0000000000034469
