ORIGINAL RESEARCH article

Front. Public Health, 28 July 2023
Sec. Infectious Diseases: Epidemiology and Prevention
This article is part of the Research Topic Infectious Diseases and Hematology: Diagnosis and Management

Prediction of the risk of cytopenia in hospitalized HIV/AIDS patients using machine learning methods based on electronic medical records

Liling Huang1, Bo Xie2, Kai Zhang1, Yuanlong Xu1, Lingsong Su1, Yu Lv1, Yangjie Lu1, Jianqiu Qin3, Xianwu Pang4, Hong Qiu5, Lanxiang Li6, Xihua Wei4, Kui Huang1, Zhihao Meng1, Yanling Hu2,4,5* and Jiannan Lv1,7*
  • 1Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, Liuzhou, Guangxi, China
  • 2School of Information and Management, Guangxi Medical University, Nanning, Guangxi, China
  • 3Nanning Center for Disease Control and Prevention, Nanning, Guangxi, China
  • 4Center for Genomic and Personalized Medicine, Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
  • 5Institute of Life Sciences, Guangxi Medical University, Nanning, Guangxi, China
  • 6Basic Medical College of Guangxi Medical University, Nanning, Guangxi, China
  • 7Department of Infection, Affiliated Hospital of the Youjiang Medical University for Nationalities, Baise, Guangxi, China

Background: Cytopenia is a frequent complication among hospitalized HIV-infected patients and can adversely affect their treatment outcomes. By combining machine learning techniques with electronic medical records, a predictive model can be developed to evaluate the risk of cytopenia during hospitalization in HIV patients. Such a model can help clinicians design more individualized and evidence-based treatment strategies for these patients.

Method: This study included HIV patients admitted to Guangxi Chest Hospital between June 2016 and October 2021. We extracted 66 clinical features from electronic medical records and used them to train five machine learning prediction models: artificial neural network (ANN), adaptive boosting (AdaBoost), k-nearest neighbour (KNN), support vector machine (SVM) and decision tree (DT). The models were tested on 20% of the data, and their performance was evaluated with indicators such as the area under the receiver operating characteristic curve (AUC). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP).

Result: The ANN model showed the best predictive performance. According to the SHAP interpretation of the ANN model, hypoproteinemia and cancer were the most important predictors of cytopenia in hospitalized HIV patients. Lower hemoglobin-to-red cell distribution width ratio (HGB/RDW), low-density lipoprotein cholesterol (LDL-C), CD4+ T cell count and creatinine clearance (Ccr) were associated with a higher risk of cytopenia in these patients.

Conclusion: This study used machine learning and electronic medical record information to construct a risk prediction model for cytopenia in HIV patients during hospitalization. The model can support the rational management of hospitalized HIV patients and the development of personalized treatment plans.

1. Background

The human immunodeficiency virus (HIV) not only damages the immune system but also impairs the body's hematopoietic system (1). Cytopenia is a common complication of HIV infection (2); its main types are anemia, thrombocytopenia and leukopenia. Among patients with HIV, anemia is an independent factor associated with both faster disease progression and reduced quality of life (3). The reported prevalence of anemia ranges from 1.3 to 95% (4). By contrast, relatively few reports describe the prevalence of leukopenia and thrombocytopenia and their associated factors. Neutropenia, the most common type of leukopenia, affects 5 to 30% of patients in the early stages of HIV infection, whereas in late-stage infection its prevalence can reach 57 to 76% (5–7). The prevalence of thrombocytopenia among HIV patients ranges from 4.1 to 40% (8). Cytopenia may worsen treatment outcomes and accelerate disease progression in patients with HIV (9). Its causes are complex: reported contributing factors include the direct effects of HIV infection, drug therapy and opportunistic infections (OIs) (10–12). CD4+ T cell count, a marker of progression to acquired immunodeficiency syndrome (AIDS), has also been shown to correlate with cytopenia (13).

Machine learning has been widely applied in medicine in recent years, for example in cancer diagnosis (14), medical imaging (15) and mortality prediction (16). Many machine learning algorithms have shown potential for application to large-scale biomedical and patient datasets. Machine learning methods can balance bias and variance, and they can be applied to datasets with many multidimensional variables to identify high-dimensional, non-linear relationships between clinical features for data-driven outcome prediction. This approach overcomes certain limitations of current risk prediction methods. Machine learning models built on electronic medical record data could support doctors in clinical diagnosis and management.

Cytopenia remains a significant concern in many resource-limited countries. The severity of cytopenia and its associated factors can affect the effectiveness of highly active antiretroviral therapy (HAART), yet this issue has not received enough attention in many developing countries. Most reports on the prevalence and associated factors of cytopenia come from regions with high AIDS prevalence and from developed countries, and these data may differ considerably from those of other regions in patient characteristics, cytopenia status, HAART and other factors (17–20). The main aim of the present study was to construct a model that accurately predicts whether cytopenia will occur during hospitalization in people with HIV. Understanding the profile of cytopenia and its associated factors is essential for developing more appropriate treatment plans for HIV patients (21), yet few reports have described cytopenia among HIV patients in China. Gaining insight into the risk factors for cytopenia in patients with HIV and developing an accurate predictive model could therefore facilitate early intervention and prevent progression in this population. Clinicians could use such a model to identify HIV patients at risk of future cytopenia and adjust their treatment accordingly.

2. Materials and methods

2.1. Data collection

This study was carried out at Guangxi Chest Hospital in Liuzhou, Guangxi, the regional designated hospital for the treatment of serious infectious diseases. Between June 2016 and October 2021, a total of 6,220 hospitalized HIV-infected patients were enrolled, and HIV patients with cytopenia were identified through the hospital electronic medical record system. People with HIV who did not have cytopenia on admission were included as study participants. Anemia was diagnosed according to World Health Organization (WHO) criteria, defined as a hemoglobin level < 110 g/L in women or < 120 g/L in men, and graded as severe (hemoglobin < 60 g/L), moderate (60–89 g/L) or mild (90–119 g/L for men; 90–109 g/L for women). Unlike anemia, leukopenia and thrombocytopenia have no universally accepted cut-off values, so we used criteria applied in other studies (7, 22). Leukopenia was defined as a total leukocyte count < 4.0 × 10³/µL, and thrombocytopenia as a platelet count < 150 × 10³/µL. Following Gunda et al. (9), thrombocytopenia was classified as mild (100–150 × 10³/µL), moderate (50–100 × 10³/µL) or severe (< 50 × 10³/µL). For patients with multiple admissions, data from the most recent admission were used. Laboratory results from the first blood sample collected on admission were included in the study, and observation ended when the patient was discharged or died during hospitalization. Patients younger than 18 years, patients who had received radiation therapy within the previous 45 days and pregnant women were excluded, because their underlying conditions may themselves induce or exacerbate cytopenia.
All patients were confirmed to be HIV-positive by enzyme-linked immunosorbent assay and immunoblot testing, and the diagnosis was consistent with national HIV diagnostic criteria.
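The case definitions above can be expressed as a small rule-based function. The sketch below is illustrative only: the function name `classify`, the `CytopeniaStatus` container and the tie-breaking at the overlapping thrombocytopenia boundaries (exactly 100 or 50 × 10³/µL assigned to the milder grade) are assumptions, not the study's code. Units follow the text: hemoglobin in g/L, leukocytes and platelets in 10³/µL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CytopeniaStatus:
    anemia: Optional[str]            # None, "mild", "moderate" or "severe"
    leukopenia: bool
    thrombocytopenia: Optional[str]  # None, "mild", "moderate" or "severe"

    @property
    def any_cytopenia(self) -> bool:
        return self.anemia is not None or self.leukopenia or self.thrombocytopenia is not None


def classify(hgb_g_l: float, wbc_1e3_ul: float, plt_1e3_ul: float, sex: str) -> CytopeniaStatus:
    """Apply the cut-offs from Section 2.1 to one set of blood counts (hypothetical helper)."""
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")

    # Anemia: < 120 g/L (men) or < 110 g/L (women), graded by hemoglobin level.
    anemia_cutoff = 120 if sex == "M" else 110
    if hgb_g_l >= anemia_cutoff:
        anemia = None
    elif hgb_g_l < 60:
        anemia = "severe"
    elif hgb_g_l < 90:
        anemia = "moderate"
    else:
        anemia = "mild"

    # Leukopenia: total leukocytes < 4.0 x 10^3/uL.
    leukopenia = wbc_1e3_ul < 4.0

    # Thrombocytopenia: platelets < 150 x 10^3/uL. Boundary values go to the
    # milder grade here; this tie-breaking rule is an assumption.
    if plt_1e3_ul >= 150:
        thrombocytopenia = None
    elif plt_1e3_ul >= 100:
        thrombocytopenia = "mild"
    elif plt_1e3_ul >= 50:
        thrombocytopenia = "moderate"
    else:
        thrombocytopenia = "severe"

    return CytopeniaStatus(anemia, leukopenia, thrombocytopenia)


status = classify(hgb_g_l=85, wbc_1e3_ul=5.2, plt_1e3_ul=120, sex="M")
print(status)
print(status.any_cytopenia)
```

Under these rules a patient counts as having cytopenia if any one of the three lineages falls below its cut-off.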

2.2. Data preprocessing

We extracted sociodemographic and clinical information, as well as blood test records, from the electronic medical record system of Guangxi Chest Hospital to construct a structured dataset for the study participants. The structured dataset included 66 variables: 13 clinical comorbidity/co-infection variables (tuberculosis, pneumocystis, candida infection, cryptococcus, herpesvirus, cytomegalovirus, pneumonia, electrolyte disorders, hepatitis (B or C), hypoproteinemia, diabetes, hypertension and cancer); 6 demographic indicators (gender, age, ethnicity, marital status, actual days in hospital and residence); and 47 laboratory indicators (CD8+ T cell count, CD4+ T cell count and levels of ALP, ALT, AST, CEA, etc.). Variables with more than 15% missing data were excluded, and the remaining missing values were imputed using random forest (23). All data processing steps were performed with the numpy, pandas and sklearn packages in Python 3.8.6.
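A sketch of these two preprocessing steps with pandas and scikit-learn follows. It assumes a MissForest-style approach, approximated here with scikit-learn's `IterativeImputer` wrapping a `RandomForestRegressor`; the exact procedure of reference (23), the forest size and the iteration count are not given in the text, and the column names and data are synthetic. Categorical variables would need separate handling (for example, rounding imputed binary indicators), which this sketch omits.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer

MAX_MISSING_FRACTION = 0.15  # variables with more than 15% missing data are dropped


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop sparse columns, then fill remaining gaps with random-forest-based iterative imputation."""
    missing_fraction = df.isna().mean()
    kept = df.loc[:, missing_fraction <= MAX_MISSING_FRACTION]
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        max_iter=5,
        random_state=0,
    )
    filled = imputer.fit_transform(kept)
    return pd.DataFrame(filled, columns=kept.columns, index=kept.index)


# Synthetic demo data (hypothetical column names, not the study variables).
rng = np.random.default_rng(0)
n = 100
demo = pd.DataFrame({
    "albumin": rng.normal(35, 5, n),
    "ldl_c": rng.normal(2.5, 0.7, n),
    "cd4": rng.normal(200, 80, n),
    "sparse_marker": rng.normal(0, 1, n),
})
demo.loc[rng.choice(n, 5, replace=False), "albumin"] = np.nan         # 5% missing: kept
demo.loc[rng.choice(n, 10, replace=False), "ldl_c"] = np.nan          # 10% missing: kept
demo.loc[rng.choice(n, 40, replace=False), "sparse_marker"] = np.nan  # 40% missing: dropped

clean = preprocess(demo)
print(clean.columns.tolist())
print(int(clean.isna().sum().sum()))
```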

2.3. Model construction and evaluation

Whether cytopenia had occurred by hospital discharge was used as the outcome of the prediction models. The data were randomly divided into two datasets using the Python package Scikit-learn (24): 80% were used to train the machine learning models and adjust their parameters, and the remaining 20% were used to test the models and fine-tune their hyperparameters. We used five machine learning classifiers (artificial neural network [ANN], adaptive boosting [AdaBoost], k-nearest neighbour [KNN], support vector machine [SVM] and decision tree [DT]) to build five outcome prediction models, all implemented with the sklearn package in Python 3.8.6.
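A minimal scikit-learn sketch of this setup is shown below. It uses synthetic data from `make_classification` in place of the study dataset (12 features, matching the number of predictors eventually selected in Section 3.3); the stratified split, the feature scaling and all hyperparameters are illustrative assumptions, since the text does not report them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study data: 12 predictors, binary outcome (cytopenia yes/no).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, random_state=0)

# Random 80/20 split; stratifying on the outcome is an assumption, not stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The five classifier families named in the text, with illustrative settings.
# Distance- and gradient-based models get standardized inputs.
models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)

# Test-set accuracy for each model.
print({name: round(model.score(X_test, y_test), 3) for name, model in models.items()})
```

`SVC(probability=True)` is used so that every model exposes `predict_proba`, which ROC/AUC evaluation needs.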

The predictive ability of the models was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), specificity, accuracy, sensitivity and F1 score. Each of these indicators ranges from 0 to 1, with higher values indicating better performance. Using these metrics together allowed a more comprehensive evaluation and comparison of the classification performance of the different machine learning methods. The model with the best overall performance on these indicators was selected as the final model. To explain the output of the best-performing model, we used SHapley Additive exPlanations (SHAP) to calculate the contribution of each feature to the predicted outcome (25).
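How the five indicators relate to a model's output can be sketched in plain Python. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half), and the other metrics come from the confusion matrix at a decision threshold. The 0.5 threshold, the function names and the toy labels and scores are assumptions for illustration.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, y_score, threshold=0.5):
    """AUC, sensitivity, specificity, accuracy and F1 for a binary outcome (1 = cytopenia)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of cytopenia cases detected
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of non-cases correctly cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"auc": roc_auc(y_true, y_score), "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


# Toy labels and predicted probabilities (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
print(evaluate(y_true, y_score))
```

AUC is threshold-free, whereas the other four metrics change with the chosen cut-off, which is one reason to report them together.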

2.4. Statistical analysis

Data were analyzed with Python 3.8.6 and SPSS version 24 (SPSS Inc., Chicago, IL). Descriptive statistics such as percentages, means, medians, interquartile ranges (IQRs) and standard deviations were used as appropriate. Student's t-test was used to compare normally distributed continuous variables, and the Mann–Whitney U test was used for non-normally distributed continuous variables. Categorical variables were compared with the chi-square test. All tests were two-sided, and p < 0.05 was considered statistically significant.
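For reference, the Mann–Whitney U test with the large-sample normal approximation can be sketched with the standard library alone. The function name is hypothetical, and this sketch omits the tie and continuity corrections that statistical packages such as SPSS may apply, so p-values can differ slightly from package output.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie/continuity correction)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign average (mid) ranks to tied values; ranks are 1-based.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return u_x, p


u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(p, 4))
```

Because the test works on ranks, it needs no normality assumption, which is why it suits skewed laboratory values such as CD4+ T cell counts.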

2.5. Ethical statement

The Human Research Ethics Committee of Guangxi Medical University (ethical approval number: 20210172) and the Medical Ethics Committee of Chest Hospital (ethical approval number: 2022–011) approved this study. Informed consent was waived after review by the Chest Hospital Medical Ethics Committee. Patient information was de-identified, and confidentiality was maintained throughout the study.

3. Result

3.1. Sociodemographic characteristics of study participants

In this study, the prevalence of cytopenia among hospitalized HIV patients was 19.3% (1,201/6,220). A total of 2,187 eligible people living with HIV were included; Figure 1 shows the patient selection process. The median age of participants was 56 years (interquartile range (IQR): 45–66 years), and the median hospital stay was 21 days (IQR: 12–33 days). Of the 2,187 participants, 1,686 (77.1%) were male, 1,673 (76.5%) were from rural areas and 1,296 (59.3%) were married. More than half (55.0%) belonged to ethnic minority groups, with the Zhuang ethnic group accounting for the largest share (48.0%). The cytopenia and non-cytopenia groups differed significantly in demographic characteristics, including gender, ethnicity, marital status and residence (p < 0.05). The essential characteristics of the participants are listed in Table 1.

FIGURE 1

Figure 1. Flow diagram of the selection of participants included in the present study.

TABLE 1

Table 1. Sociodemographic characteristics of HIV patients included in study.

3.2. Clinical and laboratory characteristics of study participants

The most common clinical complication/co-infection in hospitalized HIV patients was Candida infection (56.8%, 1,243/2,187), followed by hypoproteinemia (48.0%, 1,049/2,187), pneumonia (47.1%, 1,030/2,187) and tuberculosis (47.1%, 1,030/2,187). The prevalence of cytopenia was 76.0% (190/250) in HIV patients with electrolyte disturbances, compared with 52.2% (1,011/1,937) in those without. Similarly, the prevalence of cytopenia was 72.4% (760/1,049) in HIV patients with hypoproteinemia, compared with 38.8% (441/1,138) in those without. Table 2 shows detailed information on the clinical complications/co-infections of the study participants.
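The chi-square test described in Section 2.4 can be illustrated on the electrolyte-disturbance counts above (190/250 with cytopenia versus 1,011/1,937). The sketch computes Pearson's statistic for a 2 × 2 table without Yates' continuity correction and uses the fact that, for one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)). It illustrates the test only: the text does not report this statistic, and whether a continuity correction was applied is not stated. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] without Yates correction (df = 1)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p


# Rows: with / without electrolyte disturbance; columns: cytopenia / no cytopenia.
chi2, p = chi_square_2x2(190, 250 - 190, 1011, 1937 - 1011)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```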

TABLE 2

Table 2. Characteristics of clinical complications/co-infections in HIV-positive patients included in study.

We compared the median levels of key indicators between the cytopenic and non-cytopenic groups of patients with HIV. The cytopenic group had lower CD4+ T cell count, CD45+ T cell count, CD3+ T cell count, cholinesterase (CHE), creatinine clearance (Ccr), prealbumin (PA) and total cholesterol (CHOL). Several other laboratory indicators of interest also differed significantly, such as serum cystatin C (Cys-C), triglycerides (TG) and chloride (Cl). Detailed characteristics of the laboratory indicators are shown in Table 3.

TABLE 3

Table 3. Characteristics of the laboratory measures of the HIV patients included in the study.

3.3. Feature selection, model construction and evaluation

Feature selection was performed with the sklearn and pandas packages in Python 3.8.6. We used recursive feature elimination (RFE) with a random forest to select the input features for a model predicting the occurrence of cytopenia in HIV patients during hospitalization. In total, 12 of the 66 variables were selected as predictors of the risk of cytopenia. Nine were laboratory indicators: CD4+ T cell count, serum cystatin C (Cys-C), standard bicarbonate (HCO3std), low-density lipoprotein cholesterol (LDL-C), creatinine clearance (Ccr), chloride (Cl), gamma-glutamyltransferase (GGT), monocyte-to-lymphocyte ratio (Mono/Lymph) and hemoglobin-to-RDW ratio (HGB/RDW). The remaining three were clinical comorbidities/co-infections: electrolyte disturbances, hypoproteinemia and cancer.
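A scikit-learn sketch of RFE wrapped around a random forest is shown below, reducing 66 synthetic features to 12. RFE repeatedly fits the forest, ranks features by the forest's importance scores and discards the weakest until the requested number remains. The synthetic data, the forest size and `step=5` (removing five features per round, to keep the example fast) are assumptions; the text does not report the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in: 66 candidate variables, only some of them informative.
X, y = make_classification(n_samples=500, n_features=66, n_informative=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=12,  # number of predictors retained in the study
    step=5,                   # features removed per elimination round (assumption)
)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
X_reduced = selector.transform(X)
print(len(selected), selected)
```

`selector.ranking_` additionally records the round in which each discarded feature was eliminated (selected features have rank 1).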

Prediction models for the development of cytopenia in HIV patients during hospitalization were constructed using the 12 selected features. Table 4 shows the predictive performance of the models generated by the five machine learning algorithms. The ANN model had the highest sensitivity and specificity and therefore showed better predictive power than the other models. Figure 2 shows the ROC curves for the five models, with the ANN model performing best.

TABLE 4

Table 4. Performance of predictive models built by five machine learning algorithms.

FIGURE 2

Figure 2. Evaluation of five machine learning algorithms based on the AUC of ROC curves.

3.4. Explanation of risk factor

To better understand how the features in the ANN model contribute to its predictions, we computed SHAP values for each feature. The ANN model produces a predicted value for each sample, and a SHAP value is a score assigned to each feature in that sample that indicates both the magnitude of the feature's impact on the prediction and whether the impact is positive or negative. Figure 3 shows the importance matrix plot for the ANN model, which ranks the features affecting cytopenia in hospitalized HIV patients from most to least important: hypoproteinemia, HGB/RDW, cancer, LDL-C, CD4+ T cell count, electrolyte disturbance, Cl, Ccr, HCO3std, Mono/Lymph, GGT and Cys-C. The SHAP summary plot (Figure 4) shows how each variable affected the predicted occurrence of cytopenia in hospitalized HIV patients. Each patient is represented by a point, colored by the feature's value (red for higher values, blue for lower values). According to the SHAP summary plot, hypoproteinemia and cancer were the most important features, and both were strongly and positively associated with cytopenia: HIV patients with these comorbidities were at markedly higher risk of developing cytopenia during hospitalization than those without them. HGB/RDW, LDL-C, CD4+ T cell count, Ccr, HCO3std and Mono/Lymph also had a substantial effect on the occurrence of cytopenia, with risk increasing as their values decreased. Conversely, higher values of Cl, GGT and Cys-C were associated with a greater risk of cytopenia, and hospitalized HIV patients with electrolyte disorders were more likely to develop cytopenia.
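SHAP values are Shapley values from cooperative game theory. For a model with n features, feature i receives the weighted average of the change in output when i is added to each subset S of the other features, with weight |S|!(n − |S| − 1)!/n!. The standard-library sketch below computes this exactly by enumerating subsets, treating "absent" features as taking the value of a single reference patient. It is a toy illustration: the three-feature logistic model, its weights and the patient values are hypothetical, the text does not state which SHAP explainer was used, and SHAP library explainers for neural networks generally estimate these quantities against a background dataset rather than by exhaustive enumeration.

```python
import math
from itertools import combinations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(present):
        z = [x[j] if j in present else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model: hypoproteinemia (0/1), cancer (0/1), standardized HGB/RDW.
WEIGHTS = [2.0, 1.5, -0.8]
BIAS = -1.0


def toy_logit(z):
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, z))


def toy_risk(z):
    return 1.0 / (1.0 + math.exp(-toy_logit(z)))


patient = [1, 0, -1.2]   # hypothetical: hypoproteinemia, no cancer, low HGB/RDW
reference = [0, 0, 0.0]  # hypothetical reference patient

phi = shapley_values(toy_risk, patient, reference)
print([round(v, 3) for v in phi])
print(round(sum(phi), 6), round(toy_risk(patient) - toy_risk(reference), 6))
```

The contributions sum to the difference between the patient's predicted risk and the reference prediction (the efficiency property), which is what lets a SHAP summary plot be read as an additive breakdown of each patient's prediction. A feature equal to its reference value, like cancer here, receives exactly zero.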

FIGURE 3

Figure 3. Importance matrix plot of the ANN model showing the contribution of each clinical feature to the predictive model of cytopenia in hospitalized HIV patients.

FIGURE 4

Figure 4. SHAP summary plot of the top 12 clinical features of the ANN model.

4. Discussion

This retrospective analysis of a large sample identified Candida infection, hypoproteinemia, tuberculosis and pneumonia as the most frequent complications among hospitalized patients with HIV, similar to previous reports (26). We used machine learning techniques and clinical features readily obtainable from electronic medical records to develop a model predicting the risk of cytopenia in HIV patients during hospitalization, and compared the predictive performance of five machine learning models. The ANN model had the highest sensitivity and specificity. ANN models are widely used in clinical detection and pathology identification because of their strong performance in pattern recognition (27, 28). Compared with other machine learning models, ANNs can effectively model non-linear relationships, which are important in many real-world problems (29). An ANN consists of multiple layers of neurons, which allows it to learn and represent highly complex relationships and to extract implicit patterns and features from data. Its hidden layers capture complex relationships between input features, helping the model adapt to different types of data (30). To the best of the authors' knowledge, this is the first published study to use an ANN model to predict the occurrence of cytopenia in HIV patients during hospitalization.
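A toy illustration of the hidden-layer point: the XOR pattern (output 1 when exactly one of two binary inputs is present) cannot be separated by any single linear rule on the raw inputs, but one hidden layer with two ReLU units represents it exactly. The weights below are hand-picked for illustration and are unrelated to the study's fitted ANN.

```python
def relu(v):
    return max(0.0, v)


def tiny_mlp(x1, x2):
    """Two hidden ReLU units with hand-picked weights; the output reproduces XOR."""
    h1 = relu(x1 + x2)      # active when at least one input is on
    h2 = relu(x1 + x2 - 1)  # active only when both inputs are on
    return h1 - 2 * h2      # linear output layer combines the hidden units


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_mlp(a, b))
```

In a trained network the weights are learned rather than set by hand, but the mechanism is the same: hidden units carve the input space into pieces that a linear output layer can then combine.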

Combining electronic medical records with machine learning has enabled the development of complex prediction models (31, 32). To make the model's prediction process more transparent, we used the SHAP method to compute the contribution of each variable to the predicted outcome. The results showed that hypoproteinemia and cancer were important factors influencing the occurrence of cytopenia during hospitalization in HIV patients. HGB/RDW, LDL-C, CD4+ T cell count and Ccr also had a relatively large impact.

Previous studies have shown that hypoproteinemia is a potential predictor of disease progression and mortality among people with HIV (33). A cohort study in West Africa that investigated the nutritional status of HIV patients after 1 year of HAART reported that low albumin was associated with anemia (34). Consistent with this, serum albumin levels have been reported to be independently associated with severe anemia and to influence mortality and HAART outcomes in HIV patients (35). There is thus growing evidence that hypoproteinemia has a substantial impact on cytopenia, particularly anemia, in HIV patients. People with cancer may be more likely to develop anemia for two reasons: cancer can impair red blood cell production and shorten red blood cell survival (36). In addition, anticancer treatments may damage healthy blood cells. Our study found that hypoproteinemia and cancer were significant factors contributing to cytopenia in HIV patients during hospitalization.

Among all features in the model, HGB/RDW, LDL-C, CD4+ T cell count and Ccr had a large impact on the predicted risk of cytopenia during hospitalization in HIV patients; specifically, lower levels of these four variables were associated with an increased risk. HGB/RDW has attracted growing attention as a new composite biomarker, and low HGB/RDW has been associated with cancer development and poor prognosis (37, 38). In the present study, HIV patients with lower HGB/RDW had a higher risk of cytopenia during hospitalization. A low HGB/RDW may reflect abnormal erythrocyte homeostasis and deformed erythrocytes, leading to disturbed blood flow in the microcirculation (39), which may have contributed to the increased susceptibility of people living with HIV to cytopenia during hospitalization. Hemoglobin and RDW are readily available laboratory indicators, yet HGB/RDW receives little attention during HIV treatment. Our results show that HGB/RDW is strongly associated with the development of cytopenia in people living with HIV and deserves greater attention.
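The two ratio features among the 12 predictors are simple quotients of routine blood-count values. The sketch below assumes hemoglobin in g/L and RDW expressed as a percentage, and absolute monocyte and lymphocyte counts in the same units; the text does not state the units used, and the ratio's numeric range depends on them. The function names are hypothetical.

```python
def hgb_rdw_ratio(hemoglobin: float, rdw: float) -> float:
    """Hemoglobin divided by red cell distribution width (units assumed: g/L and %)."""
    if rdw <= 0:
        raise ValueError("RDW must be positive")
    return hemoglobin / rdw


def mono_lymph_ratio(monocytes: float, lymphocytes: float) -> float:
    """Absolute monocyte count divided by absolute lymphocyte count (same units for both)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return monocytes / lymphocytes


# A lower hemoglobin combined with a wider RDW drives the ratio down.
print(hgb_rdw_ratio(130, 13.0), hgb_rdw_ratio(90, 18.0))
print(mono_lymph_ratio(0.6, 1.2))
```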

Low LDL-C is often associated with a long-term vegetarian diet (40), liver disease (41) and drug therapy (42), and has also been reported in association with chronic anemia (43). However, LDL-C has received little attention in previous studies of risk factors for cytopenia in HIV patients. Although the mechanism linking low LDL-C to cytopenia is unclear, possible explanations include erythrocyte fragility caused by low cholesterol content in the erythrocyte membrane (44) and LDL-related platelet activation and tissue factor expression (45). Further research is needed to establish how low LDL-C leads to cytopenia.

Consistent with previous studies, we found that low CD4+ T cell counts are a risk factor for cytopenia in HIV patients. CD4+ T cell counts are closely correlated with HIV disease progression, and lower counts typically indicate advanced disease (46). The likely primary explanation for cytopenia associated with low CD4+ T cell counts in HIV patients is HIV-mediated hematopoietic suppression and direct T cell infection (10). Moreover, studies have shown that improved CD4+ T cell counts after HAART led to a reduction in the prevalence of cytopenia in HIV patients (47–49), indicating that HIV-related cytopenia is caused by HIV infection and immunosuppression (50).

Ccr is a sensitive marker of glomerular damage and an early indicator of kidney impairment. Low Ccr may indicate chronic kidney disease, a common complication of which is anemia (51). It has also been reported that high serum creatinine is a significant predictor of anemia in HIV patients (52). GGT is an important indicator of liver function; elevated GGT indicates impaired liver function, which can cause cytopenia (53). Both Ccr and GGT reflect organ function and the potential risk of cytopenia in HIV patients, but they have received little attention in previous studies. Levels of Cl, HCO3std and other electrolytes provide valuable information about the body's metabolism, and their disturbances may indicate metabolic problems in HIV patients at high risk of developing cytopenia. Mono/Lymph has been shown to predict the risk of tuberculosis in people living with HIV (54), and tuberculosis is one of the factors associated with the development of cytopenia in this population (9). Although Cys-C levels are often overlooked as clinically unimportant, Cys-C still contributed to the predictive model.

Using real-world data from electronic medical records, this study constructed an ANN model that predicts the risk of cytopenia in HIV patients during hospitalization from multiple clinical complications and clinical variables. We identified several risk factors for cytopenia in HIV patients that have received little attention in previous studies. The model could serve as a clinical screening tool to assess the risk of cytopenia in hospitalized HIV patients, facilitating more personalized and rational treatment plans. However, our study has several limitations. First, the sample was drawn predominantly from southern China and may not represent people living with HIV throughout China. Second, the potential influence of medications and treatment regimens on the outcomes was not considered during data collection. Third, the stability of the prediction model was not validated with external data, although the model was internally validated and showed consistent and robust predictive ability for the outcome studied.

5. Conclusion

To sum up, this study utilized electronic medical records to gather demographic information, clinical complications, and laboratory test indicators of HIV patients. These clinical characteristics were then used to construct a predictive model to assess the risk of cytopenia in HIV patients. The predictive model has significant implications for improving the management of HIV patients and tailoring personalized treatment plans.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The Human Research Ethics Committee of Guangxi Medical University (ethical approval number: 20210172) and the Medical Ethics Committee of Chest Hospital (ethical approval number: 2022-011) approved this study. Informed consent was waived after review by the Chest Hospital Medical Ethics Committee. Patient information was de-identified, and confidentiality was maintained throughout the study. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions

YH, JL, LH, JQ, and KZ designed the study and provided the correlative knowledge. YX, LS, YL, YJL, ZM, and KH collected and provided the data. LL, XW, and BX extracted data and cleaned data. BX and LL constructed the prediction model. KZ, LS, YL, and BX generated the figures and tables. YX, YH, HQ, XP, and BX wrote and edited the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This research was supported by several organizations, including the Guangxi Key Research and Development Program (2021AB12032), Research Projects for High-level Talents in Affiliated Hospital of Youjiang Medical College of Ethnic Minorities in 2022 (R202210308), Guangxi Medical and Health Appropriate Technology Development and Promotion Application Project (S2022042), Nanning Scientific Research and Technology Development Program (20206124), Guangxi Chinese Medicine Appropriate Technology Development and Promotion Project (GZSY22-71), and Major National Science and Technology Projects (2017ZX10202101-001-006). It is important to note that the funding bodies were not involved in the design of the study, data collection, analysis, interpretation or manuscript writing.

Acknowledgments

The authors would like to thank all participants of this study, financial supporters and Guangxi Chest Hospital for their support.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Durandt, C, Potgieter, JC, Mellet, J, Herd, C, Khoosal, R, Nel, JG, et al. HIV and haematopoiesis. S Afr Med J. (2019) 109:40–5. doi: 10.7196/SAMJ.2019.v109i8b.13829

2. Fiseha, T, and Ebrahim, H. Prevalence and predictors of cytopenias in HIV-infected adults at initiation of antiretroviral therapy in Mehal Meda hospital, Central Ethiopia. J Blood Med. (2022) 13:201–11. doi: 10.2147/JBM.S355966

3. Subbaraman, R, Devaleenal, B, Selvamuthu, P, Yepthomi, T, Solomon, SS, Mayer, KH, et al. Factors associated with anaemia in HIV-infected individuals in southern India. Int J STD AIDS. (2009) 20:489–92. doi: 10.1258/ijsa.2008.008370

4. Belperio, PS, and Rhew, DC. Prevalence and outcomes of anemia in individuals with human immunodeficiency virus: a systematic review of the literature. Am J Med. (2004) 116:27–43. doi: 10.1016/j.amjmed.2003.12.010

5. Calenda, V, and Chermann, JC. The effects of HIV on hematopoiesis. Eur J Haematol. (1992) 48:181–6. doi: 10.1111/j.1600-0609.1992.tb01582.x

6. Frontiera, M, and Myers, AM. Peripheral blood and bone marrow abnormalities in the acquired immunodeficiency syndrome. West J Med. (1987) 147:157–60.

7. Fekene, TE, Juhar, LH, Mengesha, CH, and Worku, DK. Prevalence of cytopenias in both HAART and HAART naïve HIV infected adult patients in Ethiopia: a cross sectional study. BMC Hematol. (2018) 18:8. doi: 10.1186/s12878-018-0102-7

8. Evans, RH, and Scadden, DT. Haematological aspects of HIV infection. Baillieres Best Pract Res Clin Haematol. (2000) 13:215–30. doi: 10.1053/beha.1999.0069

9. Gunda, DW, Godfrey, KG, Kilonzo, SB, and Mpondo, BC. Cytopenias among ART-naive patients with advanced HIV disease on enrolment to care and treatment services at a tertiary hospital in Tanzania: a cross-sectional study. Malawi Med J. (2017) 29:43–52. doi: 10.4314/mmj.v29i1.9

10. Koka, PS, and Reddy, ST. Cytopenias in HIV infection: mechanisms and alleviation of hematopoietic inhibition. Curr HIV Res. (2004) 2:275–82. doi: 10.2174/1570162043351282

11. Opie, J. Haematological complications of HIV infection. S Afr Med J. (2012) 102:465–8. doi: 10.7196/SAMJ.5595

12. Passos, AM, Treitinger, A, and Spada, C. An overview of the mechanisms of HIV-related thrombocytopenia. Acta Haematol. (2010) 124:13–8. doi: 10.1159/000313782

13. Vishnu, P, and Aboulafia, DM. Haematological manifestations of human immune deficiency virus infection. Br J Haematol. (2015) 171:695–709. doi: 10.1111/bjh.13783

14. Chen, Z, Wang, M, de Wilde, RL, Feng, R, Su, M, Torres-de la Roche, LA, et al. A machine learning model to predict the triple negative breast cancer immune subtype. Front Immunol. (2021) 12:749459. doi: 10.3389/fimmu.2021.749459

15. Ashour, AS, Hawas, AR, and Guo, Y. Comparative study of multiclass classification methods on light microscopic images for hepatic schistosomiasis fibrosis diagnosis. Health Inf Sci Syst. (2018) 6:7. doi: 10.1007/s13755-018-0047-z

16. Hu, C, Liu, Z, Jiang, Y, Shi, O, Zhang, X, Xu, K, et al. Early prediction of mortality risk among patients with severe COVID-19, using machine learning. Int J Epidemiol. (2021) 49:1918–29. doi: 10.1093/ije/dyaa171

17. Mocroft, A, Kirk, O, Barton, SE, Dietrich, M, Proenca, R, Colebunders, R, et al. Anaemia is an independent predictive marker for clinical prognosis in HIV-infected patients from across Europe. EuroSIDA study group. AIDS. (1999) 13:943–50. doi: 10.1097/00002030-199905280-00010

18. Sullivan, PS, Hanson, DL, Chu, SY, Jones, JL, and Ward, JW. Epidemiology of anemia in human immunodeficiency virus (HIV)-infected persons: results from the multistate adult and adolescent spectrum of HIV disease surveillance project. Blood. (1998) 91:301–8. doi: 10.1182/blood.V91.1.301

19. Turner, BJ, Markson, L, and Taroni, F. Estimation of survival after AIDS diagnosis: CD4 T lymphocyte count versus clinical severity. J Clin Epidemiol. (1996) 49:59–65. doi: 10.1016/0895-4356(95)00067-4

20. Choi, SY, Kim, I, Kim, NJ, Lee, SA, Choi, YA, Bae, JY, et al. Hematological manifestations of human immunodeficiency virus infection and the effect of highly active anti-retroviral therapy on cytopenia. Korean J Hematol. (2011) 46:253–7. doi: 10.5045/kjh.2011.46.4.253

21. Nassiri, R. Avoiding antiretroviral-associated cytopenias. J Am Osteopath Assoc. (2006) 106:111–2.

22. Tamir, Z, Seid, A, and Haileslassie, H. Magnitude and associated factors of cytopenias among antiretroviral therapy naïve human immunodeficiency virus infected adults in Dessie, Northeast Ethiopia. PLoS One. (2019) 14:e0211708. doi: 10.1371/journal.pone.0211708

23. Tang, F, and Ishwaran, H. Random forest missing data algorithms. Stat Anal Data Min. (2017) 10:363–77. doi: 10.1002/sam.11348

24. Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. (2011) 12:2825–30.

25. Tseng, PY, Chen, YT, Wang, CH, Chiu, KM, Peng, YS, Hsu, SP, et al. Prediction of the development of acute kidney injury following cardiac surgery by machine learning. Crit Care. (2020) 24:478. doi: 10.1186/s13054-020-03179-9

26. Pang, W, Shang, P, Li, Q, Xu, J, Bi, L, Zhong, J, et al. Prevalence of opportunistic infections and causes of death among hospitalized HIV-infected patients in Sichuan, China. Tohoku J Exp Med. (2018) 244:231–42. doi: 10.1620/tjem.244.231

27. Lugtu, EJ, Ramos, DB, Agpalza, AJ, Cabral, EA, Carandang, RP, Dee, JE, et al. Artificial neural network in the discrimination of lung cancer based on infrared spectroscopy. PLoS One. (2022) 17:e0268329. doi: 10.1371/journal.pone.0268329

28. Ganji, Z, Aghaee Hakak, M, and Zare, H. Comparison of machine learning methods for the detection of focal cortical dysplasia lesions: decision tree, support vector machine and artificial neural network. Neurol Res. (2022) 44:1142–9. doi: 10.1080/01616412.2022.2112381

29. Renganathan, V. Overview of artificial neural network models in the biomedical domain. Bratisl Lek Listy. (2019) 120:536–40. doi: 10.4149/BLL_2019_087

30. Tian, Y, Yang, J, Lan, M, and Zou, T. Construction and analysis of a joint diagnosis model of random forest and artificial neural network for heart failure. Aging (Albany NY). (2020) 12:26221–35. doi: 10.18632/aging.202405

31. Damotte, V, Lizée, A, Tremblay, M, Agrawal, A, Khankhanian, P, Santaniello, A, et al. Harnessing electronic medical records to advance research on multiple sclerosis. Mult Scler. (2019) 25:408–18. doi: 10.1177/1352458517747407

32. Cheung, M, Cobb, AN, and Kuo, PC. Predicting burn patient mortality with electronic medical records. Surgery. (2018) 164:839–47. doi: 10.1016/j.surg.2018.07.010

33. Leal, JA, Fausto, MA, Carneiro, M, and Tubinambás, U. Prevalence of hypoalbuminemia in outpatients with HIV/AIDS. Rev Soc Bras Med Trop. (2018) 51:203–6. doi: 10.1590/0037-8682-0093-2017

34. Sicotte, M, Bemeur, C, Diouf, A, Zunzunegui, MV, and Nguyen, VK, for the ATARAO initiative. Nutritional status of HIV-infected patients during the first year HAART in two West African cohorts. J Health Popul Nutr. (2015) 34:1. doi: 10.1186/s41043-015-0001-5

35. Sudfeld, CR, Isanaka, S, Aboud, S, Mugusi, FM, Wang, M, Chalamilla, GE, et al. Association of serum albumin concentration with mortality, morbidity, CD4 T-cell reconstitution among Tanzanians initiating antiretroviral therapy. J Infect Dis. (2013) 207:1370–8. doi: 10.1093/infdis/jit027

36. Zucker, S. Anemia in cancer. Cancer Investig. (1985) 3:249–60. doi: 10.3109/07357908509039786

37. Chi, G, Lee, JJ, Montazerin, SM, and Marszalek, J. Prognostic value of hemoglobin-to-red cell distribution width ratio in cancer: a systematic review and meta-analysis. Biomark Med. (2022) 16:473–82. doi: 10.2217/bmm-2021-0577

38. Su, YC, Wen, SC, Li, CC, Su, HC, Ke, HL, Li, WM, et al. Low hemoglobin-to-red cell distribution width ratio is associated with disease progression and poor prognosis in upper tract urothelial carcinoma. Biomedicines. (2021) 9:672. doi: 10.3390/biomedicines9060672

39. Salvagno, GL, Sanchis-Gomar, F, Picanza, A, and Lippi, G. Red blood cell distribution width: a simple parameter with multiple clinical applications. Crit Rev Clin Lab Sci. (2015) 52:86–105. doi: 10.3109/10408363.2014.992064

40. Djekic, D, Shi, L, Brolin, H, Carlsson, F, Särnqvist, C, Savolainen, O, et al. Effects of a vegetarian diet on cardiometabolic risk factors, gut microbiota, and plasma metabolome in subjects with ischemic heart disease: a randomized, crossover study. J Am Heart Assoc. (2020) 9:e016518. doi: 10.1161/JAHA.120.016518

41. Jiang, ZG, Mukamal, K, Tapper, E, Robson, SC, and Tsugawa, Y. Low LDL-C and high HDL-C levels are associated with elevated serum transaminases amongst adults in the United States: a cross-sectional study. PLoS One. (2014) 9:e85366. doi: 10.1371/journal.pone.0085366

42. Ballantyne, CM, Banach, M, Mancini, GBJ, Lepor, NE, Hanselman, JC, Zhao, X, et al. Efficacy and safety of bempedoic acid added to ezetimibe in statin-intolerant patients with hypercholesterolemia: a randomized, placebo-controlled study. Atherosclerosis. (2018) 277:195–203. doi: 10.1016/j.atherosclerosis.2018.06.002

43. Shalev, H, Kapelushnik, J, Moser, A, Knobler, H, and Tamary, H. Hypocholesterolemia in chronic anemias with increased erythropoietic activity. Am J Hematol. (2007) 82:199–202. doi: 10.1002/ajh.20804

44. Yamori, Y, Nara, Y, Horie, R, and Ooshima, A. Abnormal membrane characteristics of erythrocytes in rat models and men with predisposition to stroke. Clin Exp Hypertens (1978). (1980) 2:1009–21. doi: 10.3109/10641968009037158

45. Rosenson, RS, and Lowe, GD. Effects of lipids and lipoproteins on thrombosis and rheology. Atherosclerosis. (1998) 140:271–80. doi: 10.1016/S0021-9150(98)00144-0

46. Moir, S, Chun, TW, and Fauci, AS. Pathogenic mechanisms of HIV disease. Annu Rev Pathol. (2011) 6:223–48. doi: 10.1146/annurev-pathol-011110-130254

47. Deressa, T, Damtie, D, Workineh, M, Genetu, M, and Melku, M. Anemia and thrombocytopenia in the cohort of HIV-infected adults in Northwest Ethiopia: a facility-based cross-sectional study. EJIFCC. (2018) 29:36–47.

48. Woldeamanuel, GG, and Wondimu, DH. Prevalence of thrombocytopenia before and after initiation of HAART among HIV infected patients at Black Lion Specialized Hospital, Addis Ababa, Ethiopia: a cross sectional study. BMC Hematol. (2018) 18:9. doi: 10.1186/s12878-018-0103-6

49. Levine, AM, Karim, R, Mack, W, Gravink, DJ, Anastos, K, Young, M, et al. Neutropenia in human immunodeficiency virus infection: data from the Women's Interagency HIV Study. Arch Intern Med. (2006) 166:405–10. doi: 10.1001/archinte.166.4.405

50. Fan, L, Li, C, and Zhao, H. Prevalence and risk factors of cytopenia in HIV-infected patients before and after the initiation of HAART. Biomed Res Int. (2020) 2020:1–10. doi: 10.1155/2020/3132589

51. Jha, V, Garcia-Garcia, G, Iseki, K, Li, Z, Naicker, S, Plattner, B, et al. Chronic kidney disease: global dimension and perspectives. Lancet. (2013) 382:260–72. doi: 10.1016/S0140-6736(13)60687-X. Erratum in: Lancet. (2013) 382:208.

52. Tigabu, A, Beyene, Y, Getaneh, T, Chekole, B, Gebremaryam, T, Sisay Chanie, E, et al. Incidence and predictors of anemia among adults on HIV care at South Gondar zone public general hospital Northwest Ethiopia, 2020; retrospective cohort study. PLoS One. (2022) 17:e0259944. doi: 10.1371/journal.pone.0259944

53. Marks, PW. Hematologic manifestations of liver disease. Semin Hematol. (2013) 50:216–21. doi: 10.1053/j.seminhematol.2013.06.003

54. Naranbhai, V, Hill, AV, Abdool Karim, SS, Naidoo, K, Abdool Karim, Q, Warimwe, GM, et al. Ratio of monocytes to lymphocytes in peripheral blood identifies adults at risk of incident tuberculosis among HIV-infected adults initiating antiretroviral therapy. J Infect Dis. (2014) 209:500–9. doi: 10.1093/infdis/jit494

Keywords: HIV, cytopenia, anemia, thrombocytopenia, leukopenia, electronic medical records, machine learning

Citation: Huang L, Xie B, Zhang K, Xu Y, Su L, Lv Y, Lu Y, Qin J, Pang X, Qiu H, Li L, Wei X, Huang K, Meng Z, Hu Y and Lv J (2023) Prediction of the risk of cytopenia in hospitalized HIV/AIDS patients using machine learning methods based on electronic medical records. Front. Public Health. 11:1184831. doi: 10.3389/fpubh.2023.1184831

Received: 12 March 2023; Accepted: 14 July 2023;
Published: 28 July 2023.

Edited by:

Pierpaolo Di Micco, Ospedale Santa Maria delle Grazie, Italy

Reviewed by:

Alessandro Perrella, Colli Hospital, Italy
Giuseppe Cardillo, Medylab Advanced Biochemistry, Italy

Copyright © 2023 Huang, Xie, Zhang, Xu, Su, Lv, Lu, Qin, Pang, Qiu, Li, Wei, Huang, Meng, Hu and Lv. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jiannan Lv, gxaidscc@163.com; Yanling Hu, ylhupost@163.com

†Liling Huang and Bo Xie contributed equally to this work.
