Hematological indicator-based machine learning models for preoperative prediction of lymph node metastasis in cervical cancer

Zhao, Huan; Wang, Yuling; Sun, Yilin; Wang, Yongqiang; Shi, Bo; Liu, Jian; Zhang, Sai

doi:10.3389/fonc.2024.1400109

ORIGINAL RESEARCH article

Front. Oncol., 13 August 2024

Sec. Gynecological Oncology

Volume 14 - 2024 | https://doi.org/10.3389/fonc.2024.1400109

This article is part of the Research Topic Metastatic cancer: Cell behavior and the pre-metastatic niche

Huan Zhao¹†, Yuling Wang²†, Yilin Sun², Yongqiang Wang¹, Bo Shi¹, Jian Liu²*, Sai Zhang¹*

¹School of Medical Imaging, Bengbu Medical University, Bengbu, Anhui, China
²Department of Gynecology and Oncology, First Affiliated Hospital, Bengbu Medical University, Bengbu, Anhui, China

Background: Lymph node metastasis (LNM) is an important prognostic factor for cervical cancer (CC) and determines the treatment strategy. Hematological indicators have been reported as being useful biomarkers for the prognosis of a variety of cancers. This study aimed to evaluate the feasibility of machine learning models characterized by preoperative hematological indicators to predict the LNM status of CC patients before surgery.

Methods: The clinical data of 236 patients with pathologically confirmed CC, treated at the Gynecology Oncology Department of the First Affiliated Hospital of Bengbu Medical University from November 2020 to August 2022, were retrospectively analyzed. The least absolute shrinkage and selection operator (LASSO) was used to select 21 features from 35 hematological indicators, and these features were used to construct six machine learning predictive models: Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), Logistic Regression (LR), Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). Evaluation metrics of the predictive models included the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1-score.

Results: RF had the best overall predictive performance in ten-fold cross-validation on the training set. The specific performance indicators of RF were AUC (0.910, 95% confidence interval [CI]: 0.820–1.000), accuracy (0.831, 95% CI: 0.702–0.960), specificity (0.835, 95% CI: 0.708–0.962), sensitivity (0.831, 95% CI: 0.702–0.960), and F1-score (0.829, 95% CI: 0.696–0.962). RF also had the highest AUC in the testing set (AUC = 0.854).

Conclusion: RF based on preoperative hematological indicators that are easily available in clinical practice showed superior performance in the preoperative prediction of CC LNM. However, investigations on larger external cohorts of patients are required for further validation of our findings.

Introduction

Cervical cancer (CC) is one of the most common gynecological malignancies, with 600,000 new cases and 340,000 deaths reported worldwide in 2020 (1). Multiple studies have demonstrated that lymph node metastasis (LNM) is an important independent risk factor affecting the prognosis of patients with CC and remains the major cause of mortality in these patients (2, 3). The 5-year overall survival rate of CC patients without LNM is 80–90%, whereas in patients with LNM it is reduced to 50–65% (4–6). Accordingly, the 2018 revision of the International Federation of Gynecology and Obstetrics (FIGO) staging system officially incorporated LNM into CC staging (7). The importance of LNM in the diagnosis, treatment decisions and prognostic assessment of CC is increasing. For early-stage CC patients without LNM, radical hysterectomy is recommended (8); for CC patients with LNM, radiotherapy or chemotherapy is the recommended treatment (9). Therefore, accurate preoperative evaluation of LNM status in CC patients is essential for treatment decisions and prognostic assessment.

Lymph node biopsy is the gold standard for diagnosing LNM status (10); however, it is invasive and can cause complications, such as pain and lymphedema (11). Currently, imaging is the conventional method for the preoperative, noninvasive evaluation of LNM status. Common imaging examinations include computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-CT (PET-CT) (10, 12). However, the detection of metastatic lymph nodes via CT and MRI mainly relies on morphological criteria and has relatively low sensitivity (38–56%) (13). Although PET-CT is considered the most effective method for detecting CC LNM, it has a high false-positive rate (14–16). Emerging radiomics approaches, which extend beyond the limits of traditional imaging examinations, can further improve the accuracy of preoperative prediction of CC LNM (17, 18). However, research on radiomics for the preoperative prediction of CC LNM is still in its initial stages, and a gap remains between this research and practical application.

In recent years, with the development of artificial intelligence technology, machine learning (ML) has been playing an increasingly important role in the identification of LNM status in a variety of cancers, including breast cancer, kidney cancer, colon cancer, lung cancer, and cervical cancer (19–23). For example, Arezzo et al. (23) developed an Extreme Gradient Boosting (XGBoost) model based on clinical features and pelvic MRI features for the prediction of LNM in patients with advanced CC. The results of the study showed that the XGBoost model exhibited good predictive performance (89% accuracy, 83% precision, 83% recall, 0.79 AUC). Yu et al. (19) used the Random Forest (RF) algorithm to select MRI radiomics features and establish a Support Vector Machines (SVM) model for predicting axillary lymph node status in breast cancer. The results showed that the AUC of SVM in the training cohort and the external validation cohort were 0.90 and 0.91, respectively. All of the above studies show that ML models have some potential in predicting cancer LNM status.

Hematological indicators are quantifiable indicators that are clinically accessible. Previous studies have suggested associations between some hematological indicators and CC LNM. For example, increased preoperative plasma squamous cell carcinoma antigen (SCC-Ag) levels may predict an increased incidence of CC LNM (24, 25). Moreover, Gavrilescu et al. (26) demonstrated that CC patients with LNM had a significantly higher neutrophil-lymphocyte ratio (NLR) than CC patients without LNM. To our knowledge, no studies have used hematological indicators alone to build machine learning models for the preoperative prediction of LNM status in CC patients. Therefore, this study aimed to evaluate the feasibility of machine learning models characterized by preoperative hematological indicators to predict the LNM status of CC patients before surgery.

Methods

Participant characteristics

The clinical data of CC patients who were admitted to the Department of Gynecology and Oncology of the First Affiliated Hospital of Bengbu Medical University (Anhui, China) from November 2020 to August 2022 were retrospectively analyzed. The inclusion criteria were as follows: (1) patients who were first diagnosed with CC; (2) patients who met the indications for radical CC surgery and underwent radical hysterectomy and pelvic lymph node dissection; and (3) patients whose CC was confirmed via postoperative pathology. The exclusion criteria were as follows: (1) patients with other concurrent malignancies; and (2) patients with missing clinical and pathological data.

This retrospective study was approved by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University (Bengbu, Anhui, China) (registration number: 2021KY010). The study was performed in strict accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. Written informed consent was waived by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University.

Data collection

Clinical features and hematological indicators were collected from the clinical data for patients with CC. Hematological indicators included routine blood indicators, routine biochemical indicators, coagulation function indicators, and tumor markers. Routine blood indicators included white blood cell (WBC), percentage of neutrophil (NEUT %), percentage of lymphocyte (LYM %), percentage of monocytes (MON %), hemoglobin (HGB), platelet large cell ratio (PLCR), etc.; routine biochemical indicators included alanine aminotransferase (ALT), aspartate aminotransferase (AST), prealbumin (PAB), total protein (TP), albumin (ALB), globulin (GLB), total cholesterol (TCHO), low-density lipoprotein (LDL), Cystatin C (Cys C), c-reactive protein (CRP), superoxide dismutase (SOD), etc.; coagulation function indicators included prothrombin time (PT), fibrinogen (FIB), D-dimer (DD), thrombin time (TT), activated partial thromboplastin time (APTT), international normal ratio of prothrombin time (PT-INR), and prothrombin activity (PTA); tumor markers included squamous cell carcinoma antigen (SCC-Ag). Various hematological indicators were measured by using a BC-6000plus automated hematology analyzer (Mindray, Shenzhen, China), a Sysmex CS5100 automatic blood coagulation analyzer (Sysmex, Kobe, Honshu Island, Japan), and an automatic biochemical analyzer (MEDATC, Shanghai, China).

Data set division and data class balance

The training set and the testing set were randomly generated in a ratio of 7:3. However, the sample categories in the training dataset were imbalanced (21% of the samples with LNM and 79% of the samples without LNM), which can lead to a large bias in the classification results of machine learning models (27). Common class balancing methods include random oversampling, random undersampling and synthetic sampling. Both random oversampling and random undersampling can balance the distribution of sample classes in the dataset and thereby alleviate the data imbalance problem. However, random oversampling repeats minority class samples many times, which can easily lead to overfitting of the model, whereas random undersampling removes some samples from the dataset, which leads to information loss. Synthetic sampling methods are an improvement on random sampling methods, and the most classic and popular of them is the synthetic minority over-sampling technique (SMOTE) (28). SMOTE randomly constructs new, non-repeating samples on the line segments that connect neighboring minority class samples, which can reduce overfitting and enhance the generalization ability of the model. SMOTE can therefore compensate for the shortcomings of random oversampling to some extent. In this study, the SMOTE method was used to balance the classes of the training set prior to feature selection.
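The interpolation step that SMOTE relies on can be illustrated with a minimal plain-Python sketch. This is not the implementation used in the study, which is not specified; the function name `smote`, the neighbor count `k`, and the toy indicator values are assumptions chosen for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random point on the segment between the base sample and its neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already-scaled values of two indicators for four minority-class patients
minority = [[0.2, 1.1], [0.4, 0.9], [0.3, 1.4], [0.6, 1.0]]
new_points = smote(minority, n_new=6, k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because every synthetic point is a convex combination of two real minority samples, it always lies between them rather than duplicating either one, which is the property that distinguishes SMOTE from random oversampling.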

Feature selection

In biological data, the performance of various machine learning classifiers depends heavily on the selection of important features. The methods of feature selection are categorized into rank-based and subset methods (29). Ranking-based feature selection methods do not depend on the performance of the algorithm, are computationally fast and less prone to overfitting, and can rank the importance of all features. Popular ranking-based methods include information gain, Fisher score, chi-square and minimum redundancy maximum relevance (30). However, ranking-based methods do not consider the joint importance of features and lack a threshold to determine the optimal number of features. Therefore, the ranking-based feature selection method was not selected for this study. Subset methods are feature selection methods that determine thresholds based on certain criteria to select the optimal subset of features (31). Popular subset-based methods include the least absolute shrinkage and selection operator (LASSO) and Recursive Feature Elimination (RFE) (32). However, RFE is a feature selection method based on a particular machine learning model (such as XGBoost, RF, and SVM). In order to avoid the influence of the basic model used for RFE on the results of the study, only the LASSO method was used for feature selection in this study.
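To make concrete how LASSO removes features, the following plain-Python sketch minimizes a squared-error loss with an L1 penalty by coordinate descent. It is an illustration under stated assumptions rather than the study's code: the article does not report the penalty value or the software routine used, and the function names, the toy data, and the penalty of 0.5 are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=50):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 (no intercept, so y should be centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Average product of feature j with the residual that ignores feature j
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][m] * beta[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            col_scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / col_scale
    return beta


# Hypothetical toy data: the outcome depends strongly on feature_a and only weakly on feature_b
names = ["feature_a", "feature_b"]
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.0, 2.2, -2.2, -2.0]
beta = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)  # ['feature_a']
```

The weakly related feature receives a coefficient of exactly zero and is dropped, while the retained coefficient is shrunk toward zero; this is the behavior that lets LASSO act as a subset selector with a built-in threshold.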

Establishment and evaluation of machine learning models

Following the recommendations made by the Scikit-Learn developers, we used six supervised machine learning models to predict CC LNM. The six machine learning models were Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), Logistic Regression (LR), RF, SVM and XGBoost.

In this study, accuracy, specificity, sensitivity, F1-score and the areas under the receiver operator characteristic curves (AUC) were used as assessment metrics to compare the performance of the models. The ten-fold cross-validation was performed in the training set, and the AUC of the ten-fold cross-validation was used as the main evaluation metric to identify the machine learning model with the best prediction performance. This study evaluates the prediction performance of six machine learning models in the testing set using the receiver operating characteristic (ROC) curves.
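For reference, the five metrics can be computed from predicted labels and predicted scores as in the sketch below. It uses the standard binary definitions with LNM (label 1) as the positive class; the article does not state whether class-averaged variants were used, so this choice, the 0.5 decision threshold, and the toy labels and scores are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Binary metrics with label 1 (LNM) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # recall on the LNM class
    specificity = tn / (tn + fp)  # recall on the non-LNM class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted LNM probabilities for eight patients
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics["accuracy"], metrics["specificity"])  # 0.75 0.8
```

Unlike the other four metrics, AUC does not depend on the decision threshold, which is one reason it is a natural primary criterion for comparing models.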

Python (version 3.9) was used to build and verify the machine learning models. The flowchart for building and validating the machine learning models is shown in Figure 1.
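The ten-fold cross-validation loop can be sketched in plain Python as follows. The class counts of 130 and 130 are those reported for the SMOTE-balanced training set; the stratified, round-robin fold assignment and the function name `stratified_folds` are assumptions, since the article does not describe how folds were formed, and model fitting is left as a comment.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# 130 LNM and 130 non-LNM samples, as in the balanced training set
labels = [1] * 130 + [0] * 130
folds = stratified_folds(labels, k=10)

fold_sizes = []
for held_out in folds:
    train_idx = [i for fold in folds if fold is not held_out for i in fold]
    # A model would be fitted on train_idx and scored on held_out here
    fold_sizes.append((len(train_idx), len(held_out)))

print(fold_sizes[0])  # (234, 26)
```

Each sample is held out exactly once, so the ten fold-level scores can be averaged to summarize performance on the training set.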

Figure 1. Flowchart for building and validating predictive models. LNM, Lymph node metastasis; SMOTE, Synthetic minority over-sampling technique; LASSO, Least absolute shrinkage and selection operator; AdaBoost, Adaptive Boosting; GNB, Gaussian Naive Bayes; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machines; XGBoost, Extreme Gradient Boosting; AUC, Area under receiver operating characteristic curve.

Statistical analysis

Data are presented in three ways: mean ± standard deviation (SD) for normally distributed continuous data, median [interquartile range (IQR)] for non-normally distributed continuous data, and count (percentage) for categorical data. The Shapiro-Wilk test was used to examine the normality of the continuous data. For the comparison of all variables between CC patients with and without LNM, the independent sample t test and the Mann-Whitney U test were used to analyze the normal and non-normal continuous data, respectively, and the chi-square test was used to analyze the categorical data. The DeLong test was used to compare the differences between the ROC curves of the six machine learning models (33). Statistical analysis was performed by using SPSS Statistics 26.0 (IBM Corp., Chicago, Illinois, United States of America) software and MedCalc 20.1.0 (Solvusoft., Las Vegas, Nevada, United States of America) software. P values less than 0.05 (P < 0.05) were considered to be statistically significant.
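As an illustration of the categorical comparison, a Pearson chi-square test for a 2 × 2 table can be computed as in the sketch below. The group sizes (49 and 187) match the LNM and non-LNM groups, but the split of each group by a binary characteristic is hypothetical, and SPSS may apply a continuity correction or an exact test that this sketch omits.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are LNM (n = 49) and non-LNM (n = 187),
# columns are presence and absence of some binary clinical characteristic
stat, p_value = chi_square_2x2(20, 29, 60, 127)
significant = p_value < 0.05
print(significant)  # False
```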

Results

Participant characteristics

The clinical characteristics of the CC patients are shown in Table 1. A total of 236 patients with CC were enrolled in this study, and the mean age and body mass index (BMI) of the patients were 53.6 ± 10.5 years and 24.7 ± 3.1 kg/m², respectively. All of the CC patients were classified into two groups (LNM group, n = 49; non-LNM group, n = 187) according to the results of histopathological examinations. The results of the independent sample t test and the chi-square test showed that there were no significant differences in age, BMI, menopausal status, tubal ligation, diabetes, hypertension, histological subtypes of cervical cancer and lymphovascular space invasion between the LNM group and the non-LNM group (P > 0.05). There was a significant difference in FIGO staging between the LNM group and the non-LNM group (P < 0.05).

Table 1. Clinical characteristics of cervical cancer patients.

Table 2 shows the basic descriptive statistics of the hematological indicators in CC patients, as well as the results of the independent sample t test and the Mann-Whitney U test. In the univariate analyses, 8 hematological indicators, including SCC-Ag, DD, HGB, PAB, TP, ALB, TCHO, and LDL, were significantly different between the LNM group and the non-LNM group (P < 0.05). These results were based on the raw data analysis of 236 CC patients, whereas the feature selection was based on the processed data. Class balancing of the training data by using SMOTE resulted in 130 CC patients with LNM and 130 CC patients without LNM in the training set.

Table 2. Association between hematologic indicators and lymph node metastasis status.

Feature selection

In this study, the LASSO feature selection technique was applied to select 21 features from the 35 features in the training dataset. Figure 2 illustrates the features selected by LASSO and their estimated coefficients. The 21 selected hematological indicators, ranked from highest to lowest coefficient magnitude, were TCHO, SCC-Ag, DD, FIB, NEUT %, CRP, LYM %, AST, APTT, TT, TP, ALT, PLCR, GLB, PTA, WBC, MON %, PAB, SOD, HGB and Cys C. The absolute value of each coefficient reflects the feature importance of the corresponding hematological indicator.

Figure 2. Features selected by LASSO with their estimated Coefficients. SCC-Ag, squamous cell carcinoma antigen; PT-INR, International normal ratio of prothrombin time; TT, thrombin time; APTT, activated partial thromboplastin time; FIB, fibrinogen; DD, D-Dimer; WBC, white blood cell; NEUT %, percentage of neutrophil; LYM %, percentage of lymphocyte; MON %, percentage of monocytes; HGB, hemoglobin; PLCR, platelet large cell ratio; ALT, alanine aminotransferase; AST, aspartate aminotransferase; PAB, prealbumin; TP, total protein; GLB, globulin; TCHO, total cholesterol; Cys C, Cystatin C; CRP, c-reactive protein; SOD, superoxide dismutase; LASSO, Least absolute shrinkage and selection operator.

Establishment and evaluation of machine learning models

The results of ten-fold cross-validation in the training set show that the RF model outperforms the other five machine learning models (including AdaBoost, GNB, LR, SVM, and XGBoost) in all predictive indicators (Table 3). The specific performance indicators of the RF model were AUC (0.910, 95% confidence interval [CI]: 0.820–1.000) (Figure 3), accuracy (0.831, 95% CI: 0.702–0.960), specificity (0.835, 95% CI: 0.708–0.962), sensitivity (0.831, 95% CI: 0.702–0.960), and F1-score (0.829, 95% CI: 0.696–0.962).

Table 3. Ten-fold cross-validated predictive performance of the six models in the training set.

Figure 3. Receiver operating characteristic (ROC) curves for ten-fold cross-validation of the Random Forest (RF) model in the training set. The blue solid line represents the average ROC curve with ten-fold cross-validation. The red diagonal line denotes an area under the ROC curve (AUC) of 0.5, which represents a random probability (P = 0.5). The shaded area around the average ROC curve reflects the 95% confidence interval. AUC, area under receiver operating characteristic curve.

Figure 4 shows the ROC curves of the six machine learning models for predicting CC LNM on the testing set. AUC is a key metric for assessing the performance of predictive models, and RF had the highest AUC (0.854), which was significantly higher than that of each of the other five models (all P values < 0.05, DeLong test). In the testing set, the accuracy, specificity, sensitivity, F1-score, and AUC of the RF model were all above 0.8, and the RF model showed the best performance among the six machine learning algorithms (Table 4). Therefore, the RF model was determined to be the best model in this study.

Figure 4. Receiver operating characteristic (ROC) curves of six machine learning models for predicting lymph node metastasis of cervical cancer on the testing set. The black diagonal line denotes an area under the ROC curve (AUC) of 0.5, which represents a random probability (P = 0.5). AdaBoost, Adaptive Boosting; GNB, Gaussian Naive Bayes; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machines; XGBoost, Extreme Gradient Boosting; AUC, Area under receiver operating characteristic curve.

Table 4. Predictive performance of six machine learning models in the testing set.

Discussion

In this study, six machine learning models were used to predict LNM status in CC patients. The machine learning models were based on a variety of preoperative hematological indicators, including routine blood indicators, routine biochemical indicators, coagulation function indicators, and tumor markers. The results of ten-fold cross-validation showed that the overall prediction performance of the RF model was better than that of the other five models, thus indicating that the model had the best stability.

In recent years, ML techniques have been widely used to identify LNM in CC patients. For example, Liu et al. (34) collected clinical features and MRI radiomics features of 180 CC patients and established 7 ML models. The results showed that among the 7 ML models, Multinomial Naive Bayes (MNB) had the most robust predictive performance, with an AUC of 0.745, an accuracy of 0.778, and a specificity of 0.900. Compared to the present study, the model needs to be improved in terms of accuracy of prediction, and the method is more costly and time-consuming to test. Guan et al. (35) collected preoperative 5-minute electrocardiograms from 292 CC patients and developed 6 ML models based on 32 heart rate variability parameters. The results showed that among the 6 ML models, the RF model had the best predictive performance (AUC of 0.852, accuracy of 0.744, sensitivity of 0.783 and specificity of 0.785). In contrast, the RF model characterized by hematological parameters in this study showed improved AUC, accuracy, sensitivity and specificity (AUC of 0.854, accuracy of 0.817, sensitivity of 0.857 and specificity of 0.807).

To improve the interpretability of the machine learning models, we used the LASSO coefficients to represent the feature importance of each hematological indicator. Higher feature importance indicates that the feature is more useful for predicting CC LNM. In this study, TCHO showed the highest feature importance. Increased serum TCHO levels have been reported to be a risk factor for the development of certain cancers, and serum TCHO levels have been associated with LNM in a variety of cancers, such as esophageal cancer, gastric cancer and pancreatic cancer. Sako et al. (36) found that TCHO levels in esophageal cancer patients with LNM were significantly higher than in those without LNM. Wu et al. (37) demonstrated that TCHO levels in pancreatic cancer patients were significantly correlated with tumor grade and LNM. Kitayama et al. (38) reported that patients with early gastric cancer who had hypercholesterolemia (TCHO ≥ 220 mg/dl) had a significantly higher rate of LNM. It has been shown that T lymphocytes play a major role in killing malignant cells, but their activity is influenced by the tumor microenvironment. High cholesterol levels upregulate the expression of immune checkpoints in T lymphocytes, which weakens the anti-tumor function of T cells (39). In addition, Mahmoud et al. (40) found that prostate cancer cells store cholesterol and use it as energy for growth. Therefore, it is possible that elevated TCHO levels promote malignant tumor growth and thus LNM. To the best of our knowledge, there are no previous studies on the correlation between LNM and TCHO in CC. The results of this study showed that TCHO levels in CC patients were significantly associated with LNM. However, the exact mechanism underlying TCHO as a predictor of LNM in CC patients is unclear and requires further study.

In this study, SCC-Ag was ranked second in terms of feature importance. SCC-Ag is a specific antigen produced by squamous cell carcinoma (SCC) that has good application value for predicting LNM in cervical cancer derived from squamous cells (41, 42). Preoperative serum SCC-Ag is the tumor marker that is commonly used to predict squamous cell CC LNM (43, 44). Previous studies have suggested that preoperative high SCC-Ag levels may be associated with CC LNM (45–47). Wei et al. (48) found that cancer-associated fibroblasts (CAFs) in patients with cervical squamous cell carcinoma impaired lymphatic endothelial barriers by activating the integrin-FAK/Src-VE-cadherin signaling pathway in lymphatic endothelial cells, thus consequently enhancing CC LNM.

In this study, coagulation function indicators (such as DD and FIB) also showed high feature importance. Previous studies have indicated that the coagulation function of patients with malignant tumors exhibits different degrees of abnormality (49–51). This may be because tumor cells alter coagulation function through various pathways to promote tumor growth, infiltration, and metastasis (52). Similarly, the hyperactivation of the coagulation system in CC patients can promote LNM development (53, 54). In this study, the univariate analysis confirmed that the DD levels of CC patients with LNM were significantly higher than those of CC patients without LNM (P = 0.003). Notably, hematological indicators such as TT, APTT, PT-INR, TP, and NEUT % were also found to contribute to the construction of the machine learning models. However, the specific mechanisms by which these indicators predict LNM in CC patients are unclear, and further studies are warranted. Furthermore, it has been suggested that some hematological parameters that were not used in this study, such as carbohydrate antigen 125 (CA125), carbohydrate antigen 19-9 (CA19-9), α-fetoprotein (AFP), and alkaline phosphatase (ALP), may also be associated with LNM in CC patients, which may provide a feasible direction for future research (55).

However, this study also had some limitations. First, the present study was a retrospective, single-center analysis with a relatively small sample size. Therefore, the predictive models will need further validation in a larger multicenter study to establish the robustness of the current findings. Second, hematological indicators are always affected to varying degrees by testing equipment and testing reagents. Thus, hematological indicators will need to be collected under different conditions in the future to verify the generalizability of the predictive model. Third, CC often occurs in remote areas with limited medical care, which makes it difficult to collect some of the required hematological parameters (e.g., SCC-Ag). Future work should therefore aim to model with fewer hematological indicators while maintaining the performance of the ML model, in order to improve its usability in most areas.

Conclusion

In conclusion, we used machine learning algorithms to establish six machine learning models based on preoperative hematological indicators for the preoperative prediction of LNM status in CC patients. Ten-fold cross-validation proved that the RF model had higher stability. The higher AUC values of the RF model in the testing set indicate a better generalization performance. Our results suggested that the RF model based on preoperative hematological indicators had great potential in clinical practice. Through further validation and refinement, the RF model has the potential to help develop more effective treatment plans for cervical cancer patients through preoperative diagnosis.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics statement

This retrospective study was approved by the Clinical Medical Research Ethics Committee of The First Affiliated Hospital of Bengbu Medical University (Bengbu, Anhui, China) (registration number: 2021KY010). The studies were conducted in accordance with the local legislation and institutional requirements. The human samples used in this study were primarily isolated as part of a previous study for which ethical approval was obtained. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

HZ: Formal analysis, Writing – original draft. YLW: Data curation, Writing – original draft. YS: Data curation, Writing – original draft. YQW: Writing – original draft. BS: Conceptualization, Methodology, Writing – review & editing. JL: Writing – review & editing. SZ: Conceptualization, Methodology, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the “512” Outstanding Talents Fostering Project of Bengbu Medical University (grant number BY51201312), the Natural Science Research Project of Anhui Educational Committee (grant number 2022AH051471) and Research project of Bengbu Medical University (grant number 2021byzd057).

Acknowledgments

We thank all of the patients who provided their clinically relevant data for this study, as well as the surgical teams who facilitated this work.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2021) 71:209–49. doi: 10.3322/caac.21660

2. Cohen PA, Jhingran A, Oaknin A, Denny L. Cervical cancer. Lancet. (2019) 393:169–82. doi: 10.1016/S0140-6736(18)32470-X

3. Kilic C, Kimyon Comert G, Cakir C, Yuksel D, Codal B, Kilic F, et al. Recurrence pattern and prognostic factors for survival in cervical cancer with lymph node metastasis. J Obstet Gynaecol Res. (2021) 47:2175–84. doi: 10.1111/jog.14762

4. Aoki Y, Sasaki M, Watanabe M, Sato T, Tsuneki I, Aida H, et al. High-risk group in node-positive patients with stage IB, IIA, and IIB cervical carcinoma after radical hysterectomy and postoperative pelvic irradiation. Gynecol Oncol. (2000) 77:305–9. doi: 10.1006/gyno.2000.5788

5. Gien LT, Covens A. Lymph node assessment in cervical cancer: prognostic and therapeutic implications. J Surg Oncol. (2009) 99:242–7. doi: 10.1002/jso.21199

6. Ruengkhachorn I, Therasakvichya S, Warnnissorn M, Leelaphatanadit C, Sangkarat S, Srisombat J, et al. Pathologic Risk Factors and Oncologic Outcomes in Early-stage Cervical Cancer Patients Treated by Radical Hysterectomy and Pelvic Lymphadenectomy at a Thai University Hospital: A 7 year Retrospective Review. Asian Pac J Cancer Prev. (2015) 16:5951–6. doi: 10.7314/APJCP.2015.16.14.5951

7. Bhatla N, Denny L. FIGO cancer report 2018. Int J Gynaecol Obstet. (2018) 143:2–3. doi: 10.1002/ijgo.12608

8. Hou L, Zhou W, Ren J, Du X, Xin L, Zhao X, et al. Radiomics analysis of multiparametric MRI for the preoperative prediction of lymph node metastasis in cervical cancer. Front Oncol. (2020) 10:1393. doi: 10.3389/fonc.2020.01393

9. NCCN clinical practice guidelines in oncology: cervical cancer (2022. V1) (2022). Available online at: https://www.nccn.org/professionals/physician_gls/pdf/cervical.pdf (Accessed 15 October 2022).

10. Bourgioti C, Chatoupis K, Moulopoulos LA. Current imaging strategies for the evaluation of uterine cervical cancer. World J Radiol. (2016) 8:342–54. doi: 10.4329/wjr.v8.i4.342

11. Plante M, Renaud MC, Têtu B, Harel F, Roy M. Laparoscopic sentinel node mapping in early-stage cervical cancer. Gynecol Oncol. (2003) 91:494–503. doi: 10.1016/j.ygyno.2003.08.024

12. Williams AD, Cousins C, Soutter WP, Mubashar M, Peters AM, Dina R, et al. Detection of pelvic lymph node metastases in gynecologic malignancy: a comparison of CT, MR imaging, and positron emission tomography. AJR Am J Roentgenol. (2001) 177:343–8. doi: 10.2214/ajr.177.2.1770343

13. Choi HJ, Ju W, Myung SK, Kim Y. Diagnostic performance of computer tomography, magnetic resonance imaging, and positron emission tomography or positron emission tomography/computer tomography for detection of metastatic lymph nodes in patients with cervical cancer: meta-analysis. Cancer Sci. (2010) 101:1471–9. doi: 10.1111/j.1349-7006.2010.01532.x

14. Stecco A, Buemi F, Cassarà A, Matheoud R, Sacchetti GM, Arnulfo A, et al. Comparison of retrospective PET and MRI-DWI (PET/MRI-DWI) image fusion with PET/CT and MRI-DWI in detection of cervical and endometrial cancer lymph node metastases. Radiol Med. (2016) 121:537–45. doi: 10.1007/s11547-016-0626-5

15. Brunette LL, Bonyadlou S, Ji L, Groshen S, Shuster D, Mehta A, et al. Predictive value of FDG PET/CT to detect lymph node metastases in cervical cancer. Clin Nucl Med. (2018) 43:793–801. doi: 10.1097/RLU.0000000000002252

16. Kaźmierczak K, Cholewiński W, Nowakowski B. Comparison of positron emission tomography with computed tomography examination with histopathological assessment of pelvic lymph nodes in patients with cervical cancer treated surgically. Contemp Oncol (Pozn). (2021) 25:160–7. doi: 10.5114/wo.2021.109209

17. Yu YY, Zhang R, Dong RT, Hu QY, Yu T, Liu F, et al. Feasibility of an ADC-based radiomics model for predicting pelvic lymph node metastases in patients with stage IB-IIA cervical squamous cell carcinoma. Br J Radiol. (2019) 92:20180986. doi: 10.1259/bjr.20180986

18. Wang T, Gao T, Yang J, Yan X, Wang Y, Zhou X, et al. Preoperative prediction of pelvic lymph nodes metastasis in early-stage cervical cancer using radiomics nomogram developed based on T2-weighted MRI and diffusion-weighted imaging. Eur J Radiol. (2019) 114:128–35. doi: 10.1016/j.ejrad.2019.01.003

19. Yu Y, He Z, Ouyang J, Tan Y, Chen Y, Gu Y, et al. Magnetic resonance imaging radiomics predicts preoperative axillary lymph node metastasis to support surgical decisions and is associated with tumor microenvironment in invasive breast cancer: A machine learning, multicenter study. EBioMedicine. (2021) 69:103460. doi: 10.1016/j.ebiom.2021.103460

20. Feng X, Hong T, Liu W, Xu C, Li W, Yang B, et al. Development and validation of a machine learning model to predict the risk of lymph node metastasis in renal carcinoma. Front Endocrinol (Lausanne). (2022) 13:1054358. doi: 10.3389/fendo.2022.1054358

21. Eresen A, Li Y, Yang J, Shangguan J, Velichko Y, Yaghmai V, et al. Preoperative assessment of lymph node metastasis in colon cancer patients using machine learning: a pilot study. Cancer Imaging. (2020) 20:30. doi: 10.1186/s40644-020-00308-z

22. Meng N, Feng P, Yu X, Wu Y, Fu F, Li Z, et al. An [18F]FDG PET/3D-ultrashort echo time MRI-based radiomics model established by machine learning facilitates preoperative assessment of lymph node status in non-small cell lung cancer. Eur Radiol. (2024) 34:318–29. doi: 10.1007/s00330-023-09978-2

23. Arezzo F, Cormio G, Mongelli M, Cazzato G, Silvestris E, Kardhashi A, et al. Machine learning applied to MRI evaluation for the detection of lymph node metastasis in patients with locally advanced cervical cancer treated with neoadjuvant chemotherapy. Arch Gynecol Obstet. (2023) 307:1911–9. doi: 10.1007/s00404-022-06824-6

24. Xu D, Wang D, Wang S, Tian Y, Long Z, Ren X, et al. Correlation between squamous cell carcinoma antigen level and the clinicopathological features of early-stage cervical squamous cell carcinoma and the predictive value of squamous cell carcinoma antigen combined with computed tomography scan for lymph node metastasis. Int J Gynecol Cancer. (2017) 27:1935–42. doi: 10.1097/IGC.0000000000001112

25. Zhu C, Zhang W, Wang X, Jiao L, Chen L, Jiang J. Predictive value of preoperative serum squamous cell carcinoma antigen level for lymph node metastasis in early-stage cervical squamous cell carcinoma. Medicine (Baltimore). (2021) 100:e26960. doi: 10.1097/MD.0000000000026960

26. Gavrilescu MM, Hutanu I, Ioanid N, Musina AM, Mihaela BA, Moscalu M, et al. Clinical value of hematological biomarkers in uterine cervical cancer. Chirurgia (Bucur). (2016) 111:493–9. doi: 10.21614/chirurgia.111.6.493

27. Khodabandelu S, Ghaemian N, Khafri S, Ezoji M, Khaleghi S. Development of a machine learning-based screening method for thyroid nodules classification by solving the imbalance challenge in thyroid nodules data. J Res Health Sci. (2022) 22:e00555. doi: 10.34172/jrhs.2022.90

28. Taft LM, Evans RS, Shyu CR, Egger MJ, Chawla N, Mitchell JA, et al. Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery. J Biomed Inform. (2009) 42:356–64. doi: 10.1016/j.jbi.2008.09.001

29. Kim S, Halabi S. High dimensional variable selection with error control. BioMed Res Int. (2016) 2016:8209453. doi: 10.1155/2016/8209453

30. Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. (2005) 27:1226–38. doi: 10.1109/TPAMI.2005.159

31. Ditzler G, Morrison JC, Lan Y, Rosen GL. Fizzy: feature subset selection for metagenomics. BMC Bioinformatics. (2015) 16:358. doi: 10.1186/s12859-015-0793-8

32. Li W, Song Y, Chen K, Ying J, Zheng Z, Qiao S, et al. Predictive model and risk analysis for diabetic retinopathy using machine learning: a retrospective cohort study in China. BMJ Open. (2021) 11:e050989. doi: 10.1136/bmjopen-2021-050989

33. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. (1988) 44:837–45. doi: 10.2307/2531595

34. Liu S, Zhou Y, Wang C, Shen J, Zheng Y. Prediction of lymph node status in patients with early-stage cervical cancer based on radiomic features of magnetic resonance imaging (MRI) images. BMC Med Imaging. (2023) 23:101. doi: 10.1186/s12880-023-01059-6

35. Guan W, Wang Y, Zhao H, Lu H, Zhang S, Liu J, et al. Prediction models for lymph node metastasis in cervical cancer based on preoperative heart rate variability. Front Neurosci. (2024) 18:1275487. doi: 10.3389/fnins.2024.1275487

36. Sako A, Kitayama J, Kaisaki S, Nagawa H. Hyperlipidemia is a risk factor for lymphatic metastasis in superficial esophageal carcinoma. Cancer Lett. (2004) 208:43–9. doi: 10.1016/j.canlet.2003.11.010

37. Wu B, Shen W, Wang X, Wang J, Zhong Z, Zhou Z, et al. Plasma lipid levels are associated with the CD8+ T-cell infiltration and prognosis of patients with pancreatic cancer. Cancer Med. (2023) 12:14138–48. doi: 10.1002/cam4.6080

38. Kitayama J, Hatano K, Kaisaki S, Suzuki H, Fujii S, Nagawa H. Hyperlipidaemia is positively correlated with lymph node metastasis in men with early gastric cancer. Br J Surg. (2004) 91:191–8. doi: 10.1002/bjs.4391

39. Zhang J, Wang YF, Wu B, Zhong ZX, Wang KX, Yang LQ, et al. Intraepithelial attack rather than intratumorally infiltration of CD8+T lymphocytes is a favorable prognostic indicator in pancreatic ductal adenocarcinoma. Curr Mol Med. (2017) 17:689–98. doi: 10.2174/1566524018666180308115705

40. Yang W, Bai Y, Xiong Y, Zhang J, Chen S, Zheng X, et al. Potentiating the antitumour response of CD8(+) T cells by modulating cholesterol metabolism. Nature. (2016) 531:651–5. doi: 10.1038/nature17412

41. Gaarenstroom KN, Bonfrer JM, Korse CM, Kenter GG, Kenemans P. Value of Cyfra 21-1, TPA, and SCC-Ag in predicting extracervical disease and prognosis in cervical cancer. Anticancer Res. (1997) 17:2955–8.

42. Gaarenstroom KN, Kenter GG, Bonfrer JM, Korse CM, Van de Vijver MJ, Fleuren GJ, et al. Can initial serum cyfra 21-1, SCC antigen, and TPA levels in squamous cell cervical cancer predict lymph node metastases or prognosis? Gynecol Oncol. (2000) 77:164–70. doi: 10.1006/gyno.2000.5732

43. Feng SY, Zhang YN, Liu JG. Risk factors and prognosis of node-positive cervical carcinoma. Ai Zheng. (2005) 24:1261–6.

44. Olthof EP, van der Aa MA, Adam JA, Stalpers LJA, Wenzel HHB, van der Velden J, et al. The role of lymph nodes in cervical cancer: incidence and identification of lymph node metastases-a literature review. Int J Clin Oncol. (2021) 26:1600–10. doi: 10.1007/s10147-021-01980-2

45. Takeda M, Sakuragi N, Okamoto K, Todo Y, Minobe S, Nomura E, et al. Preoperative serum SCC, CA125, and CA19-9 levels and lymph node status in squamous cell carcinoma of the uterine cervix. Acta Obstet Gynecol Scand. (2002) 81:451–7. doi: 10.1034/j.1600-0412.2002.810513.x

46. Choi KH, Lee SW, Yu M, Jeong S, Lee JW, Lee JH. Significance of elevated SCC-Ag level on tumor recurrence and patient survival in patients with squamous-cell carcinoma of uterine cervix following definitive chemoradiotherapy: a multi-institutional analysis. J Gynecol Oncol. (2019) 30:e1. doi: 10.3802/jgo.2019.30.e1

47. Guo Q, Zhu J, Wu Y, Wen H, Xia L, Wu X, et al. Predictive value of preoperative serum squamous cell carcinoma antigen (SCC-Ag) level on tumor recurrence in cervical squamous cell carcinoma patients treated with radical surgery: A single-institution study. Eur J Surg Oncol. (2020) 46:131–8. doi: 10.1016/j.ejso.2019.08.021

48. Wei WF, Chen XJ, Liang LJ, Yu L, Wu XG, Zhou CF, et al. Periostin⁺ cancer-associated fibroblasts promote lymph node metastasis by impairing the lymphatic endothelial barriers in cervical squamous cell carcinoma. Mol Oncol. (2021) 15:210–27. doi: 10.1002/1878-0261.12837

49. Tikhomirova I, Petrochenko E, Malysheva Y, Ryabov M, Kislov N. Interrelation of blood coagulation and hemorheology in cancer. Clin Hemorheol Microcirc. (2016) 64:635–44. doi: 10.3233/CH-168037

50. Martinez C, Cohen AT, Bamber L, Rietbrock S. Epidemiology of first and recurrent venous thromboembolism: a population-based cohort study in patients without active cancer. Thromb Haemost. (2014) 112:255–63. doi: 10.1160/TH13-09-0793

51. Langouo Fontsa M, Aiello MM, Migliori E, Scartozzi M, Lambertini M, Willard-Gallo K, et al. Thromboembolism and immune checkpoint blockade in cancer patients: an old foe for new research. Target Oncol. (2022) 17:497–505. doi: 10.1007/s11523-022-00908-8

52. Falanga A, Marchetti M, Vignoli A. Coagulation and cancer: biological and clinical aspects. J Thromb Haemost. (2013) 11:223–33. doi: 10.1111/jth.12075

53. Nakamura K, Nakayama K, Ishikawa M, Katagiri H, Minamoto T, Ishibashi T, et al. High pre-treatment plasma D-dimer level as a potential prognostic biomarker for cervical carcinoma. Anticancer Res. (2016) 36:2933–8.

54. Zhao K, Deng H, Qin Y, Liao W, Liang W. Prognostic significance of pretreatment plasma fibrinogen and platelet levels in patients with early-stage cervical cancer. Gynecol Obstet Invest. (2015) 79:25–33. doi: 10.1159/000365477

55. Yu J, Zheng Q, Ding X, Zheng B, Chen X, Chen B, et al. Systematic re-analysis strategy of serum indices identifies alkaline phosphatase as a potential predictive factor for cervical cancer. Oncol Lett. (2019) 18:2356–65. doi: 10.3892/ol

Keywords: cervical cancer, lymph node metastasis, machine learning, hematological indicators, preoperative prediction

Citation: Zhao H, Wang Y, Sun Y, Wang Y, Shi B, Liu J and Zhang S (2024) Hematological indicator-based machine learning models for preoperative prediction of lymph node metastasis in cervical cancer. Front. Oncol. 14:1400109. doi: 10.3389/fonc.2024.1400109

Received: 13 March 2024; Accepted: 29 July 2024; Published: 13 August 2024.

Edited by:

Giuseppe Vizzielli, University of Udine, Italy

Reviewed by:

Cristina Taliento, University of Ferrara, Italy
Veronica Tius, University of Udine, Italy

Copyright © 2024 Zhao, Wang, Sun, Wang, Shi, Liu and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Sai Zhang, szhang@bbmu.edu.cn; Jian Liu, elitelj@126.com

^†These authors have contributed equally to this work
