Ovarian cancer (OC) is the most lethal gynecological malignancy, with limited early screening methods and poor prognosis. Artificial intelligence technology has made a great breakthrough in cancer diagnosis.
We aim to develop a specific interpretable machine learning (ML) prediction model for the diagnosis and prognosis of epithelial ovarian cancer (EOC) based on a variety of biomarkers.
A total of 521 patients with EOC and 144 patients with benign gynecological diseases were enrolled including derivation datasets and an external validation cohort. The predicted information was acquired by 9 supervised ML methods, through 34 parameters. Behind predicted reasons for the best ML were improved by using the SHapley Additive exPlanations (SHAP) algorithm. In addition, the prognosis of EOC was analyzed by unsupervised clustering and Kaplan–Meier (KM) survival analysis.
ML technology was superior to conventional logistic regression in predicting EOC diagnosis and XGBoost performed best in the external validation datasets. The AUC values of distinguishing EOC and benign disease patients, determining pathological type, grade and clinical stage were 0.958 (0.926-0.989), 0.792 (0.701-0.8834), 0.819 (0.687-0.950) and 0.68 (0.573-0.788) respectively. For negative CA-125 EOC patients, the AUC performance of XGBoost model was 0.835(0.763-0.907). We used unsupervised cluster analysis to identify EOC subgroups with significantly poor overall survival (p-value <0.0001) and recurrence-free survival (p-value <0.0001).
Based on the preoperative characteristics, we proved that ML algorithm can provide an acceptable diagnosis and prognosis prediction model for EOC patients. Meanwhile, SHAP analysis can improve the interpretability of ML models and contribute to precision medicine.