This study aimed to develop and compare machine learning prediction models based on LogisticRegression, XGBoost, GaussianNB, and LGBMClassifier for patients in the prostate-specific antigen gray zone, to identify valuable predictors, and to integrate the resulting predictive models into actual clinical decisions.
Patient information was collected from December 1, 2014 to December 1, 2022 at the Department of Urology, The First Affiliated Hospital of Nanchang University. Patients with a pathological diagnosis of prostate hyperplasia or prostate cancer (any PCa) and a prostate-specific antigen (PSA) level of 4–10 ng/mL before prostate puncture were included in the initial information collection. Ultimately, 756 patients were selected. Age, total prostate-specific antigen (tPSA), free prostate-specific antigen (fPSA), fPSA/tPSA, prostate volume (PV), prostate-specific antigen density (PSAD), (fPSA/tPSA)/PSAD, and prostate MRI results were recorded for these patients. After univariate and multivariate logistic regression analyses, the statistically significant predictors were used to build and compare machine learning models based on LogisticRegression, XGBoost, GaussianNB, and LGBMClassifier and to determine the more valuable predictors.
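The derived predictors listed above follow from three raw measurements. The sketch below is illustrative only: it assumes the conventional definition PSAD = tPSA / PV and treats the 4–10 ng/mL bounds as inclusive; the class, function, and field names are hypothetical and the example values are made up, not study data.

```python
from dataclasses import dataclass


@dataclass
class PsaRecord:
    """Hypothetical container for one patient's raw measurements."""
    tpsa_ng_ml: float  # total PSA, ng/mL
    fpsa_ng_ml: float  # free PSA, ng/mL
    pv_ml: float       # prostate volume, mL


def in_gray_zone(tpsa_ng_ml: float) -> bool:
    """True when tPSA lies in the 4-10 ng/mL range (bounds assumed inclusive)."""
    return 4.0 <= tpsa_ng_ml <= 10.0


def derived_predictors(rec: PsaRecord) -> dict:
    """Compute fPSA/tPSA, PSAD, and (fPSA/tPSA)/PSAD from raw values."""
    if rec.tpsa_ng_ml <= 0 or rec.pv_ml <= 0:
        raise ValueError("tPSA and PV must be positive")
    f_t_ratio = rec.fpsa_ng_ml / rec.tpsa_ng_ml  # dimensionless
    psad = rec.tpsa_ng_ml / rec.pv_ml            # ng/mL per mL of prostate
    return {
        "f_t_ratio": f_t_ratio,
        "psad": psad,
        "ratio_over_psad": f_t_ratio / psad,
    }


# Made-up example values, not taken from the study cohort.
example = PsaRecord(tpsa_ng_ml=8.0, fpsa_ng_ml=1.6, pv_ml=40.0)
features = derived_predictors(example)
print(in_gray_zone(example.tpsa_ng_ml), features)
```

Computing the ratios in one place keeps the unit conventions explicit, which matters because PSAD changes if PV is recorded in a different unit.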
The machine learning prediction models based on LogisticRegression, XGBoost, GaussianNB, and LGBMClassifier exhibited higher predictive power than individual metrics. The area under the curve (AUC) (95% CI), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were 0.932 (0.881–0.983), 0.792, 0.824, 0.919, 0.652, 0.920, and 0.728, respectively, for the LogisticRegression model; 0.813 (0.723–0.904), 0.771, 0.800, 0.768, 0.737, 0.793, and 0.767, respectively, for the XGBoost model; 0.902 (0.843–0.962), 0.813, 0.875, 0.819, 0.600, 0.909, and 0.712, respectively, for the GaussianNB model; and 0.886 (0.809–0.963), 0.833, 0.882, 0.806, 0.725, 0.911, and 0.796, respectively, for the LGBMClassifier model. The LogisticRegression model had the highest AUC of all the prediction models, and the differences between its AUC and those of the XGBoost, GaussianNB, and LGBMClassifier models were statistically significant (p < 0.001).
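For readers less familiar with the reported metrics, the sketch below shows how each one follows from confusion-matrix counts, and how AUC can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. It is a minimal illustration using standard definitions; the function names and the example counts and scores are hypothetical and are not the study's data or code.

```python
import math


def _div(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else math.nan


def f1_score(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return _div(2 * ppv * sensitivity, ppv + sensitivity)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from true/false positive/negative counts."""
    sensitivity = _div(tp, tp + fn)
    ppv = _div(tp, tp + fp)
    return {
        "accuracy": _div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _div(tn, tn + fp),
        "ppv": ppv,
        "npv": _div(tn, tn + fn),
        "f1": f1_score(ppv, sensitivity),
    }


def pairwise_auc(pos_scores: list, neg_scores: list) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. This O(n*m) form is meant for
    clarity, not for large datasets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _div(wins, len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, chosen only to exercise the functions.
metrics = classification_metrics(tp=14, fp=5, tn=26, fn=3)
auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, auc)
```

As a consistency check on the definitions, `f1_score(0.652, 0.824)` rounds to 0.728, which agrees with the F1 score reported for the LogisticRegression model from its positive predictive value and sensitivity.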
The machine learning prediction models based on the LogisticRegression, XGBoost, GaussianNB, and LGBMClassifier algorithms showed superior predictive performance for patients in the PSA gray zone, with the LogisticRegression model yielding the best prediction. These predictive models can be used in actual clinical decision-making.