Prostate-specific antigen (PSA) is currently the most commonly used screening indicator for prostate cancer, but its specificity for diagnosing the disease is limited. We aimed to construct machine learning-based models to improve the prediction of prostate cancer.
Data from 551 patients who underwent prostate biopsy were retrieved retrospectively and divided into training and test datasets in a 3:1 ratio. We constructed five prostate cancer (PCa) prediction models using four supervised machine learning algorithms: a univariate logistic regression (LR) model based on total PSA (tPSA), a multivariate LR model, a decision tree (DT), a random forest (RF), and a support vector machine (SVM). The five models were compared on the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, calibration curves, and clinical decision curve analysis (DCA).
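As a rough illustration of the evaluation described above, the following Python sketch shows a 3:1 split and the computation of AUC, accuracy, sensitivity, and specificity from predicted probabilities. It is not the study's code: the function names, the random seed, the 0.5 classification threshold, and the toy labels and scores are all assumptions made for this example, and the study's actual splitting procedure is not specified here.

```python
import random


def split_3_to_1(records, seed=0):
    """Shuffle records and split them into training (3/4) and test (1/4) lists.

    The seed and the floor-based cut point are assumptions for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy data (hypothetical): 1 = cancer on biopsy, 0 = no cancer.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(auc(toy_labels, toy_scores))
print(threshold_metrics(toy_labels, toy_scores))
```

Computing these metrics separately on the training and test portions is what allows the comparison of discrimination and generalizability across models.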
All five models were well calibrated in the training dataset. In the training dataset, the RF, DT, and multivariate LR models discriminated better than the tPSA univariate LR and SVM models, with AUCs of 1.0, 0.922, and 0.91, respectively. In the test dataset, the multivariate LR model showed the best discrimination (AUC = 0.918). The multivariate LR and SVM models generalized better, with little change in performance between the training and test datasets. In the DCA, the other four models provided greater net clinical benefit than the tPSA univariate LR model.
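For readers unfamiliar with DCA, the quantity plotted on a decision curve is the net benefit at a threshold probability p_t, conventionally defined as TP/n - (FP/n) * p_t / (1 - p_t). The sketch below computes it for a model and for the "biopsy everyone" reference strategy. It uses this standard definition rather than anything taken from the study, and the function names and toy data are hypothetical.

```python
def net_benefit(labels, probs, pt):
    """Net benefit of acting on predictions with probability >= pt.

    Standard decision-curve definition: TP/n - FP/n * pt / (1 - pt).
    """
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of the reference strategy that treats every patient as positive."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy data (hypothetical): 1 = cancer on biopsy, 0 = no cancer.
dca_labels = [1, 1, 1, 0, 0, 0, 0, 1]
dca_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Evaluate a few threshold probabilities, as a decision curve would.
for pt in (0.2, 0.5):
    print(pt, net_benefit(dca_labels, dca_probs, pt), net_benefit_treat_all(dca_labels, pt))
```

A model offers greater net clinical benefit than another when its curve lies higher over the range of threshold probabilities that are clinically relevant.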
The results of this retrospective study suggest that machine learning techniques can predict prostate cancer with significantly better AUC, accuracy, and net clinical benefit than tPSA alone.