Postherpetic itch (PHI) is an easily overlooked complication of herpes zoster that greatly affects patients' quality of life. Studies have shown that early intervention can reduce the occurrence of itch. The aim of this study was to develop and validate a machine learning model that identifies patients with herpes zoster who are at risk of developing PHI, making PHI prevention a viable clinical option.
We retrospectively reviewed 488 patients hospitalized with herpes zoster at The First Affiliated Hospital of Zhejiang Chinese Medical University and classified them according to whether they had PHI. Fifty indicators were collected from these participants as candidate input features for the model. Features associated with PHI were selected for inclusion in the model using the least absolute shrinkage and selection operator (LASSO). The data were divided into five folds; each fold served in turn as the test set while the remaining four served as the training set, and this procedure was repeated 100 times. Five models (logistic regression, random forest (RF), k-nearest neighbor, gradient boosting decision tree, and neural network) were built on the training set using machine learning methods, and their performance was evaluated on the test set.
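The resampling scheme described above (five-fold cross-validation repeated 100 times on 488 patients) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `repeated_kfold_indices`, the random seed, and the plain (non-stratified) shuffling are assumptions, since the abstract does not say whether folds were stratified by PHI status.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration; the seed and the absence of
    stratification are assumptions, not details taken from the study.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        # Deal the shuffled samples into folds of (almost) equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for fold, test_idx in enumerate(folds):
            # Every fold except the held-out one goes into the training set.
            train_idx = [
                idx
                for other, members in enumerate(folds)
                if other != fold
                for idx in members
            ]
            yield repeat, fold, train_idx, test_idx


# 488 patients, 5 folds, 100 repeats -> 500 train/test splits.
splits = list(repeated_kfold_indices(488, n_splits=5, n_repeats=100))
```

Each of the five candidate models would then be fitted on `train_idx` and scored on `test_idx` for every split, with the scores summarized across all splits.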
Seven variables with non-zero coefficients in the LASSO regression were selected for inclusion in the model: age, moderate pain, time to recovery from rash, diabetes, severe pain, rash on the head and face, and basophil ratio. The RF model performed better than the other models. On the test set, the RF model achieved an area under the receiver operating characteristic curve (AUC) of 0.84 (95% confidence interval (CI): 0.80–0.88), an accuracy of 0.78 (95% CI: 0.69–0.86), a precision of 0.61 (95% CI: 0.45–0.77), a recall of 0.73 (95% CI: 0.58–0.89), and a specificity of 0.79 (95% CI: 0.70–0.89).
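For reference, the threshold-based metrics reported above follow the standard confusion-matrix definitions, sketched below. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: patients with PHI predicted as PHI / as non-PHI.
    tn/fp: patients without PHI predicted as non-PHI / as PHI.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted PHI cases that are true PHI
        "recall": tp / (tp + fn),        # share of true PHI cases that are detected
        "specificity": tn / (tn + fp),   # share of non-PHI cases correctly ruled out
    }


# Hypothetical counts for illustration only (not data from the study).
example = classification_metrics(tp=30, fp=20, tn=40, fn=10)
```

A precision well below the recall, as in the reported RF results, is typical when the positive class (here, PHI) is the minority.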
In this study, five machine learning methods were used to build risk prediction models for postherpetic itch from historical case data. Comparative analysis identified the random forest model as the best-performing model.