Shoulder-hand syndrome (SHS) severely impedes functional recovery in patients after stroke. The factors that place patients at high risk of SHS cannot yet be identified reliably, and there is no effective treatment. This study applies the random forest (RF) algorithm, an ensemble learning method, to establish a predictive model for the occurrence of SHS after stroke, aiming to identify patients at high risk of SHS in the first-onset stroke population and to discuss possible therapeutic methods.
We retrospectively reviewed all first-onset stroke patients with unilateral hemiplegia; 36 patients who met the inclusion criteria were included. A wide spectrum of demographic, clinical, and laboratory data was analyzed. RF models were built to predict the occurrence of SHS, and model reliability was assessed with a confusion matrix and the area under the receiver operating characteristic (ROC) curve.
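The modelling workflow described above can be illustrated with a minimal sketch. It assumes Python with scikit-learn, which the study does not state it used, and it runs on synthetic data: the feature matrix, the labels, the number of trees, and all variable names are hypothetical stand-ins, so the printed values bear no relation to the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic stand-in for the cohort: 36 patients, 25 features (NOT study data).
rng = np.random.default_rng(0)
n_patients, n_features = 36, 25
X = rng.normal(size=(n_patients, n_features))
# Hypothetical outcome (1 = SHS, 0 = no SHS), loosely driven by the first three features.
noise = rng.normal(scale=0.5, size=n_patients)
y = (X[:, 0] + X[:, 1] + X[:, 2] + noise > 0).astype(int)

# Random forest with out-of-bag (OOB) scoring; n_estimators is an assumed setting.
model = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
model.fit(X, y)

# OOB accuracy: each patient is scored only by trees that did not see that patient.
oob_accuracy = model.oob_score_

# OOB class probabilities give predictions without a separate test split.
oob_prob = model.oob_decision_function_[:, 1]
oob_pred = model.classes_[np.argmax(model.oob_decision_function_, axis=1)]

# Confusion matrix counts and area under the ROC curve.
tn, fp, fn, tp = confusion_matrix(y, oob_pred, labels=[0, 1]).ravel()
auc = roc_auc_score(y, oob_prob)

# Indices of the three features with the largest impurity-based importance.
importances = model.feature_importances_
top3 = np.argsort(importances)[::-1][:3]

print(f"OOB accuracy: {oob_accuracy:.3f}, AUC: {auc:.3f}")
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print("Top-3 feature indices:", top3.tolist())
```

With a cohort this small, out-of-bag estimates are a convenient substitute for a held-out test set, although they remain noisy.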
A binary classification model was trained on 25 hand-picked features. The area under the ROC curve of the prediction model was 0.8, and the out-of-bag accuracy was 72.73%. The confusion matrix indicated a sensitivity of 0.8 and a specificity of 0.5. Ranked by feature importance, the three features with the largest weights in the classification were, in descending order, D-dimer, C-reactive protein, and hemoglobin.
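For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity follow from the four cells of a confusion matrix. The counts are hypothetical, chosen only so that the ratios equal 0.8 and 0.5; they are not the study's confusion matrix, and the function name is an illustrative assumption.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true SHS cases that the model flags
    specificity = tn / (tn + fp)  # share of non-SHS cases that the model clears
    return sensitivity, specificity


# Hypothetical counts for illustration only (NOT taken from the study).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=5, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A specificity of 0.5 means that half of the patients who do not develop SHS would still be flagged as high risk.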
A reliable predictive model can be established from post-stroke patients’ demographic, clinical, and laboratory data. Combining the results of RF with traditional statistical methods, our model found that D-dimer, C-reactive protein (CRP), and hemoglobin affected the occurrence of SHS after stroke in a relatively small sample with tightly controlled inclusion criteria.