Explainable machine learning framework to predict the risk of work-related neck and shoulder musculoskeletal disorders among healthcare professionals

Luo, Na; Xu, Xinyi; Jiang, Biling; Zhang, Zeyuan; Huang, Jingyu; Zhang, Xiulan; Tan, Qiong; Wang, Xuanyi; Bai, Siyi; Liu, Suyi; Pan, Yishuang; Tang, Chi; Zhu, Pinghua

doi:10.3389/fpubh.2024.1414209

ORIGINAL RESEARCH article

Front. Public Health, 20 August 2024

Sec. Occupational Health and Safety

Volume 12 - 2024 | https://doi.org/10.3389/fpubh.2024.1414209


Na Luo¹†, Xinyi Xu²†, Biling Jiang¹†, Zeyuan Zhang³, Jingyu Huang², Xiulan Zhang², Qiong Tan², Xuanyi Wang², Siyi Bai², Suyi Liu², Yishuang Pan², Chi Tang¹, Pinghua Zhu²*

¹Health Education and Promotion Department, Nanning Center for Disease Control and Prevention, Nanning, China
²College of Humanities and Social Sciences, Guangxi Medical University, Nanning, China
³Orthopedics Department, Nanning Hospital of Traditional Chinese Medicine, Nanning, China

Objective: This study aims to develop risk prediction models for neck and shoulder musculoskeletal disorders among healthcare professionals.

Methods: A stratified sampling method was employed to select employees from medical institutions in Nanning City, yielding 617 samples. The Boruta algorithm was used for feature selection, and several models, including tree-based models, a single hidden-layer neural network (MLP), elastic net models (ENet), and support vector machines (SVM), were trained on the selected variables to predict neck and shoulder musculoskeletal disorders. SHAP algorithms were used for individual-level local explanations.

Results: The SVM model excels in both Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) and exhibits more stable performance when generalizing to unseen data. The Random Forest model exhibited relatively high overall performance on the training set. The MLP model emerges as the most consistent and accurate in predicting shoulder musculoskeletal disorders, while the SVM model shows strong fitting capabilities during the training phase, with occupational factors identified as the main contributors to WMSDs.

Conclusion: This study successfully constructs work-related musculoskeletal disorder risk prediction models for healthcare professionals, enabling a quantitative analysis of the impact of occupational factors. This advancement is beneficial for future economical and convenient work-related musculoskeletal disorder screening in healthcare professions.

Contributions to the literature

• Our study employs the Boruta algorithm for feature selection, reducing neck musculoskeletal disorder screening to just 12 key items and shoulder disorder screening to 17, enabling a simplified screening process. By inputting demographic data into an electronic system, the musculoskeletal disorder prediction model can assess the risk of these conditions in healthcare professionals, thereby significantly reducing the workload for screening.

• We have developed interpretable models for predicting the risk of shoulder and neck musculoskeletal disorders, utilizing SHAP algorithms for individual-level local explanations.

• Machine learning models show better prediction accuracy and precision than conventional logistic regression, which was more commonly used in past research. Studies using machine learning to predict musculoskeletal disorders in populations are relatively scarce.

1 Introduction

Work-related musculoskeletal disorders (WMSDs) refer to injuries or disorders of the muscles, nerves, tendons, joints, cartilage, and spinal discs that are associated with exposure to risk factors in the workplace (1). According to data on WMSDs from 2018 to 2020 published by the Chinese Center for Disease Control and Prevention, there are three high-prevalence groups in China: flight attendants, medical staff, and workers in vegetable greenhouses. Medical staff, in particular, are a high-risk group for WMSDs because of heavy workloads combined with unfavorable dynamic, static, and physical loads and poor ergonomic environments (2). Current research has revealed that WMSDs among medical staff are most commonly observed in the shoulder, neck, and lower back (3), with the highest prevalence occurring in the lower neck region (4).

Previous studies have predominantly utilized descriptive statistical analysis and logistic regression to analyze the influencing factors of musculoskeletal disorders among medical staff in terms of dynamic loading, static loading, physical loading, repetitive motion, ergonomic environment, and labor organization. Wang et al. employed logistic regression to analyze a sample of 1,017 medical staff in the department of obstetrics and gynecology and found that individual, postural, work-environmental, as well as psychosocial factors were the main contributors to musculoskeletal disorders (5). Krishnan et al. discovered that musculoskeletal disorders were associated with age, low education level, female gender, years of working experience, and lifestyle (6). Machine learning models have demonstrated significant advantages, such as high accuracy and resistance to overfitting. Consequently, they have been widely applied in predicting chronic diseases, infectious diseases, and tumors. However, the utilization of machine learning models in the study of work-related musculoskeletal disorders (WMSDs) remains relatively limited.

Considering these research gaps, we utilized data on WMSDs from healthcare professionals in Nanning, Guangxi Zhuang Autonomous Region, to construct risk prediction models for shoulder and neck WMSDs. This approach quantitatively reveals the varying degrees of influence each variable has on the risk of developing WMSDs. A web calculator for neck and shoulder WMSD risk was built on shinyapps.io and can be applied to the early detection and prevention of neck and shoulder WMSDs in healthcare workers. The risk prediction model for neck WMSDs is available at https://shoulderwmsdspred.shinyapps.io/neck/, and the risk prediction model for shoulder WMSDs is available at https://shoulderwmsdspred.shinyapps.io/shoulder/.

2 Methods

2.1 Setting and participants

This study, funded by the Health Commission of Nanning, was conducted as part of a survey on musculoskeletal disorders among occupational populations in Nanning. The research was carried out from June 2022 to March 2023. Medical personnel from medical institutions in seven districts and five counties of Nanning were selected as the study participants using stratified sampling. The survey was conducted online using the QuestionStar platform, and 617 medical personnel from three tertiary hospitals, seven secondary hospitals, and three disease control centers participated by completing the questionnaires.

2.2 Research tools

The questionnaire comprised four sections: personal information, musculoskeletal disorder status, work stress, and occupational health literacy. The Cronbach’s Alpha for this questionnaire is 0.741.

Musculoskeletal disorder status was assessed using the Chinese version of the “Musculoskeletal Disorder Questionnaire” provided by the Occupational Health and Poison Control Institute of the Chinese Center for Disease Control and Prevention, a tool developed with reference to the musculoskeletal disorder survey forms used in Nordic countries and adapted to the Chinese context (7). The survey assessed musculoskeletal disorders in nine areas: neck, shoulders, back, elbows, waist, wrists, hips, knees, and ankles/feet. Respondents reported the occurrence of neck and shoulder WMSDs during the last 12 months, and these reports were used as the dependent variables for the neck and shoulder WMSD predictive models.

The work stress scale utilized in this study was the Q17 Stress Test, which is widely applied to assess work stress in hospitals (8).

For evaluating occupational health literacy, the 2021 National Health Commission’s National Key Industry Occupational Health Literacy Monitoring Questionnaire was employed. A correct response rate of 60% was considered as having adequate occupational health literacy.

2.3 Ethical consideration

This study obtained approval from the Ethics Committee of Guangxi Medical University (approval number: 2021002). The purpose and content of the research were explained to all participants, and informed consent was obtained from each of them.

2.4 Machine learning model workflow

2.4.1 Refining variables with the Boruta algorithm

The Boruta algorithm is a feature selection method for machine learning tasks. Its objective is to identify the attributes in a dataset with many features that are genuinely relevant to the outcome, thereby improving model performance while reducing the risk of overfitting.
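The shadow-feature idea behind Boruta can be illustrated with a minimal Python sketch. This is not the procedure used in the study: Boruta ranks features by random forest importance and applies a statistical test over repeated runs, whereas the sketch below swaps in absolute Pearson correlation as a stand-in importance and a fixed hit-rate threshold. The function names, variable names, and data are hypothetical.

```python
import random
import statistics

def abs_corr(xs, ys):
    """Absolute Pearson correlation; a simple stand-in for random forest importance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def shadow_select(features, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features whose importance beats the best shadow feature in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadow features are shuffled copies: same distribution, no link to the target.
        shadow_best = 0.0
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, abs_corr(shuffled, target))
        for name, values in features.items():
            if abs_corr(values, target) > shadow_best:
                hits[name] += 1
    return [name for name, h in hits.items() if h / n_rounds >= min_hit_rate]

# Hypothetical toy data: 200 respondents with a binary outcome.
target = [i % 2 for i in range(200)]
features = {
    # Agrees with the outcome except for every tenth respondent.
    "forward_neck_posture": [1 - t if i % 10 == 0 else t for i, t in enumerate(target)],
    # Uncorrelated with the outcome by construction.
    "unrelated_item": [(i // 2) % 2 for i in range(200)],
}
selected = shadow_select(features, target)
print(selected)
```

A feature is retained only when it carries more signal than a column that is noise by construction, which is why items that merely correlate with other items, but not with the outcome, tend to be dropped.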

As shown in Figure 1, several variables are strongly correlated with one another. In light of this, the dataset was split into training and testing subsets at a 3:1 ratio, and the target variables, namely the presence of neck and shoulder musculoskeletal disorders, were used to train the machine learning algorithms. Using the Boruta algorithm, we examined the feature variables and removed those that made no meaningful contribution to the model. This process yielded 12 independent variables for the “Neck” category and 17 independent variables for the “Shoulder” category, as detailed in Figure 2 and Table 1.
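A 3:1 split of this kind can be sketched as follows. The sketch assumes that the split is stratified by the outcome so that both subsets keep a similar proportion of cases; the text does not state whether this was done, and the case counts below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical outcome vector for 617 respondents (300 cases, 317 non-cases).
labels = [1] * 300 + [0] * 317
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With a test fraction of 0.25, roughly three quarters of each class end up in the training subset, which corresponds to the 3:1 ratio described above.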

Figure 1. Heatmap illustrating the correlations between variables. (A) and (B) present the variable correlations for musculoskeletal disorders in the neck and shoulder regions, respectively.

Figure 2. Data variable selection based on the Boruta algorithm. (A) and (B) depict the variable selection outcomes for the risk prediction datasets of neck and shoulder musculoskeletal disorders, respectively.


Table 1. Variables of the musculoskeletal disorder risk prediction model.

2.4.2 Robustness assessment of models

We compared four model categories: (1) Tree-based models, comprising decision tree models, random forest models (RF), and XGBoost models (Xgboost). (2) Single hidden-layer neural network models (MLP). A multilayer perceptron consists of layers of neurons, namely an input layer, one or more hidden layers, and an output layer; each layer receives its inputs from the preceding layer and passes its outputs to the subsequent layer. In this study, the MLP employed a single hidden layer comprising 15 hidden units. (3) Elastic net models (ENet). (4) Support vector machines (SVM). For each model category, we performed a hyperparameter grid search with 5-fold cross-validation on the training dataset (refer to Figure 3) (9). Subsequently, we evaluated model performance on both the training and testing datasets using metrics such as mean absolute error (MAE), root mean square error (RMSE), accuracy, and other relevant indicators.
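The grid search with 5-fold cross-validation can be sketched in a few lines. The sketch is illustrative only: a one-dimensional k-nearest-neighbour classifier stands in for the models used in the study, and the data, grid values, and function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (toy 1-D model)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def cv_accuracy(data, k_neighbours, folds):
    """Mean held-out accuracy over the folds for one hyperparameter value."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct = sum(knn_predict(train, data[i][0], k_neighbours) == data[i][1]
                      for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def grid_search(data, grid, folds):
    """Evaluate every grid value on the same folds and return the best one."""
    results = {k: cv_accuracy(data, k, folds) for k in grid}
    best = max(results, key=results.get)
    return best, results

# Hypothetical toy data: one exposure score per respondent and a binary outcome.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
folds = k_fold_indices(len(data), k=5)
best_k, results = grid_search(data, grid=[1, 3, 5], folds=folds)
print(best_k, results)
```

Every candidate value is scored on the same folds, so differences in the cross-validated score reflect the hyperparameter rather than the partition.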


Figure 3. The optimal hyperparameter cross-validation results for machine learning models. The subplots in (A), from left to right, and from the first row to the second row, represent the optimal hyperparameter cross-validation results for neck musculoskeletal disorder prediction models for RF, SVM, Enet, MLP, and Xgboost, respectively. The subplots in (B), from left to right, and from the first row to the second row, represent the optimal hyperparameter cross-validation results for shoulder musculoskeletal disorder prediction models for RF, SVM, Enet, MLP, and Xgboost, respectively. The horizontal axis is sensitivity, and the vertical axis is 1-specificity.

2.4.3 Model interpretability

We employ the SHapley Additive exPlanations (SHAP) framework for model interpretability, using the R programming language and the “fastshap” and “shapviz” packages (10, 11). These tools allow us to construct beeswarm plots and waterfall plots. The beeswarm plots show the distribution of the SHAP values for each feature across all the data points, and the waterfall plots are individualized explanations of a single prediction, showing the contribution of each feature to the final prediction (see Figures 4, 5).


Figure 4. The beeswarm plot and waterfall plot for neck musculoskeletal disorders. In (A), from the first row to the second row, and from left to right, the subplots represent the neck musculoskeletal disorders beeswarm plots for RF, SVM, Xgboost, Enet, and MLP, respectively. In (B), from the first row to the second row, and from left to right, the subplots represent the neck musculoskeletal disorders waterfall plots for RF, SVM, Xgboost, Enet, and MLP, respectively.


Figure 5. The beeswarm plot and waterfall plot for shoulder musculoskeletal disorders. In (A), from the first row to the second row, and from left to right, the subplots represent the shoulder musculoskeletal disorders beeswarm plots for RF, SVM, Xgboost, Enet, and MLP, respectively. In (B), from the first row to the second row, and from left to right, the subplots represent the shoulder musculoskeletal disorders waterfall plots for RF, SVM, Xgboost, Enet, and MLP, respectively.

The Shapley value represents the average marginal contribution of a variable across all possible coalitions of variables. For each individual, the SHAP value associated with each variable reflects its contribution to that individual’s risk of musculoskeletal disorders in the neck and shoulder. An individual’s predicted susceptibility to neck and shoulder musculoskeletal disorders is obtained by adding the contributions of these variables to the baseline value (which corresponds to the average predicted value across the dataset).
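The additive decomposition can be checked on a toy example. The sketch below computes exact Shapley values by enumerating all coalitions, which is feasible only for a handful of features; packages such as “fastshap” rely on approximations instead. The risk function, feature names, and background data are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition take their values from the background rows,
    and the prediction is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return value(set()), phi

def toy_risk(features):
    """Hypothetical risk model with an interaction between two exposures."""
    neck, wrist = features
    return 0.1 + 0.3 * neck + 0.2 * wrist + 0.1 * neck * wrist

background = [[0, 0], [1, 0], [0, 1], [1, 1]]
x = [1, 1]  # respondent exposed to both postures
baseline, phi = shapley_values(toy_risk, x, background)
print(baseline, phi, baseline + sum(phi), toy_risk(x))
```

The baseline plus the sum of the per-feature contributions reproduces the model's prediction for the individual, which is the property a waterfall plot displays.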

2.4.4 Partial dependency computation

The computation and graphical representation of partial dependency values for each variable are showcased in Figures 6, 7, offering illustrative examples.
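A partial dependence value is the model's average prediction when one variable is fixed at a given value for every respondent while the other variables keep their observed values. A minimal sketch, using a hypothetical risk function and hypothetical data rather than the fitted models of this study:

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = v  # override one variable, keep the rest as observed
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_risk(features):
    """Hypothetical risk model with an interaction between two exposures."""
    neck, wrist = features
    return 0.1 + 0.3 * neck + 0.2 * wrist + 0.1 * neck * wrist

rows = [[0, 0], [1, 0], [0, 1], [1, 1]]
curve = partial_dependence(toy_risk, rows, feature=0, grid=[0, 1])
print(curve)
```

Plotting the resulting curve against the grid values gives the kind of plot shown for each important factor.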

Figure 6. Partial dependence plot of important factors for neck musculoskeletal disorders.

Figure 7. Partial dependence plot of important factors for shoulder musculoskeletal disorders.

3 Results

3.1 Demographic data

The surveyed medical personnel consisted of 403 females and 214 males (see Table 2). Among them, 419 were married, 173 were unmarried, and 25 had unknown marital status. Regarding age distribution, 244 medical personnel were between 25 and 34 years old, 194 were between 35 and 44 years old, and 102 were between 45 and 54 years old. In terms of educational background, 42 respondents had education levels below a university degree, 527 had completed college or undergraduate studies, and 48 had completed postgraduate studies or above. Regarding work experience, 215 medical personnel had been in the profession for 15 years or more. Self-assessment of health status revealed that 379 individuals rated their health as average, while 216 rated it as good. As for monthly income, 201 medical personnel earned between 3,000 and 4,999 yuan, and 190 earned between 5,000 and 6,999 yuan. In terms of the size of their employing institutions, 376 medical personnel worked in units with 300–999 employees. Night shifts were part of the work schedule for 280 medical personnel. Additionally, 214 individuals had a weekly working time of 40 h or less, and 402 had no more than two types of chronic diseases.


Table 2. Basic demographic characteristics of survey participants.

3.2 Model performance comparison

The calibration curve of a model illustrates how well its predicted probabilities agree with observed frequencies on both the training and testing datasets. An ideally calibrated model would exhibit a curve closely aligned with the diagonal line running from the lower-left corner to the upper-right corner; the closer the calibration curve lies to this diagonal, the more accurate the model’s probability predictions. Calibration varies among the risk prediction models for neck musculoskeletal disorders. The random forest model shows a relatively large deviation from the ideal diagonal line on the training set, suggesting potential overfitting to the training data. On the other hand, the support vector machine exhibits a curve on the training set that is closer to the ideal diagonal line, indicating more accurate probability predictions. XGBoost demonstrates good calibration on the training data but appears to overestimate probabilities on the testing data. The calibration curve on the testing set for the elastic net model suggests a degree of miscalibration in predicting neck disorders. Although the MLP model exhibits strong calibration on the training data, its calibration on the testing data is comparatively poor (see Figure 8).
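The points of a calibration curve can be obtained by grouping predictions into probability bins and comparing the mean predicted probability with the observed outcome frequency in each bin. A minimal sketch with equal-width bins; the number of bins and the data are hypothetical and are not taken from the study.

```python
def calibration_bins(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(s[0] / s[2], s[1] / s[2], s[2]) for s in sums if s[2] > 0]

# Hypothetical predicted probabilities and observed outcomes.
probs = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
bins = calibration_bins(probs, outcomes)
for mean_pred, observed, count in bins:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

Points with an observed frequency below the mean predicted probability indicate overestimated risk, and points above it indicate underestimated risk.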


Figure 8. The calibration curves for machine learning models predicting the risk of neck musculoskeletal disorders. The first column shows the calibration curves for the training data, and the second column shows the calibration curves for the testing data. From the first row to the fifth row, the calibration curves for RF, SVM, Xgboost, Enet, and MLP are displayed for both the training and testing data.

Calibration also varies among the risk prediction models for shoulder musculoskeletal disorders. The RF model exhibits a certain degree of deviation from the ideal diagonal line in both the training and testing calibration curves. This suggests some inconsistency between the model’s predicted probabilities and the actual occurrence frequencies. The SVM model displays a calibration curve close to the ideal state on the training set, indicating relatively accurate probability predictions during the training phase. However, the calibration curve on the testing set deviates slightly, indicating that the model’s predicted probabilities may be too high or too low when dealing with new data. The XGBoost model demonstrates good probability calibration on the training set, with a calibration curve that closely aligns with the ideal diagonal line. On the testing set, although there is a slight deviation in the curve, the overall performance remains relatively robust. The ENet model yields low predicted probabilities on both datasets, as evidenced in the histograms, where a large share of predicted probabilities clusters in the lower range. For the MLP model, the calibration curve on the training set indicates strong probability calibration. However, on the testing set, the curve deviates slightly from the ideal diagonal line, suggesting the possibility of mild overfitting. The histograms reveal a more even distribution of predicted probabilities on the training set but a somewhat more concentrated distribution on the testing set (see Figure 9).


Figure 9. The calibration curves for machine learning models predicting the risk of shoulder musculoskeletal disorders. The first column shows the calibration curves for the training data, and the second column shows the calibration curves for the testing data. From the first row to the fifth row, the calibration curves for RF, SVM, Xgboost, Enet, and MLP are displayed for both the training and testing data.

Among the risk prediction models for neck musculoskeletal disorders, the SVM model achieves the lowest average MAE of 0.9165, indicating the smallest average prediction error. Following closely are the MLP and RF models, with average MAE values of 0.9850 and 0.9855, respectively. The Xgboost model has a slightly higher average MAE of 0.9950. The ENet model exhibits the highest average MAE at 0.9990.

Similarly, the SVM model attains the lowest average RMSE of 1.0385, signifying its superior performance when considering penalties for larger errors. The MLP model follows with an average RMSE of 1.0940, ranking second. ENet, Xgboost, and RF models display similar RMSE values of 1.1010, 1.1035, and 1.1045, respectively.

Among these models, the SVM model excels in both MAE and RMSE, indicating its relatively high predictive accuracy, especially in handling larger prediction errors. The MLP model performs well in RMSE but slightly lags behind the SVM in MAE. ENet, XGBoost, and RF models exhibit comparable performance in both metrics but fall slightly short of SVM and MLP.
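For reference, the two error metrics compared above can be computed as follows. The values in the example are hypothetical and only illustrate why RMSE penalizes larger errors more heavily than MAE.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: squaring gives larger errors more weight."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical observed values and predictions.
y_true = [1, 0, 1, 0]
y_pred = [0.5, 0.5, 1.0, 0.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE is never smaller than MAE on the same predictions, and the gap between the two widens when a few errors are much larger than the rest.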

Among the risk prediction models for shoulder musculoskeletal disorders, the MLP (Multilayer Perceptron) model shows the best performance on both the training set (MAE = 0.946) and the testing set (MAE = 0.954), indicating that its predictions are closest to the actual values on average. The XGBoost model follows closely with MAE = 0.974 on the training set and MAE = 0.982 on the testing set, suggesting slightly less accurate predictions than MLP but still outperforming the other models. The SVM and ENet models have identical MAE on the training set (MAE = 1.001) and very similar performance on the testing set (SVM MAE = 1.009, ENet MAE = 1.007), which is moderate compared to MLP and XGBoost. The RF (Random Forest) model exhibits the highest MAE, particularly on the testing set (MAE = 1.111), which implies less accurate predictions on average compared to the other models.

The MLP model stands out as the most consistent and accurate model for predicting shoulder musculoskeletal disorders according to both MAE and RMSE metrics. XGBoost also performs well and could be considered a good alternative, although the relative computational cost of gradient boosting and neural networks depends on the implementation and dataset size. The SVM and ENet models show moderate performance, while the RF model might require further parameter tuning or feature engineering to improve its prediction accuracy (see Figure 10).


Figure 10. The MAE and RMSE values for various machine learning models. (A) Depicts the MAE and RMSE values for neck musculoskeletal disorders, while (B) illustrates the MAE and RMSE values for shoulder musculoskeletal disorders.

When evaluating the machine learning models for predicting neck musculoskeletal disorders, we observed that the conventional logistic regression model achieved only moderate performance. The Random Forest model exhibited relatively high overall performance on the training set (accuracy = 0.703, sensitivity = 0.749, specificity = 0.667, AUC = 0.772). However, on the testing set, the SVM model outperformed the other models, with an accuracy of 0.574 and an AUC of 0.623. This suggests that while the Random Forest model demonstrates strong learning capabilities during the training phase, the SVM model exhibits more stable performance when generalizing to unseen data (see Table 3).


Table 3. Comparison of model performance for neck musculoskeletal disorder prediction.

For the prediction of shoulder musculoskeletal disorders, the conventional logistic regression model again achieved only moderate performance. The SVM model demonstrates the best performance on the training set (accuracy = 0.781, sensitivity = 0.802, specificity = 0.768, AUC = 0.866). On the testing set, the MLP model achieves the highest accuracy (0.690), while the Xgboost model has the highest AUC value (0.734). This suggests that the SVM model fits the data closely during the training phase, but on the testing set, the MLP and Xgboost models generalize better. In particular, the MLP model exhibits higher specificity (0.713) on the testing set, indicating good performance in reducing false positives (see Table 4).
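The classification metrics reported in Tables 3 and 4 can be computed from predicted probabilities as sketched below. The 0.5 decision threshold and the data are assumptions made for illustration; the study does not state the threshold it used.

```python
def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return accuracy, sensitivity, specificity

def auc(y_true, y_prob):
    """Probability that a random case scores above a random non-case (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
accuracy, sensitivity, specificity = confusion_metrics(y_true, y_prob)
area = auc(y_true, y_prob)
print(accuracy, sensitivity, specificity, area)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can lead on one measure and not on another.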


Table 4. Comparison of model performance for shoulder musculoskeletal disorder prediction.

3.3 Interpretability of machine learning models for the risk of musculoskeletal disorders

To quantify the contribution of each variable to the predicted risk of neck musculoskeletal disorders, we focus on the application of the SHapley Additive exPlanations (SHAP) framework to the Random Forest (RF) and Support Vector Machine (SVM) models. The RF model identifies the top six determinants of healthcare professionals’ susceptibility to musculoskeletal disorders: prolonged forward neck posture, wrist flexion or maintenance of this position for extended periods, physical exhaustion post-work, prolonged neck twisting posture, static posture maintenance, and prolonged sedentary work. The SVM model reveals a similar hierarchy of influential factors, albeit with slight variations in their order. The results of conventional logistic regression (LR) are shown in Table 5, but since the performance of LR is inferior to that of RF and SVM, it is not discussed in detail. Notably, the sustained forward tilt of the wrist significantly increases the risk of neck-related musculoskeletal disorders. Conversely, prolonged sitting and maintaining a uniform posture while working exhibit a negative correlation with the risk of developing these disorders (refer to Figure 4).


Table 5. Neck musculoskeletal disorder binary logistic regression results.

To quantify the contribution of each variable to the predicted risk of shoulder musculoskeletal disorders, we primarily examine the outcomes of the SHapley Additive exPlanations (SHAP) framework on the Multilayer Perceptron (MLP) and Support Vector Machine (SVM) models. The MLP model identifies the six principal factors affecting the risk among healthcare professionals: prolonged forward neck posture, prolonged sedentary work, work-related stress levels, number of chronic diseases, physical exhaustion post-work, and absence due to illness. In contrast, the SVM model highlights the top six influential factors as prolonged neck twisting posture, number of chronic diseases, sustained bending posture, absence due to illness, physical exhaustion post-work, and wrist flexion or maintenance of this position for extended periods. The results of conventional logistic regression (LR) are shown in Table 6, but since the performance of LR is inferior to that of MLP and SVM, it is not discussed in detail. Notably, low levels of work stress and not sitting for prolonged durations are associated with a lower risk of shoulder musculoskeletal disorders. In contrast, maintaining a prolonged forward neck posture significantly increases the risk of shoulder musculoskeletal disorders (refer to Figure 5).


Table 6. Shoulder musculoskeletal disorder binary logistic regression results.

Healthcare professionals who maintain a prolonged forward neck posture face a higher risk of developing neck musculoskeletal disorders. Similarly, those with extended periods of wrist flexion are more likely to suffer from these disorders. Medical staff experiencing varying degrees of tiredness post-work—from slightly tired to extremely exhausted—are more susceptible to neck musculoskeletal disorders. Additionally, a long-term neck twisting posture and prolonged periods of sitting while working significantly increase the likelihood of these conditions (refer to Figure 6).

Results from the SVM and MLP models indicate that healthcare professionals who frequently maintain a forward neck posture are at a greater risk of shoulder musculoskeletal disorders. Similarly, prolonged sitting while working elevates the risk of these disorders. Moderate to high levels of work-related stress are more likely to lead to shoulder musculoskeletal disorders in medical staff. Those with one or more types of chronic diseases face a heightened risk of developing these conditions. Experiencing tiredness or extreme fatigue after work increases the likelihood of these disorders, as does a history of absenteeism due to illness. Moreover, maintaining a prolonged neck twisting posture, sustaining a significant bending posture for extended periods, and long-term wrist flexion are all associated with an increased risk of shoulder musculoskeletal disorders (see Figure 7).

4 Discussion

Different models perform differently in assessing the risk of shoulder and neck musculoskeletal disorders, each with its own strengths and limitations. For instance, while the Random Forest model excels on the training dataset for predicting neck musculoskeletal disorder risk, the SVM model demonstrates superior generalization on the test dataset. These findings emphasize the importance of considering performance metrics when selecting models for specific medical prediction tasks, especially in clinical applications where a model’s generalizability and its ability to reduce misdiagnosis (through high specificity) are crucial. Future research could explore these models’ performance on larger and more diverse datasets and refine their parameter settings, offering deeper insights for effective clinical prediction of musculoskeletal diseases.

Many studies using machine learning models lack interpretability (12–14), making it challenging to verify their reliability. Interpretability supports the acceptability of evidence and facilitates data-driven, personalized healthcare management. To achieve this, we have developed interpretable models for predicting the risk of shoulder and neck musculoskeletal disorders, utilizing SHAP algorithms for individual-level local explanations. Past studies have emphasized occupational factors as the main contributors to WMSDs, where muscle activity and movement during occupational tasks can lead to their occurrence. This finding is consistent with the results of this study, where the most critical influencing factors for neck and shoulder musculoskeletal disorders were occupational factors (15–20).

Currently, the predominant method for musculoskeletal disorder screening in China utilizes the Chinese version of the “Musculoskeletal Disorder Questionnaire” provided by the Occupational Health and Poison Control Institute of the Chinese Center for Disease Control and Prevention. This comprehensive questionnaire, consisting of 133 items requiring 5–10 min to complete, was modified by Dong et al. into the Chinese Musculoskeletal Questionnaire (CMQ) (21), which includes five major categories and 48 items. However, it remains time-consuming for occupational screening. Our study employs the Boruta algorithm for feature selection, reducing neck musculoskeletal disorder screening to just 12 key items and shoulder disorder screening to 17, enabling a simplified screening process to identify individuals at higher risk of musculoskeletal disorders. By inputting demographic data into an electronic system, the musculoskeletal disorder prediction model can assess the risk of these conditions in healthcare professionals, thereby significantly reducing the workload for screening.

It is noteworthy that the forward posture of the neck in healthcare professionals significantly contributes to the risk of musculoskeletal disorders in both the neck and shoulder regions. Providing ergonomic chairs is therefore recommended. Zhang et al. found that the factors influencing sonographers’ musculoskeletal disorders include work duration, consistent with the results of this study, where work duration was the main influencing factor for shoulder musculoskeletal disorders among healthcare professionals (12). We recommend providing targeted ergonomics-oriented occupational health education for medical staff, replacing existing chairs with ergonomic ones, encouraging correct working postures, and emphasizing the importance of rest after work to reduce the incidence of occupational musculoskeletal disorders. Personalized musculoskeletal disorder risk management advice should be provided to healthcare professionals across different departments, considering both occupational factors and individual health profiles. In addition to occupational factors, this study also found a correlation between the number of chronic diseases in medical personnel and the risk of shoulder musculoskeletal disorders, suggesting that future research should examine the clinical mechanisms linking work-related musculoskeletal disorders with chronic diseases.

This study has several limitations. First, the absence of physiological tests makes it difficult to rule out causes of musculoskeletal disorders that are unrelated to work. Second, musculoskeletal disorder factors were not compared among medical staff from different departments. Third, the risk prediction models are derived from cross-sectional data, in which exposure and outcome are ascertained at the same time point; this inherently limits the predictions. Finally, the sample size of this study is relatively small. Future studies should establish large cohorts of healthcare workers with WMSDs to better explore the causal relationships between variables, and should include a comparative analysis of musculoskeletal disorder factors among medical staff from different departments.

5 Conclusion

Five machine learning models were utilized to construct predictive models for the risk of neck and shoulder musculoskeletal disorders among healthcare professionals. These models are economically feasible and convenient for preliminary screening of work-related musculoskeletal disorders in healthcare workers. Additionally, this study offers a comprehensive interpretable machine learning framework, enabling a quantitative analysis of the impact of occupational factors on the risk of work-related musculoskeletal disorders. A web calculator can be applied to the early detection and prevention of neck and shoulder WMSDs in healthcare workers.
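How such a framework attributes a predicted risk to individual factors can be shown with a minimal sketch of exact Shapley values for a single individual. This is an illustration under stated assumptions, not the procedure used in this study: the study relied on SHAP algorithms, whereas this sketch enumerates all feature coalitions exactly, uses a single baseline profile in place of a background dataset, and applies an invented toy risk function with hypothetical feature names and coefficients.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    predict:  function taking a dict of feature values, returning a number.
    instance: dict of the individual's feature values.
    baseline: dict of reference values used for "absent" features
              (assumption: one reference profile, not a background sample).
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the individual's value,
        # all others fall back to the baseline value.
        mixed = {
            k: (instance[k] if k in coalition else baseline[k])
            for k in names
        }
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_risk(x):
    """Hypothetical risk score; coefficients are invented for illustration."""
    return (
        0.4 * x["work_hours"]
        + 0.05 * x["neck_flexion_deg"]
        + 0.3 * x["chronic_diseases"]
    )


person = {"work_hours": 10.0, "neck_flexion_deg": 40.0, "chronic_diseases": 2.0}
reference = {"work_hours": 8.0, "neck_flexion_deg": 20.0, "chronic_diseases": 0.0}

contributions = shapley_values(toy_risk, person, reference)
for factor, contribution in contributions.items():
    print(f"{factor}: {contribution:+.2f}")
```

The contributions sum to the difference between the individual's predicted score and the baseline score, which is the property that allows each factor's share of the predicted risk to be read off directly. Exact enumeration grows exponentially with the number of features, so practical tools approximate it.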

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Nanning Center for Disease Control and Prevention. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

NL: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Validation, Visualization, Writing – original draft. XX: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft. BJ: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Writing – original draft. ZZ: Data curation, Formal analysis, Investigation, Methodology, Writing – original draft. JH: Formal analysis, Investigation, Writing – original draft. XZ: Formal analysis, Investigation, Writing – original draft. QT: Formal analysis, Investigation, Writing – original draft. XW: Formal analysis, Investigation, Writing – original draft. SB: Formal analysis, Investigation, Writing – original draft. SL: Formal analysis, Investigation, Writing – original draft. YP: Formal analysis, Investigation, Writing – original draft. CT: Writing – original draft, Writing – review & editing. PZ: Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The results reported herein correspond to specific aims of grant Z20211337 to investigator NL from the Self-funded Project of the Health Commission of Guangxi Zhuang Autonomous Region.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Barbe, MF, and Barr, AE. Inflammation and the pathophysiology of work-related musculoskeletal disorders. Brain Behav Immun. (2006) 20:423–9. doi: 10.1016/j.bbi.2006.03.001

2. Jia, N, Zhang, H, Ling, R, Liu, Y, Li, G, Ren, Z, et al. Epidemiological data of work-related musculoskeletal disorders—China, 2018–2020. China CDC Wkly. (2021) 3:383–9. doi: 10.46234/ccdcw2021.104

3. Taib, MFM, Bahn, S, Yun, MH, and Taib, MSM. The effects of physical and psychosocial factors and ergonomic conditions on the prevalence of musculoskeletal disorders among dentists in Malaysia. Work. (2017) 57:297–308. doi: 10.3233/WOR-172559

4. Rezaei, B, Mousavi, E, Heshmati, B, and Asadi, S. Low back pain and its related risk factors in health care providers at hospitals: a systematic review. Ann Med Surg. (2021) 70:102903. doi: 10.1016/j.amsu.2021.102903

5. Wang, J, Cui, Y, He, L, Xu, X, Yuan, Z, Jin, X, et al. Work-related musculoskeletal disorders and risk factors among Chinese medical staff of obstetrics and gynecology. Int J Environ Res Public Health. (2017) 14:562. doi: 10.3390/ijerph14060562

6. Krishnan, KS, Raju, G, and Shawkataly, O. Prevalence of work-related musculoskeletal disorders: psychological and physical risk factors. Int J Environ Res Public Health. (2021) 18:9361. doi: 10.3390/ijerph18179361

7. Yang, L, Hildebrandt, VH, Yu, S, and Lin, R. Introduction to musculoskeletal disorders questionnaire with attached questionnaire. Indust Hyg Occupat Dis. (2009) 35:25–31.

8. Macfarlane, B. Collegiality and performativity in a competitive academic culture. High Educ Rev. (2016) 48:31–50.

9. Wong, T-T, and Yeh, P-Y. Reliable accuracy estimates from k-fold cross validation. IEEE Trans Knowl Data Eng. (2019) 32:1586–94. doi: 10.1109/TKDE.2019.2912815

10. Greenwell, B, and Greenwell, MB. Package ‘fastshap.’ (2020).

11. Huang, J, Chen, H, Deng, J, Liu, X, Shu, T, Yin, C, et al. Interpretable machine learning for predicting 28-day all-cause in-hospital mortality for hypertensive ischemic or hemorrhagic stroke patients in the ICU: a multi-center retrospective cohort study with internal and external cross-validation. Front Neurol. (2023) 14:14. doi: 10.3389/fneur.2023.1185447

12. Zhang, D, Yan, M, Lin, H, Xu, G, Yan, H, and He, Z. Evaluation of work-related musculoskeletal disorders among sonographers in general hospitals in Guangdong province, China. Int J Occup Saf Ergon. (2020) 26:802–10. doi: 10.1080/10803548.2019.1672411

13. Dong, H, Zhang, Q, Liu, G, Shao, T, and Xu, Y. Prevalence and associated factors of musculoskeletal disorders among Chinese healthcare professionals working in tertiary hospitals: a cross-sectional study. BMC Musculoskelet Disord. (2019) 20:1–7. doi: 10.1186/s12891-019-2557-5

14. Devika, R, Avilala, SV, and Subramaniyaswamy, V. Comparative study of classifier for chronic kidney disease prediction using naive bayes, KNN and random forest. In: 2019 3rd international conference on computing methodologies and communication (ICCMC). IEEE (2019). p. 679–684. doi: 10.1109/ICCMC.2019.8819654

15. Lin, SC, Lin, LL, Liu, CJ, Fang, CK, and Lin, MH. Exploring the factors affecting musculoskeletal disorders risk among hospital nurses. PLoS One. (2020) 15:e0231319. doi: 10.1371/journal.pone.0231319

16. Yang, S, Lu, J, Zeng, J, Wang, L, and Li, Y. Prevalence and risk factors of work-related musculoskeletal disorders among intensive care unit nurses in China. Workplace Health Saf. (2019) 67:275–87. doi: 10.1177/2165079918809107

17. Lietz, J, Kozak, A, and Nienhaus, A. Prevalence and occupational risk factors of musculoskeletal diseases and pain among dental professionals in Western countries: a systematic literature review and meta-analysis. PLoS One. (2018) 13:e0208628. doi: 10.1371/journal.pone.0208628

18. Abdollahi, T, Pedram Razi, S, Pahlevan, D, Yekaninejad, MS, Amaniyan, S, Leibold Sieloff, C, et al. Effect of an ergonomics educational program on musculoskeletal disorders in nursing staff working in the operating room: a quasi-randomized controlled clinical trial. Int J Environ Res Public Health. (2020) 17:7333. doi: 10.3390/ijerph17197333

19. Ou, Y-K, Liu, Y, Chang, Y-P, and Lee, B-O. Relationship between musculoskeletal disorders and work performance of nursing staff: a comparison of hospital nursing departments. Int J Environ Res Public Health. (2021) 18:7085. doi: 10.3390/ijerph18137085

20. Garosi, VH, Zarei, MR, Farahan, MA, Ziapour, A, and Haghani, H. A study of nurses’ performance relative to the risk factors for musculoskeletal disorders associated with patient mobility in the teaching hospitals across Kermanshah. J Public Health (Bangkok). (2021) 29:823–8. doi: 10.1007/s10389-019-01138-5

21. Dong, Y, Nazakat, M, Wang, F, Jin, X, and Wang, S. Establishment and verification of the Chinese musculoskeletal questionnaire: the questionnaire is attached in the attachment. China Occupat Med. (2020) 47:8–18.

Keywords: musculoskeletal disorders, machine learning, health care professionals, ergonomics, shiny app

Citation: Luo N, Xu X, Jiang B, Zhang Z, Huang J, Zhang X, Tan Q, Wang X, Bai S, Liu S, Pan Y, Tang C and Zhu P (2024) Explainable machine learning framework to predict the risk of work-related neck and shoulder musculoskeletal disorders among healthcare professionals. Front. Public Health. 12:1414209. doi: 10.3389/fpubh.2024.1414209

Received: 08 April 2024; Accepted: 09 August 2024;
Published: 20 August 2024.

Edited by:

Elpidoforos Soteriades, Open University of Cyprus, Cyprus

Reviewed by:

Orhan Korhan, Eastern Mediterranean University, Cyprus
Omar Farooq, Aligarh Muslim University, India

Copyright © 2024 Luo, Xu, Jiang, Zhang, Huang, Zhang, Tan, Wang, Bai, Liu, Pan, Tang and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Pinghua Zhu, zhupinghua@gxmu.edu.cn

^†These authors have contributed equally to this work
