- 1Department of Anaesthesiology, Central People's Hospital of Zhanjiang, Zhanjiang, China
- 2Department of Anesthesiology, Pain and Perioperative Medicine, First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- 3Anesthesia and Big Data Research Group, Central People's Hospital of Zhanjiang, Zhanjiang, China
Background: In this paper, we examine whether machine learning and deep learning can be used to predict difficult airway intubation in patients undergoing thyroid surgery.
Methods: We used 10 machine learning and deep learning algorithms to build prediction models in a training group, and then verified the results in a test group. We used R for the statistical analysis and constructed the machine learning prediction models in Python.
Results: The top 5 weighting factors for difficult airways, identified by averaging across the machine learning algorithms, were age, sex, weight, height, and BMI. In the training group, Gradient Boosting achieved an AUC of 0.932, an accuracy of 0.929, and a precision of 100%. In the test group, among the models constructed by the 10 algorithms, the three with the highest AUC values were Gradient Boosting, CNN, and LGBM, with values of 0.848, 0.836, and 0.812, respectively. Gradient Boosting also had the highest accuracy (0.913) and the highest precision (100%).
Conclusion: According to our results, Gradient Boosting performed best overall, with an AUC >0.8, an accuracy >90%, and a precision of 100%. In addition, the top 5 weighting factors for difficult airways, identified by averaging across the machine learning algorithms, were age, sex, weight, height, and BMI.
Introduction
Thyroid surgery is a common procedure in head and neck surgery and often requires general anesthesia. The incidence of difficult tracheal intubation (DTI) is around 10% (1). Because of the thyroid's close anatomical relationship with the larynx, laryngopharynx, and trachea, a large or invasive mass may obstruct the airway during surgery. This elevates the risk of anesthesia-related death and morbidity. By assessing patients' airway anatomy and pathologic changes prior to surgery, anesthesiologists can help ensure safe airway management for these patients.
Presently, many intubation difficulties in thyroid surgery patients cannot be predicted in advance because predictive tools are limited (2, 3). Machine learning has been applied to several medical fields, including cancer, pulmonary complications, chronic pain, and mental health (4–7). A study of patients with obesity demonstrated that machine learning can help predict difficult intubation: of the six machine learning algorithms tested, only three could predict intubation difficulty in patients with obesity, and the Xgbc algorithm had the best overall performance, with an accuracy exceeding 80% (8). At present, no model has been established specifically for difficult airways in patients with thyroid problems, and the predictive performance of existing models is insufficient to meet clinical needs. Therefore, in this paper, we explore whether machine learning and deep learning can be used to predict difficult airways in patients undergoing thyroid surgery. We used a variety of artificial intelligence algorithms and divided the dataset into a training group and a test group. The 500 patients were randomly split into training (N = 350) and test (N = 150) cohorts. After training the prediction models, we verified them in the test group.
Methods
Ethics
The study protocol was approved by the Clinical Research Ethics Committee at the First Affiliated Hospital of Zhengzhou University (2021-KY-673). As the retrospective analysis was based on publicly accessible data from BioStudies, the ethics committee waived the requirement for informed consent.
Dataset
A total of 500 patients who had undergone thyroid surgery were enrolled in this study, and difficult airway intubation occurred in 48 of them. Basic patient information is shown in Table 1.
Study participants were excluded if they had one or more anatomical abnormalities, pathology, non-standard approach, or awake fiberoptic intubation, as suggested by previous procedures. DTI was defined as intubation performed with correct head position and external laryngeal manipulation that resulted in the following: (a) difficult laryngoscopy; (b) multiple intubation attempts; (c) ineffective standard equipment and/or procedures; and (d) withdrawal and replanning of the procedure (9).
Machine learning and deep learning methods
In this study, we used 10 algorithms, both machine learning and deep learning, to establish a corresponding model through the training group, and then verified the results in the test group.
We constructed the machine learning and deep learning models primarily in Python. After selecting variables for DTI prediction in the training set, we trained the following models: Logistic Regression, Random Forest, Gradient Boosting, extreme gradient boosting (XGB), light gradient boosting machine (LGBM), Multilayer Perceptron Classifier (MLPC), Gaussian naïve Bayes (gnb), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNNLSTM. First, we standardized the independent variables so that their feature ranges were comparable, using the StandardScaler class from the scikit-learn (sklearn) library. During training, we used 5-fold cross-validation to prevent model overfitting. In short, we divided the training data into 5 stratified subsets, trained the models on 4 subsets, and validated them on the remaining subset. In addition, we manually tuned the parameters of each model. To assess feature importance for model development, we used XGB, LGBM, and GBDT. The area under the ROC curve (AUC), accuracy, recall, precision, and F1 score served as metrics to evaluate the models. An effective model should perform well in both the training and the test groups. The closer the ROC curve is to the upper-left corner, the better the model discriminates, that is, the closer the AUC is to 1. The relevant parameters of the 10 models in the training group are shown in Supplementary Table 1.
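The standardization and 5-fold stratified cross-validation steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the sample size, feature count, class balance, default hyperparameters, and the choice to place the scaler inside a pipeline are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient table (assumption): 500 rows,
# 12 features, roughly 10% positive (difficult intubation) cases.
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Scaling lives inside the pipeline so each fold is standardized using
# statistics from its own training portion only.
model = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=0))

# Five stratified folds keep the positive rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_per_fold = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("AUC per fold:", np.round(auc_per_fold, 3))
print("Mean AUC:", round(float(auc_per_fold.mean()), 3))
```

Swapping `GradientBoostingClassifier` for another scikit-learn estimator leaves the rest of the sketch unchanged, which is one way to compare several algorithms under the same folds.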
We compared the general patient data of the training group and the test group using R software. Normally distributed measurement data were expressed as mean ± standard deviation (x ± s), with an independent samples t-test used for comparison between groups; non-normally distributed measurement data were expressed as median and interquartile range, with a Mann-Whitney U test used for comparison between groups. Count data were expressed as cases or percentages, with the χ2 test or Fisher's exact probability test used for comparison between groups. The significance level was set at α = 0.05, and we considered a difference statistically significant if p < 0.05.
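The authors ran these comparisons in R. Purely for illustration, the same family of tests can be sketched in Python with SciPy. All values below are randomly generated or made up, the group sizes are arbitrary, and the rule of switching to Fisher's exact test when any expected cell count falls below 5 is a common convention assumed here, not something stated in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical continuous variables for two groups (illustration only).
age_a = rng.normal(50, 12, 200)      # roughly normal
age_b = rng.normal(51, 12, 100)
stay_a = rng.exponential(3.0, 200)   # skewed, non-normal
stay_b = rng.exponential(3.2, 100)

# Normally distributed data: independent samples t-test.
_, p_ttest = stats.ttest_ind(age_a, age_b)

# Non-normally distributed data: Mann-Whitney U test.
_, p_mwu = stats.mannwhitneyu(stay_a, stay_b, alternative="two-sided")

# Count data: rows are groups, columns are (event, no event). Made-up counts.
table = np.array([[30, 170], [15, 85]])
_, p_chi2, _, expected = stats.chi2_contingency(table)

# Assumed convention: use Fisher's exact test when an expected count is < 5.
if expected.min() < 5:
    _, p_count = stats.fisher_exact(table)
else:
    p_count = p_chi2

alpha = 0.05
for name, p in [("t-test", p_ttest), ("Mann-Whitney U", p_mwu), ("count test", p_count)]:
    print(f"{name}: p = {p:.3f}, significant = {p < alpha}")
```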
Results
The correlation between each variable and difficult airway is shown in a heat map (Figure 1). In addition, the top 5 weighting factors for difficult airways, identified by averaging across the machine learning algorithms, were age, sex, weight, height, and BMI (Figure 2). The most important influencing factor in the single GBDT and LGBM algorithms was neck circumference, while the most important influencing factor in the single XGB algorithm was sex (Supplementary Figures 1–3).
Figure 1. Correlation between individual variables and DTI. GOITER.CIRC-neck circumference (cm); PAT-Malignancy at HP; AP.MOUTH-Mouth opening <4 cm; MALLAMP-Mallampati score ≥III; NECK.MOV-Neck movement ≤90°; PROGNAT-Inability to prognath; PAST.DI-Past difficult intubation; GOITER.MED-Mediastinal goiter; TRACH.DEV.RX-Tracheal deviation at CXR; TMD-TMD ≤6.5; NC.TMD-NC/TMD ≥5; EL.GANZURI-el-Ganzouri score ≥4.
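The paper does not state how the averaged ranking of weighting factors was computed. One plausible approach, sketched below as an assumption, is to rescale each model's importances so they sum to 1 and then take the mean per feature. The function name `average_importance` and all numbers are hypothetical and chosen only to show that a feature can rank first in individual models without ranking first on average.

```python
def average_importance(per_model):
    """Average feature importances across models after normalizing each
    model's scores to sum to 1. Returns (feature, score) pairs sorted
    from most to least important."""
    features = sorted({f for scores in per_model.values() for f in scores})
    averaged = {}
    for feature in features:
        total = 0.0
        for scores in per_model.values():
            model_sum = sum(scores.values())
            # A feature missing from a model counts as zero importance.
            total += scores.get(feature, 0.0) / model_sum
        averaged[feature] = total / len(per_model)
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Made-up importances (not the study's values). Scales differ on purpose:
# some libraries report gains, others report split counts.
per_model = {
    "GBDT": {"age": 0.30, "neck_circ": 0.40, "bmi": 0.30},
    "LGBM": {"age": 120, "neck_circ": 150, "bmi": 30},
    "XGB": {"age": 0.50, "neck_circ": 0.20, "bmi": 0.30},
}

ranking = average_importance(per_model)
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

In this toy input, `neck_circ` is the top feature for two of the three models, yet `age` has the highest average because it scores consistently well in all three.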
The results of the model predictions for the difficult airway in the training group are shown in Supplementary Table 2 and Supplementary Figure 4. In the training group, Gradient Boosting achieved an AUC of 0.932, an accuracy of 0.929, and a precision of 100%.
In the test group, among the models constructed by the 10 algorithms, the three with the highest AUC values were Gradient Boosting, CNN, and LGBM, with values of 0.848, 0.836, and 0.812, respectively. Gradient Boosting also had the highest accuracy (0.913) and the highest precision (100%) of all the algorithms (Table 2 and Figure 3).
Figure 3. AUC values of the artificial intelligence algorithms for predicting DTI in the test group. Logistic Regression, Random Forest, Gradient Boosting, extreme gradient boosting-XGB, light gradient boosting machine-LGBM, Multilayer Perceptron Classifier-MLPC, Gaussian naive Bayes-gnb, Convolutional Neural Network-CNN, Long Short-Term Memory-LSTM, and CNNLSTM.
The baseline characteristics of the training group and the test group are shown in Supplementary Table 3.
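For readers less familiar with the AUC values reported here, the metric can be read as the probability that a randomly chosen difficult case receives a higher predicted score than a randomly chosen non-difficult case. The sketch below computes it directly from that definition. The function name `roc_auc` and the labels and scores are made up for illustration and are unrelated to the study data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = difficult intubation) and model scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.05]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")  # 13 of 15 pairs are ordered correctly
```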
Discussion
Airway management is a major concern for anesthesiologists when intubating patients undergoing thyroid surgery, because goiter can distort the airway. Difficult or failed intubation may lead to serious complications, such as hypoxic brain injury, and even death. In recent years, machine learning prediction tools, which often outperform traditional prediction methods, have gained widespread popularity in medicine. According to our results, Gradient Boosting delivered satisfactory overall performance in both the training and test groups. In addition, the top 5 weighting factors for difficult airways, identified by averaging across the machine learning algorithms, are age, sex, weight, height, and BMI. The most important influencing factor in the single GBDT and LGBM algorithms is neck circumference, while the most important influencing factor in the single XGB algorithm is sex.
Age has been linked to DTI in numerous studies. For example, statistics indicate that difficult or failed intubations are most common among people aged 40–59 (10). Moreover, older or middle-aged adults may struggle more with tracheal intubation than younger adults, and the predictors vary across age groups (10). Studies have shown that age- and height-based formulas can identify difficult airways early on in pediatric patients (11). However, other studies have shown no statistically significant correlation between age and difficult intubation (12). Our findings suggest a strong correlation between age and DTI.
Many studies have shown a strong correlation between sex and DTI; the incidence of difficult tracheal intubation is higher in men than in women (p < 0.001) (13). Risk factors for difficult tracheal intubation include being male (14). Most likely, this is because all morphometric distances are larger in males (15). Our results also support this contention.
Moreover, height, weight, and BMI each correlate strongly with DTI. It has been shown that the ratio of height to thyromental distance and the ratio of height to sternomental distance can serve as predictors of airway difficulties (16). Moreover, the former can be used to predict tracheal intubation difficulties (17). Additionally, BMI may be a predictor of tracheal intubation difficulties in patients with obstructive sleep apnea syndrome (18). Our study also supports this contention.
There is also a strong correlation between neck circumference and DTI. In obstetric patients, a neck circumference of ≥33.5 cm is a sensitive predictor of difficult intubation (19). Similarly, a neck circumference examination may help detect adverse perioperative respiratory events in children (20). In predicting intubation difficulties in the Indian population, the NC/TM ratio and Mallampati score had better diagnostic accuracy than other bedside tests (21). Our study also supports this contention.
Many studies have also shown that machine learning and deep learning play an important role in medical research. For example, convolutional neural networks can be used to detect and classify COVID-19 from x-ray images (9), machine learning algorithms can predict the course of the COVID-19 epidemic from current evidence (22), and a novel machine learning method can detect coronavirus disease (23). Other studies have explored difficult airways and artificial intelligence. For example, modern machine learning methods can be used to predict difficult airways in the emergency department (24), and difficult airways can be identified from frontal face images using an ensemble of deep learning models (25). Likewise, it has been shown that the CNN algorithm can classify difficult airways (26). Our study is the first to use a large number of machine learning and deep learning algorithms simultaneously to predict difficult airways in patients with thyroid problems. We concluded that Gradient Boosting is the algorithm with the best overall performance.
There are several limitations to this study. First, due to the retrospective nature of the data, we were unable to include new variables, such as patients' facial images and some of their genetic transcriptomic information. This may have limited the model's performance. Second, we only performed internal validation of this model, and a multicenter prospective cohort validation is needed in the future. Third, because the classification data were imbalanced, we used the F1 score and ROC-AUC curves together with the accuracy rate to evaluate the model. Fourth, feature extraction and screening would also have been of great help to the study (27), but as our research entailed analysis of database data, we were unable to extract new features. Finally, in this study, we randomly divided the dataset into a training group and a test group. The functions of these two sample sets are as follows: the training set is used to fit the supervised model, adjust parameters, and make other choices about the algorithm; the test set is used to evaluate the trained model, but it does not change the model's parameters or performance. It is generally used to verify whether the model is over-fitted or under-fitted, and to decide whether to retrain the model or choose another algorithm. However, more multicenter validation studies are needed in the future.
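The point about imbalanced data can be made concrete with a small worked example. Using only the cohort counts from the Dataset section (48 difficult intubations among 500 patients), the sketch below compares two hypothetical sets of predictions. Neither is the study's model output; the function name `classification_metrics` and the "cautious" predictions are invented for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Cohort counts from the Dataset section: 48 positives, 452 negatives.
y_true = [1] * 48 + [0] * 452

# Hypothetical predictor A: labels every patient as "not difficult".
always_negative = [0] * 500

# Hypothetical predictor B: flags only 12 true positives and nothing else.
cautious = [1] * 12 + [0] * 36 + [0] * 452

baseline = classification_metrics(y_true, always_negative)
selective = classification_metrics(y_true, cautious)
print("always negative:", baseline)
print("cautious:", selective)
```

Predictor A reaches an accuracy of 0.904 while detecting no difficult airways, and predictor B reaches 0.928 accuracy and perfect precision with a recall of only 0.25. This is why accuracy and precision are read alongside recall, F1, and AUC when positive cases are rare.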
Conclusion
Among the algorithms, Gradient Boosting performed best overall, with an AUC >0.8, an accuracy >90%, and a precision of 100%. Therefore, Gradient Boosting may be one of the preferred algorithms for future research on difficult airway prediction among patients undergoing thyroid surgery.
There will be some risks and challenges in future research on difficult airways and artificial intelligence prediction models. First, algorithm instability and differences between medical scenarios may destabilize model predictions. This would require additional subgroup analysis. Second, for the model to be applicable, we need to use simple and high-quality data to build it. This would require establishing a high-quality database to store data.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author/s.
Ethics statement
The studies involving human participants were reviewed and approved by First Affiliated Hospital of Zhengzhou University. The Ethics Committee waived the requirement of written informed consent for participation.
Author contributions
C-MZ, YW, QX, J-JY, and YZ contributed to the data analysis, drafting, and revision of the article. All authors gave final approval of the version to be published and agreed to be accountable for all aspects of the work.
Acknowledgments
We are grateful to the BioStudies database for providing the original data (28).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2022.937471/full#supplementary-material
Supplementary Figure 1. Weighting of each variable for DTI using the GBDT algorithm.
Supplementary Figure 2. Weighting of each variable for DTI using the LGBM algorithm.
Supplementary Figure 3. Weighting of each variable for DTI using the XGB algorithm.
Supplementary Figure 4. AUC values of the artificial intelligence algorithms for predicting DTI in the training group. Logistic Regression, Random Forest, Gradient Boosting, extreme gradient boosting-XGB, light gradient boosting machine-LGBM, Multilayer Perceptron Classifier-MLPC, Gaussian naive Bayes-gnb, Convolutional Neural Network-CNN, Long Short-Term Memory-LSTM, and CNNLSTM.
Supplementary Table 1. Tuning parameters used for each algorithm in the Anaconda software.
Supplementary Table 2. DTI prediction results of the artificial intelligence algorithms in the training group.
Supplementary Table 3. The basic information results of the training group and the test group data sets.
References
1. Bouaggad A, Nejmi SE, Bouderka MA, Abbassi O. Prediction of difficult tracheal intubation in thyroid surgery. Anesth Analg. (2004) 99:603–6. doi: 10.1213/01.ANE.0000122634.69923.67
2. Amathieu R, Smail N, Catineau J, Poloujadoff MP, Samii K, Adnet F. Difficult intubation in thyroid surgery: myth or reality? Anesth Analg. (2006) 103:965–8. doi: 10.1213/01.ane.0000237305.02465.ee
3. Olusomi BB, Aliyu SZ, Babajide AM, Sulaiman AO, Adegboyega OS, Gbenga HO, et al. Goitre-related factors for predicting difficult intubation in patients scheduled for thyroidectomy in a resource-challenged health institution in North Central Nigeria. Ethiop J Health Sci. (2018) 28:169–76. doi: 10.4314/ejhs.v28i2.8
4. Zhou C, Hu J, Wang Y, Ji MH, Tong J, Yang JJ, et al. A machine learning-based predictor for the identification of the recurrence of patients with gastric cancer after operation. Sci Rep. (2021) 11:1571. doi: 10.1038/s41598-021-81188-6
5. Xue Q, Wen D, Ji MH, Tong J, Yang JJ, Zhou CM. Developing machine learning algorithms to predict pulmonary complications after emergency gastrointestinal surgery. Front Med. (2021) 8:655686. doi: 10.3389/fmed.2021.655686
6. Wang Y, Zhu Y, Xue Q, Ji M, Tong J, Yang JJ, et al. Predicting chronic pain in postoperative breast cancer patients with multiple machine learning and deep learning models. J Clin Anesth. (2021) 74:110423. doi: 10.1016/j.jclinane.2021.110423
7. Lei L, Wang Y, Xue Q, Tong J, Zhou CM, Yang JJ. A comparative study of machine learning algorithms for predicting acute kidney injury after liver cancer resection. PeerJ. (2020) 8:e8583. doi: 10.7717/peerj.8583
8. Zhou CM, Xue Q, Ye HT, Wang Y, Tong J, Ji MH, et al. Constructing a prediction model for difficult intubation of obese patients based on machine learning. J Clin Anesth. (2021) 72:110278. doi: 10.1016/j.jclinane.2021.110278
9. Ayalew AM, Salau AO, Abeje BT, Enyew B. Detection and classification of COVID-19 disease from X-ray images using convolutional neural networks and histogram of oriented gradients. Biomed Signal Process Control. (2022) 74:103530. doi: 10.1016/j.bspc.2022.103530
10. Moon HY, Baek CW, Kim JS, Koo GH, Kim JY, Woo YC, et al. The causes of difficult tracheal intubation and preoperative assessments in different age groups. Korean J Anesthesiol. (2013) 64:308–14. doi: 10.4097/kjae.2013.64.4.308
11. Mathew P, Ashok V, Siraj MM, Grover V, Sethuraman D. Validation of age and height based formulae to predict paediatric airway distances - a prospective observational study. J Postgrad Med. (2019) 65:164–8. doi: 10.4103/jpgm.JPGM_545_18
12. Basaranoglu G, Columb M, Lyons G. Failure to predict difficult tracheal intubation for emergency caesarean section. Eur J Anaesthesiol. (2010) 27:947–9. doi: 10.1097/EJA.0b013e32833e2656
13. Wang B, Zheng C, Yao W, Guo L, Peng H, Yang F, et al. Predictors of difficult airway in a Chinese surgical population: the gender effect. Minerva Anestesiol. (2019) 85:478–86. doi: 10.23736/S0375-9393.18.12605-8
14. Rose DK, Cohen MM. The airway: problems and predictions in 18,500 patients. Can J Anaesth. (1994) 41:372–83. doi: 10.1007/BF03009858
15. Türkan S, Ateş Y, Cuhruk H, Tekdemir I. Should we reevaluate the variables for predicting the difficult airway in anesthesiology? Anesth Analg. (2002) 94:1340–4. doi: 10.1097/00000539-200205000-00055
16. Ray S, Rao S, Kaur J, Gaude YK. Ratio of height-to-thyromental distance and ratio of height-to-sternomental distance as predictors of laryngoscopic grade in children. J Anaesthesiol Clin Pharmacol. (2018) 34:68–72. doi: 10.4103/joacp.JOACP_135_16
17. Badheka JP, Doshi PM, Vyas AM, Kacha NJ, Parmar VS. Comparison of upper lip bite test and ratio of height to thyromental distance with other airway assessment tests for predicting difficult endotracheal intubation. Indian J Crit Care Med. (2016) 20:3–8. doi: 10.4103/0972-5229.173678
18. Kurtipek O, Isik B, Arslan M, Unal Y, Kizil Y, Kemaloglu Y. A study to investigate the relationship between difficult intubation and prediction criterion of difficult intubation in patients with obstructive sleep apnea syndrome. J Res Med Sci. (2012) 17:615–20. Available online at: http://jrms.mui.ac.ir/index.php/jrms/article/view/8503
19. Riad W, Ansari T, Shetty N. Does neck circumference help to predict difficult intubation in obstetric patients? A prospective observational study. Saudi J Anaesth. (2018) 12:77–81. doi: 10.4103/sja.SJA_385_17
20. Nafiu OO, Burke CC, Gupta R, Christensen R, Reynolds PI, Malviya S. Association of neck circumference with perioperative adverse respiratory events in children. Pediatrics. (2011) 127:e1198–205. doi: 10.1542/peds.2010-2471
21. Dhanger S, Gupta SL, Vinayagam S, Bidkar PU, Elakkumanan LB, Badhe AS. Diagnostic accuracy of bedside tests for predicting difficult intubation in Indian population: an observational study. Anesth Essays Res. (2016) 10:54–8. doi: 10.4103/0259-1162.165503
22. Indumathi N, Shanmuga EM, Salau AO, Ramalakshmi R, Revathy R. Prediction of COVID-19 Outbreak With Current Substantiation Using Machine Learning Algorithms. Intelligent Interactive Multimedia Systems for e-Healthcare Applications. Singapore: Springer (2022).
23. Salau AO. Detection of corona virus disease using a novel machine learning approach. In: 2021 International Conference on Decision Aid Sciences and Application (DASA). Sakheer (2021). pp. 587–90. doi: 10.1109/DASA53625.2021.9682267
24. Yamanaka S, Goto T, Morikawa K, Watase H, Okamoto H, Hagiwara Y, et al. Machine learning approaches for predicting difficult airway and first-pass success in the emergency department: multicenter prospective observational study. Interact J Med Res. (2022) 11:e28366. doi: 10.2196/28366
25. Tavolara TE, Gurcan MN, Segal S, Niazi MKK. Identification of difficult to intubate patients from frontal face images using an ensemble of deep learning models. Comput Biol Med. (2021) 136:104737. doi: 10.1016/j.compbiomed.2021.104737
26. Hayasaka T, Kawano K, Kurihara K, Suzuki H, Nakane M, Kawamae K. Creation of an artificial intelligence model for intubation difficulty classification by deep learning (convolutional neural network) using face images: an observational study. J Intensive Care. (2021) 9:38. doi: 10.1186/s40560-021-00551-x
27. Salau AO, Jain S. Feature extraction: a survey of the types, techniques, applications. In: 2019 International Conference on Signal Processing and Communication (ICSC). NOIDA (2019).
Keywords: difficult airways, machine learning, deep learning, CNN, intubation
Citation: Zhou C-M, Wang Y, Xue Q, Yang J-J and Zhu Y (2022) Predicting difficult airway intubation in thyroid surgery using multiple machine learning and deep learning algorithms. Front. Public Health 10:937471. doi: 10.3389/fpubh.2022.937471
Received: 06 May 2022; Accepted: 12 July 2022;
Published: 10 August 2022.
Edited by:
Chuan-Yu Chang, National Yunlin University of Science and Technology, Taiwan
Reviewed by:
Ayodeji Olalekan Salau, Afe Babalola University, Nigeria
Ernest Namdar, University of Toronto, Canada
Copyright © 2022 Zhou, Wang, Xue, Yang and Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Cheng-Mao Zhou, zhouchengmao187@foxmail.com; Jian-Jun Yang, yjyangjj@126.com; Yu Zhu, zhuyu@zqmc.edu.cn