AUTHOR=Gu Jianhua , Xie Rongli , Zhao Yanna , Zhao Zhifeng , Xu Dan , Ding Min , Lin Tingyu , Xu Wenjuan , Nie Zihuai , Miao Enjun , Tan Dan , Zhu Sibo , Shen Dongjie , Fei Jian TITLE=A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer JOURNAL=Frontiers in Oncology VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.938292 DOI=10.3389/fonc.2022.938292 ISSN=2234-943X ABSTRACT=Background

Thyroid cancer (TC) is the most common malignant disease of the endocrine system, and its incidence is increasing year by year. Early diagnosis, management of malignant nodules, and scientific treatment are crucial for TC prognosis. The first aim of this study is to construct a classification model for TC based on risk factors. The second aim is to construct a prediction model for metastasis based on risk factors.

Methods

We retrospectively collected approximately 70 preoperative demographic and laboratory test indices from 1735 TC patients. Machine learning pipelines, including ridge regression (a linear regression model), logistic regression (LR), and eXtreme Gradient Boosting (XGBoost), were compared to select the best model for predicting deterioration and metastasis of TC. We also performed a comprehensive comparison against a prediction model that used only the Thyroid Imaging Reporting and Data System (TI-RADS).

Results

The XGBoost model achieved the best performance in predicting both the final thyroid nodule diagnosis (AUC: 0.84) and metastasis (AUC: 0.72-0.77). Its AUCs for predicting Grade 4 TC deterioration and metastasis reached 0.84 and 0.97, respectively, while none of the AUCs for the TI-RADS-only model reached 0.70. Based on multivariate analysis and feature selection, age, obesity, prothrombin time, fibrinogen, and HBeAb were common significant risk factors for tumor progression and metastasis. Monocyte, D-dimer, T3, FT3, and albumin were common protective factors. Tumor size (11.14 ± 7.14 mm) was the most important indicator of metastasis formation. In addition, GGT, glucose, platelet volume distribution width, and neutrophil percentage also contributed to the development of metastases. Abnormal levels of blood lipids and uric acid were closely related to deterioration of the tumor. The dual role of mean erythrocytic hemoglobin concentration in TC needs to be verified in a larger patient cohort. We have established a free online tool (http://www.cancer-thyroid.com/) that is available to all clinicians for the prognosis of patients at high risk of TC.
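For readers less familiar with the metric, the AUC values above can be read as the probability that a randomly chosen positive case (for example, a malignant nodule) receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that calculation and of picking the best model by AUC. The function names `auc` and `best_model`, the model labels, and the toy scores are all hypothetical illustrations; they are not the study's code or data.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model outputs, where a higher score means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


def best_model(labels: Sequence[int],
               scores_by_model: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose scores give the highest AUC."""
    return max(scores_by_model,
               key=lambda name: auc(labels, scores_by_model[name]))


# Toy data for illustration only (assumption: not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2],
}

if __name__ == "__main__":
    for name, scores in toy_scores.items():
        print(name, round(auc(toy_labels, scores), 3))
    print("best:", best_model(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based implementation would be preferable for a cohort of the size described here.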

Conclusion

It is feasible to use the XGBoost algorithm, combined with preoperative laboratory test indices and demographic characteristics, to predict tumor progression and metastasis in patients with TC, and its performance is better than that of using TI-RADS alone. The web tool we developed can help physicians with less clinical experience make appropriate clinical decisions or obtain secondary confirmation of diagnostic results.