AUTHOR=Fu Chao, Cui Yiyang, Li Jing, Yu Jing, Wang Yan, Si Caifeng, Cui Kefei
TITLE=Effect of the categorization method on the diagnostic performance of ultrasound risk stratification systems for thyroid nodules
JOURNAL=Frontiers in Oncology
VOLUME=13
YEAR=2023
URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2023.1073891
DOI=10.3389/fonc.2023.1073891
ISSN=2234-943X
ABSTRACT=Objective

To evaluate whether the categorization method used by ultrasound (US) risk stratification systems (RSSs) is a decisive factor influencing their diagnostic performance and unnecessary fine-needle aspiration (FNA) rates, in order to select the optimal RSS for managing thyroid nodules.

Methods

From July 2013 to January 2019, 2667 patients with 3944 thyroid nodules underwent pathological diagnosis after thyroidectomy and/or US-guided FNA. Each nodule was assigned a US category under each of six RSSs: ACR-TIRADS, EU-TIRADS, AI-TIRADS, Kwak-TIRADS, C-TIRADS, and the ATA guidelines. Diagnostic performance and unnecessary FNA rates were calculated and compared in two ways: first using the US-based final assessment categories, and second using FNA indications based on the unified size thresholds for biopsy proposed by ACR-TIRADS.
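The abstract does not spell out the formulas behind its metrics. The following is a minimal Python sketch, assuming the standard 2×2 definitions of sensitivity, specificity, and accuracy against the pathological reference standard. It also assumes that a "positive" result means a nodule was placed in a suspicious US category (or, in the second analysis, met the FNA indication), and that the unnecessary FNA rate is the share of positive nodules that proved benign. The names `Counts` and `diagnostic_metrics` are hypothetical, and the example counts are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical 2x2 table of one RSS against the pathological reference."""
    tp: int  # malignant nodules classified as positive
    fp: int  # benign nodules classified as positive
    tn: int  # benign nodules classified as negative
    fn: int  # malignant nodules classified as negative


def diagnostic_metrics(c: Counts) -> dict[str, float]:
    """Compute standard diagnostic metrics as fractions (0 to 1)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        # Assumed definition: benign nodules among all positive
        # (FNA-recommended) nodules.
        "unnecessary_fna_rate": c.fp / (c.tp + c.fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    example = Counts(tp=80, fp=20, tn=70, fn=30)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.1%}")
    # Consistency check with the abstract: 1781 malignant of 3944 nodules.
    print(f"malignancy prevalence: {1781 / 3944:.1%}")
```

Because every RSS is applied to the same set of nodules, one such table per system (and per analysis) is enough to reproduce the kind of percentages reported in the Results.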

Results

A total of 1781 (45.2%) thyroid nodules were diagnosed as malignant after thyroidectomy or biopsy. EU-TIRADS showed the lowest specificity and accuracy and the highest unnecessary FNA rate, both for US categories (47.9%, 70.2%, and 39.4%, respectively; all P < 0.05) and for FNA indications (54.2%, 50.0%, and 55.4%, respectively; all P < 0.05). For US-based final assessment categories, AI-TIRADS, Kwak-TIRADS, C-TIRADS, and the ATA guidelines showed similar accuracy (78.0%, 77.8%, 77.9%, and 76.3%, respectively; all P > 0.05). C-TIRADS had the lowest unnecessary FNA rate (30.9%), which did not differ significantly from those of AI-TIRADS, Kwak-TIRADS, and the ATA guidelines (31.5%, 31.7%, and 33.6%, respectively; all P > 0.05). For US-FNA indications, ACR-TIRADS, Kwak-TIRADS, C-TIRADS, and the ATA guidelines showed similar accuracy (58.0%, 59.7%, 58.7%, and 57.1%, respectively; all P > 0.05). AI-TIRADS had the highest accuracy and the lowest unnecessary FNA rate (61.9% and 38.6%), neither of which differed significantly from those of Kwak-TIRADS (59.7% and 42.9%) or C-TIRADS (58.7% and 43.9%; all P > 0.05).

Conclusion

The different US categorization methods used by the RSSs were not decisive factors in their diagnostic performance or unnecessary FNA rates. For daily clinical practice, a score-based counting RSS was the optimal choice.