AUTHOR=Madeo Bruno, Brigante Giulia, Ansaloni Anna, Taliani Erica, Kaleci Shaniko, Monzani Maria Laura, Simoni Manuela, Rochira Vincenzo TITLE=The Added Value of Operator's Judgement in Thyroid Nodule Ultrasound Classification Arising From Histologically Based Comparison of Different Risk Stratification Systems JOURNAL=Frontiers in Endocrinology VOLUME=11 YEAR=2020 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2020.00434 DOI=10.3389/fendo.2020.00434 ISSN=1664-2392 ABSTRACT=

Objective: Several ultrasound classifications for thyroid nodules have been proposed, but their accuracy is still debated because it has mainly been estimated against cytology rather than histology. The aim of this study was to test the diagnostic accuracy and inter-classification agreement of the AACE/ACE-AME, American Thyroid Association (ATA), British Thyroid Association (BTA), and Modena Ultrasound Thyroid Classification (MUT) systems; MUT stratifies malignancy risk while also taking the clinician's subjective impression into account.

Methods: A prospective study collecting ultrasound features and histological diagnoses of thyroid nodules was conducted. Ultrasound features were recorded using a predefined checklist in candidates for surgery because of indeterminate, suspicious, or malignant cytology. All nodules, in addition to the cytologically suspicious one, were analyzed in a blinded fashion. The MUT score was applied prospectively and the other classifications retrospectively. Sensitivity, specificity, diagnostic cut-off value, and accuracy were calculated for each classification. Overall agreement between classifications was tested with the Bland-Altman method, and agreement between classifications on individual nodules with weighted Cohen's kappa.
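
As an illustration of the per-nodule agreement statistic named above, the following is a minimal sketch of weighted Cohen's kappa for two ordinal classifications of the same nodules. It is not the authors' analysis code. The function name `weighted_kappa`, the integer coding of risk categories, and the choice of linear weights are assumptions made here for illustration; the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories, weighting="linear"):
    """Weighted Cohen's kappa for two ordinal ratings of the same items.

    Categories are assumed to be coded as integers 0..n_categories-1
    (a hypothetical coding; real risk classes would need mapping first).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    n = len(ratings_a)
    power = 1 if weighting == "linear" else 2

    # Observed joint counts and the marginal counts of each rater.
    observed = Counter(zip(ratings_a, ratings_b))
    marginal_a = Counter(ratings_a)
    marginal_b = Counter(ratings_b)

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the distance between categories.
            weight = (abs(i - j) / (n_categories - 1)) ** power
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += (
                weight * (marginal_a[i] / n) * (marginal_b[j] / n)
            )

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement
```

With this definition, identical ratings give a kappa of 1, and ratings that agree only as often as chance predicts give a kappa of 0.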

Results: In classifying a total of 457 nodules, MUT had the highest accuracy (AUC 0.808) and specificity (89%), followed by ATA and BTA, and finally by AACE/ACE-AME. ATA, BTA, and MUT were highly interchangeable. For agreement on individual nodules, ATA and BTA had the best agreement (κ = 0.723); AACE/ACE-AME showed slight agreement with BTA (κ = 0.177) and MUT (κ = 0.183), and fair agreement with ATA (κ = 0.282); MUT had fair agreement with both ATA (κ = 0.291) and BTA (κ = 0.271).
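
The verbal labels attached to the κ values above ("slight", "fair") match the widely used Landis and Koch bands. The abstract does not name the scale it follows, so the mapping below is an assumption, and the helper `agreement_label` is a hypothetical name introduced only for this sketch.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch style verbal label.

    Assumed bands: below 0 poor, up to 0.20 slight, up to 0.40 fair,
    up to 0.60 moderate, up to 0.80 substantial, up to 1.00 almost perfect.
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# Kappa values as reported in the Results paragraph.
reported = {
    "ATA vs BTA": 0.723,
    "AACE/ACE-AME vs BTA": 0.177,
    "AACE/ACE-AME vs MUT": 0.183,
    "AACE/ACE-AME vs ATA": 0.282,
    "MUT vs ATA": 0.291,
    "MUT vs BTA": 0.271,
}
labels = {pair: agreement_label(value) for pair, value in reported.items()}
```

Under these assumed bands, the three AACE/ACE-AME and MUT values fall where the abstract places them, and the ATA versus BTA value of 0.723 would fall in the "substantial" band, a label the abstract itself does not use.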

Conclusion: The classifications have acceptable overall diagnostic accuracy, which improves with a less rigid system that takes the operator's subjective impression into account.