ORIGINAL RESEARCH article
Front. Radiol.
Sec. Artificial Intelligence in Radiology
Volume 5 - 2025 | doi: 10.3389/fradi.2025.1509377
Diagnostic precision of a Deep Learning Algorithm for the Classification of Non-Contrast
Provisionally accepted
- 1Izmir City Hospital, Izmir, Türkiye
- 2IE University, Segovia, Spain
- 3Technical University of Munich, Munich, Bavaria, Germany
This study aimed to determine the diagnostic precision of a deep learning algorithm for the classification of non-contrast brain CT reports.
A total of 1,861 non-contrast brain CT reports were randomly selected, anonymized, and annotated for urgency level by two radiologists, with review by a senior radiologist. The data, encrypted and stored in Excel format, were securely maintained on a university cloud system. Using Python 3.8.16, the reports were classified into four urgency categories: emergency, not emergency but needs timely attention, clinically non-significant, and normal. The dataset was split, with 800 reports used for training and 200 for validation. The DistilBERT model, featuring six transformer layers and 66 million trainable parameters, was employed for text classification. Training used the Adam optimizer with a learning rate of 2e-5, a batch size of 32, and a dropout rate of 0.1 to prevent overfitting. The model achieved a mean F1 score of 0.85 in 5-fold cross-validation, demonstrating strong performance in categorizing radiology reports.
Of the 1,861 scans, 861 cases were identified as fit for study through the senior radiologist and self-hosted Label Studio interpretations. On the test data, the algorithm achieved a sensitivity of 91% and a specificity of 90%. The F1 score was 0.89 for the best fold. The algorithm most successfully distinguished emergency results, with positive predictive values that were unexpectedly lower than in previously reported studies. Beam hardening artifacts and excessive noise, which compromise the quality of CT scan images, were significantly associated with decreased model performance.
This study revealed decreased diagnostic accuracy of an AI decision support system (DSS) at our institution. Despite extensive evaluation, we were unable to identify the source of this discrepancy, raising concerns about the generalizability of these tools with indeterminate failure modes. These results further highlight the need for standardized study design to allow for rigorous and reproducible site-to-site comparison of emerging deep learning technologies.
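For readers less familiar with the metrics quoted above, the following is a minimal sketch of how sensitivity, specificity, positive predictive value, and F1 relate to confusion-matrix counts, and of how a 5-fold split partitions a dataset. It is not the authors' code. The function names `binary_metrics` and `k_fold_indices` are hypothetical, the counts are toy values chosen only to make the arithmetic easy to follow, and treating one urgency category against the rest as a binary problem is an assumption, since the abstract does not state how the four-class results were reduced to a single sensitivity and specificity.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and F1 from confusion-matrix counts.

    Assumption: one category (e.g. "emergency") is scored against all others.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


if __name__ == "__main__":
    # Toy counts, NOT taken from the study.
    print(binary_metrics(tp=45, fp=5, fn=5, tn=45))

    # Hypothetical pool of 1,000 reports: each of the 5 folds then holds out 200.
    for fold, (train, validation) in enumerate(k_fold_indices(1000, k=5), start=1):
        print(fold, len(train), len(validation))
```

With a hypothetical pool of 1,000 reports, each of the five folds trains on 800 and validates on 200, which is one way to arrive at the 800/200 split described above; the abstract does not confirm that this is how the split was produced. A real pipeline would normally shuffle and stratify by urgency category before splitting.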
Keywords: artificial intelligence, precision CTs, computed tomography, deep learning, noncontrast head CT
Received: 10 Oct 2024; Accepted: 24 Apr 2025.
Copyright: © 2025 Güzel, Aşçı, Demirbilek, Özdemir and Erekli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Hamza Eren Güzel, Izmir City Hospital, Izmir, Türkiye
Göktuğ Aşçı, IE University, Segovia, Spain
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.