- 1 University of New South Wales, South-Western Sydney Clinical Campus, Kensington, NSW, Australia
- 2 Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, VIC, Australia
- 3 Melbourne Brain Centre, University of Melbourne, Melbourne, VIC, Australia
- 4 Sydney Brain Centre, Faculty of Medicine, University of New South Wales, Kensington, NSW, Australia
- 5 Melbourne Medical School, University of Melbourne, Melbourne, VIC, Australia
- 6 Department of Neurology, Liverpool Hospital, Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
Introduction: In acute stroke, identifying early changes (parenchymal hypodensity) on non-contrast CT (NCCT) can be challenging. We aimed to identify whether the accuracy of clinicians in detecting acute hypodensity in ischaemic stroke patients on a non-contrast CT is improved with the use of an Artificial Intelligence (AI) based, automated hypodensity detection algorithm (HDT) using MRI-DWI as the gold standard.
Methods: The study employed a case-crossover within-clinician design, where 32 clinicians were tasked with identifying hypodensity lesions on NCCT scans for five a priori selected patient cases, before and after viewing the AI-based HDT. The DICE similarity coefficient (DICE score) was the primary measure of accuracy. Statistical analysis compared DICE scores with and without AI-based HDT using mixed-effects linear regression, with individual NCCT scans and clinicians as nested random effects.
Results: The AI-based HDT had a mean DICE score of 0.62 for detecting hypodensity across all NCCT scans. Clinicians’ overall mean DICE score was 0.33 (SD 0.31) before AI-based HDT implementation and 0.40 (SD 0.27) after implementation. AI-based HDT use was associated with an increase of 0.07 (95% CI: 0.02–0.11, p = 0.003) in DICE score accounting for individual scan and clinician effects. For scans with small lesions, clinicians achieved a mean increase in DICE score of 0.08 (95% CI: 0.02, 0.13, p = 0.004) following AI-based HDT use. In a subgroup of 15 trainees, DICE score improved with AI-based HDT implementation [mean difference in DICE 0.09 (95% CI: 0.03, 0.14, p = 0.004)].
Discussion: AI-based automated hypodensity detection has the potential to enhance clinicians’ accuracy in detecting hypodensity in acute stroke, especially for smaller lesions and for less experienced clinicians.
Introduction
The mainstay of acute stroke imaging has been computed tomography (CT). It is relatively accessible in most hospitals throughout the world, is inexpensive compared with magnetic resonance imaging (MRI), is fast and efficient, and has few contraindications (1). However, in the first few hours after stroke onset, identifying the early signs of ischaemic stroke (parenchymal hypodensity and focal swelling) on non-contrast CT (NCCT) (1–3) can be challenging for even the most experienced clinicians (4). Image interpretation can delay therapeutic decisions and is often the rate-limiting step, particularly if the radiologist is offsite, as is often the case in rural and remote Australia (5, 6). For onsite clinicians, fatigue and inexperience can affect image interpretation and delay treatment decisions (1). Nevertheless, identifying these subtle changes (particularly parenchymal hypodensity) is necessary, as they likely represent irreversible ischaemia, which is an important consideration in the decision to offer reperfusion therapy (6).
As reperfusion treatment is time-critical, decision support tools such as artificial intelligence (AI)-based automated hypodensity detection have the potential to improve detection of early ischaemic change and reduce delays in diagnosis and reperfusion treatment (1).
Current literature describes several approaches to hypodensity detection on NCCT images in stroke. These include image filtering (windowing) to enhance the visibility of ischaemic changes (7, 8), spatial normalisation between a template of healthy controls and the examined brain (9, 10), topographic scoring using the territories of the middle cerebral artery (MCA) (11, 12), classification of image texture features (13), and imaging biomarkers (14). A recent meta-analysis demonstrated that AI-driven tools performed comparably to, or better than, physicians in assessing early changes after stroke (15). This suggests that AI-based hypodensity detection tools have the potential to improve clinician performance; however, this has not previously been assessed.
One AI-based automated NCCT hypodensity detection tool (HDT; MIStar, Apollo Medical Imaging, Melbourne, Australia) uses histogram-based left-right brain comparisons to detect regions of interest that show unilateral hypodensity, and then uses iterative level-set optimization to delineate areas of hypodensity within the non-contrast CT scan. This AI-based HDT showed a strong positive correlation with the gold standard, magnetic resonance imaging diffusion-weighted imaging (MRI-DWI) (correlation coefficient >0.5, unpublished data).
Thus, the aim of this study was to investigate whether the accuracy of clinicians in detecting acute hypodensity in ischaemic stroke patients on non-contrast CT is improved with the use of an AI-based automated HDT algorithm.
We hypothesized that the accuracy of clinicians in detecting acute hypodensity in ischaemic stroke patients on non-contrast CT will be improved with the use of an AI-based automated HDT algorithm.
Methods
Study design
This was a case-crossover within-clinician study in which clinicians were asked to identify hypodensity lesions first without, and then with, the help of the AI-based HDT output, on a set of a priori, purposively sampled patient cases representing the broad population of acute stroke patients. While both individual clinicians and case scans can be considered sources of variability in this design, the objective of investigating clinicians’ accuracy dictates that clinicians be treated as the units of analysis. Individual scans were purposively selected to represent varying lesion sizes and hemispheres, representative of acute anterior circulation stroke. Among the 5 NCCT scans, 2 had large lesions with hypodensity involving more than two-thirds of the MCA territory, and 3 had small lesions. All scans had a corresponding MRI-DWI acquired during admission. The characteristics of the selected cases are summarized in Figure 1 and Table 1.
Figure 1. Case selection. Top left: Image ID 1; middle left: Image ID 2; bottom left: Image ID 3; top right: Image ID 4; middle right: Image ID 5.
Clinicians were instructed to manually segment a single slice of five different patients’ NCCT scans before having access to the AI-based HDT output (Figure 2). Once they had segmented the single slice and sent their segmentation back to the study coordinator (MV), they were given the AI-based HDT output that had identified the hypodensity on the same single slice. The clinicians were asked to compare the AI-based HDT output and their initial drawing. They were asked to draw a new lesion if they felt the AI-based HDT helped identify the hypodensity. They then sent their completed segmentations back to the study coordinator for analysis.
Power analysis
As the primary objective of this study was to investigate clinicians’ accuracy, power was calculated treating clinicians as the units of analysis. Recruiting 32 clinicians provided a power of 0.8 to detect a medium effect size (Cohen’s d = 0.5) for the difference in mean DICE score (dependent samples/matched pairs, without and with the use of AI-based HDT), with a two-sided type I error of 0.05. The clinicians were either neurology trainees with less than 5 years’ experience reading NCCT in acute stroke patients or consultant neurologists with more than 5 years’ experience.
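The stated sample size can be checked, approximately, with the normal-approximation formula for a two-sided paired comparison, n = ((z_{1−α/2} + z_{1−β}) / d)². The sketch below is our illustration only; the source does not state which formula or software was used, and the function name `paired_sample_size` is hypothetical. Exact calculations based on the noncentral t distribution typically give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size for a two-sided paired (matched-pairs) comparison.

    effect_size is Cohen's d for the paired differences
    (mean before/after difference divided by the SD of those differences).
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = std_normal.inv_cdf(power)  # about 0.84 for 80% power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)


print(paired_sample_size(0.5))  # 32
```

With d = 0.5, α = 0.05 and power 0.8 the unrounded value is about 31.4, which rounds up to the 32 clinicians reported.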
Study scans
Automated NCCT AI-based hypodensity detection tool
The NCCT scans in this study were acquired on the Toshiba Aquilion One (Canon, Tokyo, Japan) at two International Stroke Perfusion Imaging Registry (INSPIRE) sites. The NCCT lesion was segmented automatically with the AI-based HDT algorithm in MIStar software (Apollo Medical Imaging, Melbourne, Australia), as illustrated in Figure 1. The algorithm consisted of the following steps: (1) assessing the symmetry of the density histograms of the left and right hemispheres (after registration to a template), (2) defining potential seeds, and (3) iterative optimization of level-set thresholds.
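To make the three steps concrete, here is a toy sketch on a single 2D slice. It is not the MIStar implementation, whose details are not given in the source. Assumptions: the slice is already registered so that the midline is the centre column; the HU range, bin count, seed threshold `delta`, step size and growth-based stopping rule are all hypothetical; and the level-set optimization of step (3) is replaced by a plainly simpler technique, iterative region growing with a gradually relaxed threshold.

```python
from collections import deque

Slice = list[list[float]]  # 2D NCCT slice in HU; midline assumed to be the centre column


def histogram(values, lo=0.0, hi=80.0, bins=16):
    """Counts of values in equal-width HU bins over [lo, hi); the range is an assumption."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
    return counts


def histogram_asymmetry(img: Slice) -> float:
    """Step 1 (sketch): L1 distance between normalised left and right hemisphere histograms."""
    w = len(img[0])
    left = [v for row in img for v in row[: w // 2]]
    right = [v for row in img for v in row[(w + 1) // 2 :]]
    hl, hr = histogram(left), histogram(right)
    nl, nr = max(sum(hl), 1), max(sum(hr), 1)
    return sum(abs(a / nl - b / nr) for a, b in zip(hl, hr))


def mirrored_difference(img: Slice, r: int, c: int) -> float:
    """Contralateral HU minus local HU; positive where the pixel is darker than its mirror."""
    return img[r][len(img[r]) - 1 - c] - img[r][c]


def find_seeds(img: Slice, delta: float = 4.0):
    """Step 2 (sketch): pixels at least `delta` HU darker than their mirrored pixel."""
    return [
        (r, c)
        for r in range(len(img))
        for c in range(len(img[0]))
        if mirrored_difference(img, r, c) >= delta
    ]


def grow_region(img: Slice, seeds, tol: float):
    """Breadth-first growth from seeds over 4-connected pixels with mirrored difference >= tol."""
    h, w = len(img), len(img[0])
    region, queue = set(seeds), deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and mirrored_difference(img, nr, nc) >= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def segment_hypodensity(img: Slice, delta=4.0, min_tol=1.0, step=1.0, max_growth=2.0):
    """Step 3 (stand-in for level-set optimization): relax the tolerance step by step,
    stopping early if the region suddenly more than doubles (treated as leakage)."""
    seeds = find_seeds(img, delta)
    region = grow_region(img, seeds, delta)
    tol = delta - step
    while seeds and tol >= min_tol:
        candidate = grow_region(img, seeds, tol)
        if len(candidate) > max_growth * len(region):
            break
        region, tol = candidate, tol - step
    return region


# Toy 6x6 slice: 30 HU background with a 2x2 patch of 24 HU on the left side.
img = [[30.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (1, 2):
        img[r][c] = 24.0
print(sorted(segment_hypodensity(img)))  # [(2, 1), (2, 2), (3, 1), (3, 2)]
```

Comparing each pixel with its mirror image is what makes the approach sensitive to unilateral hypodensity: the darker patch is seeded on the left, while its normal-density mirror on the right yields a negative difference and is never included.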
Manual segmentation of NCCT
The clinicians were instructed to manually segment one axial slice of each of the five NCCT scans. Clinicians were provided with the whole-brain NCCT for review only. Segmentation was completed using the paintbrush mode in the ITK-SNAP software application.¹
Manual segmentation of MRI
MRI-DWI images were manually segmented by trained personnel using ITK-SNAP to extract infarct lesions as the reference. MRI-DWI lesions were registered to the NCCT images using Advanced Normalization Tools (ANTs) (16).
Statistical analysis
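The primary outcome of this study was assessed with the DICE similarity coefficient (DICE score), which measures the similarity (overlap) of two lesion segmentations. It ranges from 0 to 1, where 0 represents no overlap and 1 represents perfect overlap. For two segmentations A and B, the DICE score is calculated by the following equation:
$$\mathrm{DICE} = \frac{2\,|A \cap B|}{|A| + |B|}$$
where |A| and |B| are the numbers of voxels in each segmentation and |A ∩ B| is the number of voxels they share.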
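As a minimal sketch, the standard Dice overlap, 2|A ∩ B| / (|A| + |B|), can be computed directly from two segmentations represented as sets of voxel coordinates. The function names, the toy masks and the convention for two empty masks are our assumptions for illustration, not the study’s analysis code or data.

```python
def dice(a: set, b: set) -> float:
    """DICE similarity coefficient: 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty segmentations count as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def mask_to_voxels(mask):
    """Convert a 2D binary mask (list of rows of 0/1) into a set of (row, col) coordinates."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


# Hypothetical toy masks: a clinician drawing versus a registered reference lesion.
clinician = mask_to_voxels([[1, 1, 0],
                            [1, 1, 0],
                            [0, 0, 0]])
reference = mask_to_voxels([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]])
print(round(dice(clinician, reference), 3))  # 0.571
```

Here the two masks share 2 voxels out of 4 + 3, giving 4/7. Because the score is normalised by the combined size of both segmentations, a drawing that is much too large or much too small is penalised even if it covers the reference lesion.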
In this study, the DICE scores were calculated for the following comparisons: (1) the AI-based HDT output and the registered MRI-DWI lesion, (2) the clinicians’ segmentation before viewing the AI-based HDT and the registered MRI-DWI lesion, and (3) the clinicians’ segmentation after viewing the AI-based HDT and the registered MRI-DWI lesion.
The DICE scores were summarized using mean and standard deviation (SD). To compare the difference of the DICE score for clinicians before and after AI-based HDT implementation, a three-level mixed-effects linear regression was performed with DICE score as the outcome, AI-based HDT implementation before versus after as the independent variable, and NCCT scans and clinicians as nested random effects.
Subgroup analyses were conducted on NCCT slices with large and small lesions, as well as NCCT slices segmented by consultants versus trainees.
All statistical analyses were performed with Stata 13.0 (StataCorp, College Station, Texas, United States). p-values less than 0.05 were considered indicative of statistical significance, and confidence intervals (CI) were set at 95%.
Ethics
This study used data from the INSPIRE registry, a prospectively collected acute stroke clinical and imaging database. INSPIRE received central ethics approval from the Hunter New England Human Research Ethics Committee (HNEHREC 11/08/17/14.01), and written informed consent was obtained from each patient for their data to be used as part of the INSPIRE registry. The INSPIRE registry and all associated analyses are conducted in accordance with the Declaration of Helsinki.
Results
Data selection
A total of 32 clinicians participated in the study. Five clinicians were excluded because their drawings were incorrectly saved and could not be read. Of the remaining 27 clinicians, 26 completed assessments of all 5 NCCT scans before and after the AI-based HDT was provided, whereas 1 clinician reviewed and assessed only 2 NCCT scans. Thus, a total of 132 matched pairs of segmentations before/after AI-based HDT from 27 clinicians were included in this study.
AI-based HDT performance against MRI-DWI gold standard
When compared to the true lesion reference on MRI-DWI, the AI-based HDT resulted in a mean DICE score of 0.62 (SD 0.05) in detecting the hypodensity region on NCCT slices. The AI-based HDT performance was consistent across the 5 NCCT slices, with DICE scores ranging from 0.54 to 0.66 (Table 2).
Clinician performance against MRI-DWI gold standard
The overall mean DICE score for clinician interpretation of the NCCT slice before the AI-based HDT was applied was 0.33 (SD 0.31). The mean DICE score was 0.40 (SD 0.27) after clinicians viewed the AI-based HDT output (Table 2).
AI-based HDT use was associated with an increase of 0.07 (95% CI: 0.02–0.11, p = 0.003) in mean DICE score on mixed-effects linear regression, accounting for individual scans and clinicians as nested random effects.
For each NCCT scan, the performance of clinicians with and without the AI-based HDT in delineating the NCCT lesion is summarized in Tables 2, 3 and Figure 3. Mixed-effects linear regression showed that DICE score increased significantly in 4 NCCT scans after the AI-based HDT was reviewed, with mean increases in DICE score of 0.09 (95% CI 0.02 to 0.16) for Image ID 1, 0.12 (95% CI 0.04 to 0.19) for Image ID 2, 0.08 (95% CI 0.01 to 0.16) for Image ID 3, and 0.07 (95% CI 0.02 to 0.11) for Image ID 5. There was no evidence of an increase for Image ID 4, with an estimated change of −0.03 (95% CI −0.07 to 0.01).
Performance of clinicians when assessing large versus small lesions
For the slices with large lesions (53 paired segmentations), the mean DICE score was 0.44 (SD 0.31) and 0.48 (SD 0.25) for clinicians before and after AI-based HDT implementation, respectively. This improvement was not significant, with a mean increase in DICE score of 0.05 (95% CI: −0.02, 0.11, p = 0.192). For the slices with small lesions (79 paired segmentations), the mean DICE score was 0.25 (SD 0.29) and 0.35 (SD 0.28) before and after AI-based HDT implementation, respectively. The mean increase in DICE score following AI-based HDT use was significant: 0.08 (95% CI: 0.02, 0.13, p = 0.004).
Performance of consultant versus trainee
Among the 27 clinicians, 12 were consultant neurologists and 15 were trainee neurologists. For the 12 consultants, the mean DICE score was 0.40 (SD 0.33) and 0.45 (SD 0.29) before and after AI-based HDT implementation, respectively; this improvement was not significant (mean difference in DICE = 0.04, 95% CI: −0.02, 0.107, p = 0.189). In contrast, the 15 trainees had a mean DICE score of 0.26 (SD 0.28) before and 0.37 (SD 0.26) after AI-based HDT (Figure 3), a significant improvement with AI-based HDT: the mean difference in DICE was 0.09 (95% CI: 0.03, 0.14, p = 0.004). The performance of consultants versus trainees for each scan is further illustrated in Figure 4.
Figure 4. Forest plot of DICE score changes before and after HDT implementation by consultants (A) and trainees (B).
Success rate of HDT implementation
On review of the data, clinicians chose not to change their original segmentations after reviewing the AI-based HDT output exactly half of the time (50%, 95% CI: 42–58%). When only the segmentations that were changed after reviewing the AI-based HDT output were analysed, the following changes from the original segmentations were noted. For the consultants, the mean DICE score was 0.42 (SD 0.31) before and 0.51 (SD 0.19) after reviewing the output, a mean difference of 0.09 (95% CI: −0.01, 0.19). For the trainees, the mean DICE score was 0.31 (SD 0.29) before and 0.51 (SD 0.16) after, a mean difference of 0.20 (95% CI: 0.12, 0.28).
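The reported interval for the proportion of unchanged segmentations is consistent with a Wilson score interval for 66 of 132 matched pairs, where 66 is our reading of “exactly half” of the 132 pairs. The source does not say which interval method was used, so this is a hedged reconstruction rather than the authors’ calculation. For comparison, the simple Wald interval would give roughly 41% to 59%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion successes / n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


lo, hi = wilson_interval(66, 132)  # assumed: 66 unchanged segmentations out of 132 pairs
print(f"{lo:.1%} to {hi:.1%}")  # 41.6% to 58.4%
```

Rounded to whole percentages, this gives the 42% to 58% quoted above.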
Discussion
The results support our primary hypothesis that the artificial intelligence (AI)-based hypodensity detection tool (HDT) improves clinician detection of acute hypodensity on NCCT. Indeed, the AI-based HDT performed relatively well in detecting acute hypodensity on NCCT and outperformed the clinicians. Some might interpret our results as “AI should replace clinicians.” However, this is the first time the automated hypodensity detection algorithm has been used in clinical practice, and the study was not designed to show superiority of the algorithm; it was designed to determine whether it helped doctors identify hypodensity. A further study would be needed to establish definitive superiority of the algorithm.
Notably the AI-based HDT was most useful for neurology trainees, and when assessing small hypodense lesions. This is evident when looking at individual cases, given Image ID 3 had the best DICE score before AI-based HDT was applied. If all neurology trainees had changed their segmentation after reviewing the AI-based HDT, then the results may have been even more impressive.
These results suggest that AI-based HDT applied to acute stroke CT would have the greatest impact in regional, rural, and remote health care settings, where expert neurology consultants are not immediately available to assess acute stroke imaging for patients who need reperfusion therapy. Although no AI-based automated imaging tool replaces the need for clinical judgment, AI-based HDT could provide support and guidance, and/or alert onsite doctors without expertise in NCCT assessment that the patient being assessed may have hypodensity on their NCCT, which can then help to inform treatment decisions. A further, larger validation study should also include radiology trainees, radiologists and neuroradiologists.
One of the most interesting findings of this study was the “resistance” of the doctors to changing their assessment of the NCCT, even after reviewing the AI-based HDT output. Despite knowing they were receiving an AI-based automated hypodensity detection output (which may be considered a source of bias), participants did not change their original drawing in 50% of cases. This suggests that doctors were often unwilling to trust the AI-based hypodensity detection tool, which may reflect either the doctors’ (over)confidence in their ability to assess NCCT or their wariness of relying on AI-based HDT to make an imaging decision (which has flow-on effects on treatment decisions).
The main limitation of this study was the number of clinicians recruited and the resultant total number of paired segmentations. Variability in the results arises from the clinicians, with the scans serving to test the clinicians’ ability. The scans were selected to reflect different lesion characteristics (size, location) to provide a representative sample of scans, which supports generalisability in acute stroke and is a strength of this study.
In summary, these findings show the potential of AI-based HDT to significantly enhance clinicians’ diagnostic precision, and the tool holds promise for valuable application in clinical practice, particularly in neurology training and in challenging diagnostic scenarios involving smaller hypodense lesions.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by Hunter New England Human Research Ethics Committee (HNEHREC 11/08/17/14.01). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
AS: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Writing – original draft, Writing – review & editing. MV: Conceptualization, Data curation, Formal analysis, Methodology, Writing – review & editing. LL: Conceptualization, Data curation, Formal analysis, Methodology, Writing – review & editing. AB: Methodology, Supervision, Writing – review & editing. LC: Methodology, Supervision, Writing – review & editing. MP: Methodology, Supervision, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. AS is supported by the Australian Commonwealth Government on a Research Training Program Scholarship.
Acknowledgments
The participants of this study are Balabanski A., Sharobeam A., Park A., Lim B., Colman B., Giarola B., Williams C., Esperon C., Murtimer C., Blair C., Preml D., Hg F., Zhao H., Beharry J., Broadley J., Thomas J., Shipley J., Ng J., Wong J., Ja J., Baskin J., Ma M., Linger M., Megans M., Valente M., Nyugen M., Lizark N., Swarup O., Siriratnam P., Holper S., Yogendrakumar V., and Stanislau V.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
References
1. Bivard, A, Churilov, L, and Parsons, M. Artificial intelligence for decision support in acute stroke—current roles and potential. Nat Rev Neurol. (2020) 16:575–85. doi: 10.1038/s41582-020-0390-y
2. Peter, R, Korfiatis, P, Blezek, D, Oscar Beitia, A, Stepan-Buksakowska, I, Horinek, D, et al. A quantitative symmetry-based analysis of hyperacute ischemic stroke lesions in noncontrast computed tomography. Med Phys. (2017) 44:192–9. doi: 10.1002/mp.12015
3. Gao, J, Parsons, MW, Kawano, H, Levi, CR, Evans, TJ, Lin, L, et al. Visibility of CT early ischemic change is significantly associated with time from stroke onset to baseline scan beyond the first 3 hours of stroke onset. J Stroke. (2017) 19:340–6. doi: 10.5853/jos.2016.01424
4. Saver, JL. Time is brain—quantified. Stroke. (2006) 37:263–6. doi: 10.1161/01.STR.0000196957.55928.ab
5. RANZCR. Clinical radiology workforce census report. Australia: RANZCR Clinical Radiology Workforce Census Report. (2016).
6. Qiu, W, Kuang, H, Teleg, E, Ospel, JM, Sohn, SI, Almekhlafi, M, et al. Machine learning for detecting early infarction in acute stroke with non-contrast-enhanced CT. Radiology. (2020) 294:638–44. doi: 10.1148/radiol.2020191193
7. Przelaskowski, A, Sklinda, K, Bargieł, P, Walecki, J, Biesiadko-Matuszewska, M, and Kazubek, M. Improved early stroke detection: wavelet-based perception enhancement of computerized tomography exams. Comput Biol Med. (2007) 37:524–33. doi: 10.1016/j.compbiomed.2006.08.004
8. Takahashi, N, Lee, Y, Tsai, DY, Ishii, K, Kinoshita, T, Tamura, H, et al. Improvement of detection of hypoattenuation in acute ischemic stroke in unenhanced computed tomography using an adaptive smoothing filter. Acta Radiol. (2008) 49:816–26. doi: 10.1080/02841850802126570
9. Takahashi, N, Tsai, DY, Lee, Y, Kinoshita, T, and Ishii, K. Z-score mapping method for extracting hypoattenuation areas of hyperacute stroke in unenhanced CT. Acad Radiol. (2010) 17:84–92. doi: 10.1016/j.acra.2009.07.011
10. Gillebert, CR, Humphreys, GW, and Mantini, D. Automated delineation of stroke lesions using brain CT images. Neuroimage Clin. (2014) 4:540–8. doi: 10.1016/j.nicl.2014.03.009
11. Takahashi, N, Tsai, DY, Lee, Y, Kinoshita, T, Ishii, K, et al. Usefulness of z-score mapping for quantification of extent of hypoattenuation regions of hyperacute stroke in unenhanced computed tomography: analysis of radiologists’ performance. J Comput Assist Tomogr. (2010) 34:751–6. doi: 10.1097/RCT.0b013e3181e66473
12. Takahashi, N, Lee, Y, Tsai, D-Y, Kinoshita, T, Ouchi, N, and Ishii, K. Computer-aided detection scheme for identification of hypoattenuation of acute stroke in unenhanced CT. Radiol Phys Technol. (2012) 5:98–104. doi: 10.1007/s12194-011-0143-0
13. Tang, F-H, Ng, DK, and Chow, DH. An image feature approach for computer-aided detection of ischemic stroke. Comput Biol Med. (2011) 41:529–36. doi: 10.1016/j.compbiomed.2011.05.001
14. Nowinski, WL, Gupta, V, Qian, G, He, J, Poh, LE, Ambrosius, W, et al. Automatic detection, localization, and volume estimation of ischemic infarcts in noncontrast computed tomographic scans: method and preliminary results. Invest Radiol. (2013) 48:661–70. doi: 10.1097/RLI.0b013e31828d8403
15. Adamou, A, Beltsios, ET, Bania, A, Gkana, A, Kastrup, A, Chatziioannou, A, et al. Artificial intelligence-driven ASPECTS for the detection of early stroke changes in non-contrast CT: a systematic review and meta-analysis. J Neurointerv Surg. (2023) 15:e298–304. doi: 10.1136/jnis-2022-019447
Keywords: automated hypodensity detection tools, artificial intelligence, machine learning, acute ischaemic stroke, treatment
Citation: Santos AD, Visser M, Lin L, Bivard A, Churilov L and Parsons MW (2024) Novel artificial intelligence-based hypodensity detection tool improves clinician identification of hypodensity on non-contrast computed tomography in stroke patients. Front. Neurol. 15:1359775. doi: 10.3389/fneur.2024.1359775
Edited by:
Zijun Zhang, City University of Hong Kong, Hong Kong SAR, China
Reviewed by:
Zhenyu Shu, Zhejiang Provincial People’s Hospital, China
Thomas Lindner, University of Hamburg, Germany
Liuyin Chen, City University of Hong Kong, Hong Kong SAR, China
Copyright © 2024 Santos, Visser, Lin, Bivard, Churilov and Parsons. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Angela Dos Santos, a.dossantos@unsw.edu.au