
ORIGINAL RESEARCH article

Front. Neurol., 30 November 2020
Sec. Multiple Sclerosis and Neuroimmunology

Deep Learning-Based Method to Differentiate Neuromyelitis Optica Spectrum Disorder From Multiple Sclerosis

Hyunjin Kim1, Youngin Lee2,3, Yong-Hwan Kim2, Young-Min Lim1, Ji Sung Lee4,5, Jincheol Woo2, Su-Kyeong Jang2, Yeo Jin Oh1, Hye Weon Kim1, Eun-Jae Lee1, Dong-Wha Kang1,2*, Kwang-Kuk Kim1*
  • 1Department of Neurology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
  • 2Asan Institute for Life Sciences, Asan Medical Center, Seoul, South Korea
  • 3Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
  • 4Clinical Research Center, Asan Institute for Life Sciences, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
  • 5Department of Clinical Epidemiology and Biostatistics, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

Background: Differentiating neuromyelitis optica spectrum disorder (NMOSD) from multiple sclerosis (MS) is diagnostically crucial because, despite their similarities, the treatments for the two diseases differ substantially, and disease-modifying treatments for MS can worsen NMOSD. As brain magnetic resonance imaging (MRI) is an important tool for distinguishing the two diseases, extensive research has been conducted to identify their defining MRI characteristics. However, the application of such research in clinical practice is still limited. In this study, we investigate the applicability of a deep learning-based algorithm for differentiating NMOSD from MS.

Methods: In this study, we included 338 participants (213 patients with MS, 125 patients with NMOSD) who visited the Asan Medical Center between February 2009 and February 2020. A 3D convolutional neural network, which is a deep learning-based algorithm, was trained using fluid-attenuated inversion recovery images and clinical information of the participants. The performance of the final model in differentiating NMOSD from MS was evaluated and compared with that of two neurologists.

Results: The deep learning-based model exhibited an area under the receiver operating characteristic curve of 0.82 (95% CI, 0.75–0.89). It differentiated NMOSD from MS with an accuracy of 71.1% (sensitivity = 87.8%, specificity = 61.6%), comparable to that of the neurologists. The intra-rater reliability of the two neurologists was only moderate (κ = 0.47 and 0.50), in contrast to the perfectly consistent classification of the deep learning-based model.

Conclusion: The proposed model was verified to be capable of differentiating NMOSD from MS with accuracy comparable to that of neurologists, exhibiting the advantage of consistent classification. As a result, it can aid differential diagnosis between two important central nervous system inflammatory diseases in clinical practice.

Introduction

Neuromyelitis optica spectrum disorder (NMOSD) and multiple sclerosis (MS) are both inflammatory diseases of the central nervous system (CNS) (1). Because of the clinical and radiological similarities between the two, there has been a persistent debate on whether they are actually different. However, since the discovery of the anti-aquaporin-4 antibody (AQP4-Ab), an NMOSD-specific autoantibody (2), studies have confirmed MS and NMOSD to be distinct disease entities (3). Differentiating NMOSD from MS is of considerable diagnostic importance because the treatments for the two diseases differ considerably, and disease-modifying therapies for MS can worsen NMOSD (4–6). Even though the presence of AQP4-Ab is essential for the diagnosis of AQP4-Ab-seropositive NMOSD, clinical and radiological differentiation between AQP4-Ab-seropositive NMOSD and MS remains crucial for the following reasons: (i) clinicians need to identify patients in whom the AQP4-Ab test should be performed; (ii) the result of an AQP4-Ab assay can be influenced by assay methodology and the patient's clinical status (3, 7, 8). Therefore, brain MRI, the most common test for CNS inflammatory diseases, plays a significant role in differentiating the two diseases (9).

Recently, machine learning-based algorithms have been applied to classify MRIs of patients with various neurological diseases (10–12). In particular, researchers have attempted to use various such methods, including multimodal data fusion and random forests, to differentiate NMOSD from MS (13, 14). Deep learning, which is a type of machine learning, does not require the specification of explicit features by experts and deduces most predictive features directly based on images (15, 16). Deep learning has been applied for classification based on medical images such as chest X-rays and fundus photographs, and it has been reported to exhibit excellent differentiation performance (17, 18). Fluid-attenuated inversion recovery (FLAIR) is an MRI technique that highlights T2 hyperintense lesions while suppressing cerebrospinal fluid (CSF) signals, thereby clearly revealing lesions that are in proximity to CSF, such as juxtacortical and periventricular lesions (19). FLAIR is considered to be superior to T2-weighted images in the context of the detection of MS brain lesions (20).

In this study, we aimed to develop a deep learning model based on brain FLAIR MRIs and elementary clinical information to differentiate NMOSD from MS. Further, we evaluated the clinical applicability of the proposed model by comparing its performance with that of two neurologists.

Materials and Methods

Participants

We retrospectively reviewed the medical records of patients with MS and NMOSD who visited the Asan Medical Center, Seoul, Korea, between February 2009 and February 2020. Patients with MS who fulfilled the 2010 McDonald criteria and patients with NMOSD with AQP4 immunoglobulin G (AQP4-IgG) seropositivity, in accordance with the 2015 international consensus diagnostic criteria for NMOSD, were included in this study. AQP4-IgG seropositivity was confirmed via a commercial fixed cell-based assay (Euroimmun, Lubeck, Germany). Patients without available brain MRI data were excluded. For each patient, the most recent brain MRI was used as the representative scan. All images were deidentified prior to being transferred to the study investigators.

MRI Acquisition

2D FLAIR sequences were acquired using 1.5 T or 3.0 T scanners at the Asan Medical Center or at other centers that referred patients and transferred images to the Asan Medical Center. The MRI protocol at the Asan Medical Center is a fast spin-echo sequence with inversion recovery and the following scanning parameters: field of view = 220 × 200 mm; voxel size = 0.65 × 0.82 mm; acquisition matrix = 336 × 231; repetition time = 11,000 ms; echo time = 125 ms; inversion time = 2,500 ms; slice thickness = 5 mm; slice gap = 2 mm; and number of acquisitions = 2.

NMOSD and MS Classification by Neurologists

Two board-certified neurologists (H.W. Kim and Y.J. Oh), each of whom had completed a 1-year clinical fellowship in neuroimmunology, participated in this study as the human raters. They independently made a binary diagnosis of MS or NMOSD for each patient by reviewing the FLAIR images alongside the following clinical information: age at disease onset, age at the time of MRI, sex, disease duration, and duration from last relapse. We considered these five variables sufficient to capture the elementary information required for differential diagnosis in clinical practice.

NMOSD and MS Classification via Deep Learning-Based Model

All methods were implemented in Python, using PyTorch for deep learning. The algorithm was executed on an Intel Core i7-8700K 3.70 GHz processor with two Nvidia GeForce RTX 2080 Ti graphics processing units. Figure 1 depicts the preprocessing steps, the details of which are described below.

FIGURE 1

Figure 1. Steps of data preprocessing. The diagram depicts the preprocessing of the data before input into the deep learning model. First, the whole dataset was split into training, validation, and test sets. Subsequently, clinical information and fluid-attenuated inversion recovery (FLAIR) images were preprocessed separately. Details of the preprocessing procedure are described in the Methods section.

Subdivision of the Dataset Into Training, Validation, and Test Sets

We first split the dataset into training (45%, n = 152), validation (15%, n = 51), and test (40%, n = 135) sets, using stratified random sampling to preserve the class ratio in every set.
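The stratified 45/15/40 split described above can be sketched as follows (a minimal NumPy version; scikit-learn's `train_test_split` with its `stratify` argument achieves the same effect):

```python
import numpy as np

def stratified_split(labels, fracs=(0.45, 0.15, 0.40), seed=0):
    """Split sample indices into train/val/test sets, preserving the
    class ratio within each set by sampling per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = [[], [], []]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_tr = int(round(fracs[0] * len(idx)))
        n_va = int(round(fracs[1] * len(idx)))
        splits[0].extend(idx[:n_tr])
        splits[1].extend(idx[n_tr:n_tr + n_va])
        splits[2].extend(idx[n_tr + n_va:])
    return [np.array(sorted(s)) for s in splits]

# Example with the study's class sizes: 213 MS (0) and 125 NMOSD (1)
labels = np.array([0] * 213 + [1] * 125)
train, val, test = stratified_split(labels)
# yields sets of size 152 / 51 / 135, matching the paper
```

With these class sizes the per-class rounding reproduces the paper's exact set sizes.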

Scaling Clinical Information

Four clinical variables—age at disease onset, age at the time of MRI, disease duration, and duration from last relapse—were scaled to a median of 0 and an interquartile range of 1. Because this scaling method uses statistics that are robust to outliers, the features are transformed to nearly identical scales while being minimally affected by extreme values.
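This robust scaling (the behavior of, e.g., scikit-learn's `RobustScaler`) can be sketched in a few lines; the toy values below are illustrative, not patient data:

```python
import numpy as np

def robust_scale(x):
    """Center on the median and divide by the interquartile range,
    so a single outlier barely influences the resulting scale."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / (q3 - q1)

ages = np.array([20.0, 30.0, 40.0, 50.0, 90.0])  # toy values, one outlier
scaled = robust_scale(ages)
# median of the scaled values is 0; the outlier maps to 2.5, not an extreme z-score
```

Unlike z-scoring with mean and standard deviation, the median and IQR are unchanged if the single outlier (90) were even larger.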

Image Pre-processing

Brain mask data for each FLAIR sequence were obtained using a previously reported brain extraction algorithm (21). The brain mask is binary, with only the voxels belonging to the brain coded as 1. The FLAIR images were bias-corrected via the lesion prediction algorithm (22), as implemented in the lesion segmentation toolbox (LST) version 2.0.15 (www.statistical-modeling.de/lst.html) for statistical parametric mapping (SPM, www.fil.ion.ucl.ac.uk/spm/). Bias correction also yielded, for each image, a file of deformation parameters capable of registering the image from its native space to the Montreal Neurological Institute (MNI) space at a resolution of 2 mm × 2 mm × 2 mm (23). Using these deformation parameters, we registered the MRIs in the training set to the MNI space. These registered images then underwent 3D intensity normalization using functions from the FMRIB Software Library (FSL) (24) and were averaged into a single image, which served as the template in this study. After the template was created, all FLAIR images and brain masks were registered to it, and intensity normalization with Z-scoring was conducted.
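The final Z-scoring step can be sketched as follows (a minimal version operating on an already-registered volume; the registration itself relies on LST/SPM and FSL as described above, and the 91 × 109 × 91 shape of the standard 2 mm MNI grid is an assumption for the toy example):

```python
import numpy as np

def zscore_in_mask(image, mask):
    """Z-score voxel intensities using the mean/std of brain voxels only;
    background voxels (mask == 0) are set to zero."""
    brain = image[mask > 0]
    normalized = np.zeros_like(image, dtype=float)
    normalized[mask > 0] = (brain - brain.mean()) / brain.std()
    return normalized

# Toy volume on the 2 mm MNI grid with a synthetic box-shaped brain mask
rng = np.random.default_rng(0)
img = rng.normal(100.0, 15.0, size=(91, 109, 91))
mask = np.zeros_like(img)
mask[20:70, 30:80, 20:70] = 1
norm = zscore_in_mask(img, mask)
```

Restricting the statistics to masked voxels prevents the large zero-valued background from skewing the mean and standard deviation.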

Model Architecture

Deep learning, particularly convolutional neural networks (CNNs), has been widely used in computer vision (25). Among the variants of CNNs, ResNet has exhibited remarkable performance in image classification (26). Additionally, although most imaging studies have used 2D CNNs as their model architecture, some recent studies have proposed 3D CNNs to fully utilize the spatial features of MRI and achieve better performance (27–30). Therefore, we constructed a 3D CNN architecture based on ResNeXt (31), an improved model derived from ResNet (Figure 2A). To use clinical information and FLAIR data simultaneously, we concatenated the clinical information to the end of the CNN feature vector and propagated the combined information through two fully connected layers (Figure 2B).
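A minimal PyTorch sketch of this fusion structure is shown below. The tiny convolutional backbone is a placeholder (the actual model uses a ResNeXt-style 3D backbone whose exact configuration is not reproduced here), and the layer widths and input size are illustrative assumptions; only the overall structure—3D CNN feature vector concatenated with five clinical variables, then two fully connected layers—follows the text:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Sketch of the image + clinical-information fusion model.
    The placeholder backbone stands in for the ResNeXt-style 3D CNN."""
    def __init__(self, n_clinical=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(8),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),   # global pooling -> feature vector
        )
        self.fc = nn.Sequential(       # two fully connected layers
            nn.Linear(16 + n_clinical, 32),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(32, 1),          # single logit: NMOSD vs. MS
        )

    def forward(self, image, clinical):
        feat = self.backbone(image).flatten(1)      # (B, 16)
        fused = torch.cat([feat, clinical], dim=1)  # append clinical info
        return self.fc(fused)

model = FusionNet()
logit = model(torch.randn(2, 1, 32, 32, 32), torch.randn(2, 5))  # shape (2, 1)
```

Concatenating at the feature-vector stage lets the fully connected layers weigh imaging and clinical evidence jointly.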

FIGURE 2

Figure 2. Schema of the model architecture. (A) The 3D convolutional neural network architecture used in our study. (B) The entire structure of the proposed model. The preprocessed fluid-attenuated inversion recovery (FLAIR) image was transmitted through the 3D convolutional neural network (CNN) and its feature vector was extracted and concatenated with the preprocessed clinical information. AvgPool, average-pooling; BN, batch normalization; Conv, convolution; FC, fully connected; MaxPool, max-pooling; ReLU, rectified linear unit.

Modeling

We trained and validated several models that differed in learning rate scheduling strategy, dropout rate, loss function, and data augmentation strategy, and selected the best model based on validation-set performance. Stochastic gradient descent with a momentum of 0.9 and a weight decay of 10^−6 was used as the optimizer. The initial learning rate of 0.025 was decayed using cosine annealing, reaching the pre-defined minimal learning rate of 0.015 after 20 epochs; this cycle was repeated throughout training. This strategy is known as stochastic gradient descent with warm restarts (32). A batch size of 16 was selected, and focal loss, with gamma = 1 and alpha = 0.25, was used as the loss function for the best model. Focal loss is often used when the class ratio of the data is imbalanced (33); we expected the model to focus more on the NMOSD class when trained with it. To prevent overfitting, rotation or translation was randomly applied to augment the FLAIR data immediately before input into the model, and dropout was used with a rate of 0.5. Both augmentation and dropout were applied only during training, not during validation or evaluation. The final performance of the model was evaluated on the test set that had been split off beforehand.
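The binary focal loss with gamma = 1 and alpha = 0.25 used above can be sketched in NumPy as a sanity check (this is the formula of Lin et al., not the study's training code):

```python
import numpy as np

def binary_focal_loss(p, y, gamma=1.0, alpha=0.25):
    """Focal loss of Lin et al.: down-weights easy, confidently correct
    examples so training focuses on hard / minority-class samples.
    p = predicted probability of the positive (NMOSD) class, y in {0, 1}."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

loss_easy = binary_focal_loss(0.9, 1)  # confident, correct prediction
loss_hard = binary_focal_loss(0.1, 1)  # confident, wrong prediction
```

With gamma = 0 the expression reduces to alpha-weighted cross-entropy; gamma = 1, as used here, multiplies each example's loss by (1 − p_t), so well-classified examples contribute progressively less.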

Statistical Analysis

Descriptive summaries are presented as frequencies and percentages for categorical variables and as mean ± standard deviation for continuous variables. To compare pairs of groups, Student's t-tests were used for continuous variables and chi-squared tests for categorical variables. The diagnostic performances of the proposed model and the human raters (raters A and B) were measured in terms of sensitivity (the proportion of correct decisions in the NMOSD group), specificity (the proportion correct in the MS group), and accuracy (the proportion correct overall) on the test set. For sensitivity, specificity, and accuracy, two-sided 95% exact confidence intervals (CIs) were calculated using the Clopper–Pearson method (34). McNemar's test was used to compare the decisions of the deep learning-based model and the human raters. The discriminative performance of the deep learning-based model was assessed using the area under the receiver operating characteristic curve (AUC); the 95% CI of the AUC was calculated following DeLong's method (35). Cohen's kappa (κ) was used to evaluate the intra-rater reliability of the human raters, computed between their first and second decisions on a randomly selected 10% of test-set samples that were presented in duplicate; κ was also used to assess the inter-rater agreement (on the test set) between the two human raters and between each human rater and the deep learning-based model. The strength of agreement based on κ was judged according to the following guidelines: <0.2 = slight; 0.2–0.4 = fair; 0.4–0.6 = moderate; 0.6–0.8 = substantial; and >0.8 = almost perfect (36). Two-sided P-values < 0.05 were considered statistically significant. All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC).
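Cohen's κ used for the agreement analyses can be sketched as follows (a minimal version for binary ratings with toy data; the study itself used SAS):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Agreement between two raters beyond what chance predicts:
    kappa = (p_obs - p_chance) / (1 - p_chance)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_obs = float(np.mean(r1 == r2))
    # chance agreement from the raters' marginal label frequencies
    p_chance = sum(float(np.mean(r1 == c)) * float(np.mean(r2 == c))
                   for c in np.unique(np.concatenate([r1, r2])))
    return (p_obs - p_chance) / (1 - p_chance)

# Toy binary ratings (1 = NMOSD, 0 = MS) from two hypothetical raters
a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
kappa = cohens_kappa(a, b)  # 80% raw agreement, but kappa is lower
```

Here the raters agree on 8 of 10 cases (p_obs = 0.8), yet κ ≈ 0.58 because both raters' marginal frequencies already predict 52% agreement by chance; this is why κ, not raw accuracy, is the appropriate reliability measure.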

Results

Baseline Characteristics of the Participants

We included 338 participants (213 patients with MS and 125 patients with NMOSD) in the study. The baseline characteristics of all participants are presented in Table 1. The mean age at onset was lower for the MS group than for the NMOSD group (33.1 ± 12.3 vs. 41.7 ± 13.7 years, P < 0.001). Fifty-five (25.8%) patients in the MS group and 13 (10.4%) in the NMOSD group were male. The mean disease duration and duration from last relapse were longer for the MS group. The percentages of MRIs performed at the Asan Medical Center (vs. other centers) and of MRIs performed with a 3 T scanner (vs. a 1.5 T scanner) were comparable between the two groups. Further, 30% (37/125) of brain MRIs from patients with NMOSD did not exhibit any T2 hyperintense lesions (≥3 mm), whereas all brain MRIs from patients with MS exhibited more than one T2 hyperintense lesion.

TABLE 1

Table 1. Baseline characteristics.

Diagnostic Performance

Figure 3 depicts the receiver operating characteristic curve of the model evaluated on the test set (AUC, 0.82 [95% CI, 0.75–0.89]). The classification results of the human raters and the deep learning-based model are presented in Table 2. The accuracy of the deep learning model in differentiating NMOSD from MS was 71.1%, with a sensitivity of 87.8% and a specificity of 61.6%, all of which were comparable to those of the human raters. The intra-rater reliability of each human rater was moderate (κ of rater A = 0.471, κ of rater B = 0.500); the inter-rater agreement between the human raters was moderate (κ = 0.437); and the inter-rater agreements between the deep learning-based model and each human rater were fair and slight, respectively (κ = 0.288 between the deep learning-based model and rater A; κ = 0.154 between the deep learning-based model and rater B).

FIGURE 3

Figure 3. Diagnostic performance of the deep learning-based model and the neurologists. Performance on an independent test set with n = 135. The area under the receiver operating characteristic curve (AUC) of the proposed model (violet line) was 0.82 (95% CI, 0.75–0.89). The points correspond to the performance of the neurologists.

TABLE 2

Table 2. Classification results.

Discussion

In this study, we demonstrated that the proposed deep learning-based model is capable of differentiating NMOSD from MS with an accuracy of 71.1% (sensitivity = 87.8%, specificity = 61.6%), which was comparable to that achieved by the neurologists. Although the diagnostic accuracies were comparable, the deep learning-based model offers some advantages over human evaluation, including greater consistency in classification (intra-rater reliability κ = 1.0) and prompt reporting of results. The intra-rater reliability of the human raters was only moderate (κ = 0.47–0.50).

A brain lesion distribution criterion defined as "at least one lesion adjacent to the body of the lateral ventricle and in the inferior temporal lobe, or the presence of a subcortical U-fiber lesion or a Dawson's finger-type lesion" has been suggested to distinguish patients with MS from those with NMOSD with 92% sensitivity and 96% specificity (37). The same criterion distinguished MS from myelin oligodendrocyte glycoprotein antibody-associated disease with a sensitivity of 90.9% and a specificity of 95.2% (38). However, those studies did not include brain MRIs without T2 hyperintense lesions, and the criterion relies on subjective definitions such as Dawson's fingers, which can be difficult to apply for clinicians with little experience with CNS inflammatory diseases. In the current study, we included all MRIs with or without brain lesions and did not designate any specific lesion criteria; instead, the deep learning-based model learned the most predictive features directly from the images. Therefore, we suggest that the proposed deep learning-based model can overcome the limited practicability of the previously published brain lesion distribution criteria.

Two machine learning-based methods and one deep learning-based method using brain MRI data had previously been developed to distinguish NMOSD from MS (13, 14, 39). These studies adopted various approaches, including multimodal data fusion, random forest classification, and hierarchical multimodal fusion, achieving accuracies ranging from 74 to 88%. However, the applicability of these methods in clinical practice was uncertain because they were not compared against assessments by clinicians. In our study, we demonstrated that the proposed model performs comparably to trained clinicians, indicating its potential clinical applicability. Recent systematic reviews comparing the performance of artificial intelligence with that of clinicians have likewise reported comparable performance (40, 41).

Several limitations of the current study should be noted. First, we conducted this study retrospectively and without external validation, which entails a risk of bias and limits generalizability. Further, the study participants were of a single ethnicity (Korean), so our results might not be applicable to patients from other ethnic backgrounds (42). Future prospective studies incorporating data from other international centers and larger samples could overcome this limitation. Second, spinal MRI and CSF findings, which are other important diagnostic clues (7, 43), may strengthen the performance of the deep learning-based model. In the present study, we deliberately developed a model based on minimal information, but these variables may be added when developing future models. Lastly, the proposed deep learning-based model was trained for binary classification; hence, it is not an automatic brain MRI interpreter. Our results provide evidence that deep learning can support the objective differential diagnosis of MS and NMOSD. In the future, we may extend this deep learning-based approach to multiclass classification problems that include other CNS diseases.

In conclusion, the proposed deep learning-based model was verified to be capable of differentiating NMOSD from MS with an accuracy comparable to that of neurologists, exhibiting distinct advantages in terms of the consistency of classification. The proposed model has the potential to aid differential diagnosis of two important CNS inflammatory diseases in clinical practice. Further research is necessary to determine the applicability of this model in clinical settings and to determine whether the utilization of the model can lead to improved patient care and prognoses.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

The studies involving human participants were reviewed and approved by Institutional Review Board of the Asan Medical Center. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

D-WK and K-KK contributed to conception and design of the study. HK and YL wrote the manuscript. Y-ML, E-JL, and K-KK collected clinical and MRI data. YO and HWK classified the MRI data. HK, YL, and JL conducted data analysis. D-WK and E-JL acquired funding. Y-ML, E-JL, Y-HK, JW, S-KJ, D-WK, and K-KK supervised the study and revised the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding

This study was supported by grants from the Ministry of Science and ICT, South Korea (NRF-2018R1C1B6008884, NRF-2018M3A9E8066249), and Ministry of Health and Welfare, South Korea (HI18C2383).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank Editage (www.editage.co.kr) for English language editing.

References

1. Hoftberger R, Lassmann H. Inflammatory demyelinating diseases of the central nervous system. Handb Clin Neurol. (2017) 145:263–83. doi: 10.1016/B978-0-12-802395-2.00019-5

PubMed Abstract | CrossRef Full Text | Google Scholar

2. Lennon VA, Wingerchuk DM, Kryzer TJ, Pittock SJ, Lucchinetti CF, Fujihara K, et al. A serum autoantibody marker of neuromyelitis optica: distinction from multiple sclerosis. Lancet. (2004) 364:2106–12. doi: 10.1016/S0140-6736(04)17551-X

PubMed Abstract | CrossRef Full Text | Google Scholar

3. Kim S-M, Kim S-J, Lee HJ, Kuroda H, Palace J, Fujihara K. Differential diagnosis of neuromyelitis optica spectrum disorders. Ther Adv Neurol Disord. (2017) 10:265–89. doi: 10.1177/1756285617709723

PubMed Abstract | CrossRef Full Text | Google Scholar

4. Min JH, Kim BJ, Lee KH. Development of extensive brain lesions following fingolimod (FTY720) treatment in a patient with neuromyelitis optica spectrum disorder. Mult Scler. (2012) 18:113–5. doi: 10.1177/1352458511431973

PubMed Abstract | CrossRef Full Text | Google Scholar

5. Kleiter I, Hellwig K, Berthele A, Kumpfel T, Linker RA, Hartung HP, et al. Failure of natalizumab to prevent relapses in neuromyelitis optica. Arch Neurol. (2012) 69:239–45. doi: 10.1001/archneurol.2011.216

PubMed Abstract | CrossRef Full Text | Google Scholar

6. Palace J, Leite MI, Nairne A, Vincent A. Interferon Beta treatment in neuromyelitis optica: increase in relapses and aquaporin 4 antibody titers. Arch Neurol. (2010) 67:1016–7. doi: 10.1001/archneurol.2010.188

PubMed Abstract | CrossRef Full Text | Google Scholar

7. Kim HJ, Paul F, Lana-Peixoto MA, Tenembaum S, Asgari N, Palace J, et al. MRI characteristics of neuromyelitis optica spectrum disorder: an international update. Neurology. (2015) 84:1165–73. doi: 10.1212/WNL.0000000000001367

PubMed Abstract | CrossRef Full Text | Google Scholar

8. Waters PJ, Pittock SJ, Bennett JL, Jarius S, Weinshenker BG, Wingerchuk DM. Evaluation of aquaporin-4 antibody assays. Clin Exp Neuroimmunol. (2014) 5:290–303. doi: 10.1111/cen3.12107

PubMed Abstract | CrossRef Full Text | Google Scholar

9. Geraldes R, Ciccarelli O, Barkhof F, De Stefano N, Enzinger C, Filippi M, et al. The current role of MRI in differentiating multiple sclerosis from its imaging mimics. Nat Rev Neurol. (2018) 14:199–213. doi: 10.1038/nrneurol.2018.14

CrossRef Full Text | Google Scholar

10. Alcaide-Leon P, Dufort P, Geraldo AF, Alshafai L, Maralani PJ, Spears J, et al. Differentiation of enhancing glioma and primary central nervous system lymphoma by texture-based machine learning. Am J Neuroradiol. (2017) 38:1145–50. doi: 10.3174/ajnr.A5173

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Salvatore C, Cerasa A, Castiglioni I, Gallivanone F, Augimeri A, Lopez M, et al. Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy. J Neurosci Methods. (2014) 222:230–7. doi: 10.1016/j.jneumeth.2013.11.016

PubMed Abstract | CrossRef Full Text | Google Scholar

12. Lee H, Lee EJ, Ham S, Lee HB, Lee JS, Kwon SU, et al. Machine learning approach to identify stroke within 4.5 hours. Stroke. (2020) 51:860–6. doi: 10.1161/STROKEAHA.119.027611

PubMed Abstract | CrossRef Full Text | Google Scholar

13. Eshaghi A, Riyahi-Alam S, Saeedi R, Roostaei T, Nazeri A, Aghsaei A, et al. Classification algorithms with multi-modal data fusion could accurately distinguish neuromyelitis optica from multiple sclerosis. NeuroImage Clin. (2015) 7:306–14. doi: 10.1016/j.nicl.2015.01.001

PubMed Abstract | CrossRef Full Text | Google Scholar

14. Eshaghi A, Wottschel V, Cortese R, Calabrese M, Sahraian MA, Thompson AJ, et al. Gray matter MRI differentiates neuromyelitis optica from multiple sclerosis using random forest. Neurology. (2016) 87:2463–70. doi: 10.1212/WNL.0000000000003395

PubMed Abstract | CrossRef Full Text | Google Scholar

15. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. (2015) 521:436. doi: 10.1038/nature14539

CrossRef Full Text | Google Scholar

16. Lee EJ, Kim YH, Kim N, Kang DW. Deep into the brain: artificial intelligence in stroke imaging. J Stroke. (2017) 19:277–85. doi: 10.5853/jos.2017.02054

PubMed Abstract | CrossRef Full Text | Google Scholar

17. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. (2017) 284:574–82. doi: 10.1148/radiol.2017162326

PubMed Abstract | CrossRef Full Text | Google Scholar

18. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. (2016) 316:2402–10. doi: 10.1001/jama.2016.17216

PubMed Abstract | CrossRef Full Text | Google Scholar

19. Adams JG, Melhem ER. Clinical usefulness of T2-weighted fluid-attenuated inversion recovery MR imaging of the CNS. Am J Roentgenol. (1999) 172:529–36. doi: 10.2214/ajr.172.2.9930818

PubMed Abstract | CrossRef Full Text | Google Scholar

20. Gawne-Cain ML, O'Riordan JI, Thompson AJ, Moseley IF, Miller DH. Multiple sclerosis lesion detection in the brain: a comparison of fast fluid-attenuated inversion recovery and conventional T2-weighted dual spin echo. Neurology. (1997) 49:364–70. doi: 10.1212/WNL.49.2.364

CrossRef Full Text | Google Scholar

21. Isensee F, Schell M, Pflueger I, Brugnara G, Bonekamp D, Neuberger U, et al. Automated brain extraction of multisequence MRI using artificial neural networks. Human Brain Mapp. (2019) 40:4952–64. doi: 10.1002/hbm.24750

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Schmidt P. Bayesian Inference for Structured Additive Regression Models for Large-Scale Problems With Applications to Medical Imaging. Munich: Ludwig-Maximilians-Universität München (2017).

Google Scholar

23. Evans AC, Collins DL, Mills SR, Brown ED, Kelly RL, Peters TM editors. 3D statistical neuroanatomical models from 305 MRI volumes. In: 1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference San Francisco, CA (1993).

Google Scholar

24. Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. NeuroImage. (2012) 62:782–90. doi: 10.1016/j.neuroimage.2011.09.015

PubMed Abstract | CrossRef Full Text | Google Scholar

25. Eitel F, Soehler E, Bellmann-Strobl J, Brandt AU, Ruprecht K, Giess RM, et al. Uncovering convolutional neural network decisions for diagnosing multiple sclerosis on conventional MRI using layer-wise relevance propagation. NeuroImage Clin. (2019) 24:102003. doi: 10.1016/j.nicl.2019.102003

PubMed Abstract | CrossRef Full Text | Google Scholar

26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV (2016). p. 770–7.

Google Scholar

27. Cole JH, Poudel RPK, Tsagkrasoulis D, Caan MWA, Steves C, Spector TD, et al. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage. (2017) 163:115–24. doi: 10.1016/j.neuroimage.2017.07.059

PubMed Abstract | CrossRef Full Text | Google Scholar

28. Korolev S, Safiullin A, Belyaev M, Dodonova Y. Residual and plain convolutional neural networks for 3D brain MRI classification. In: IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). Melbourne, VIC (2017). p. 835–8. doi: 10.1109/ISBI.2017.7950647

CrossRef Full Text | Google Scholar

29. Zou L, Zheng J, Miao C, Mckeown M, Wang Z. 3D CNN based automatic diagnosis of attention deficit hyperactivity disorder using functional and structural MRI. IEEE Access. (2017) 5:23626–36. doi: 10.1109/ACCESS.2017.2762703

CrossRef Full Text | Google Scholar

30. Yuan L, Wei X, Shen H, Zeng L, Hu D. Multi-center brain imaging classification using a novel 3D CNN approach. IEEE Access. (2018) 6:49925–34. doi: 10.1109/ACCESS.2018.2868813

CrossRef Full Text | Google Scholar

31. Xie S, Girshick RB, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI (2017). p. 5987–95. doi: 10.1109/CVPR.2017.634

32. Loshchilov I, Hutter F. SGDR: stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR). Toulon (2017).

33. Lin T-Y, Goyal P, Girshick RB, He K, Dollár P. Focal loss for dense object detection. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice (2017). p. 2999–3007. doi: 10.1109/ICCV.2017.324

34. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. (1934) 26:404–13. doi: 10.1093/biomet/26.4.404

35. DeLong ER, Vernon WB, Bollinger RR. Sensitivity and specificity of a monitoring test. Biometrics. (1985) 41:947–58. doi: 10.2307/2530966

36. Watson PF, Petrie A. Method agreement analysis: a review of correct methodology. Theriogenology. (2010) 73:1167–79. doi: 10.1016/j.theriogenology.2010.01.003

37. Matthews L, Marasco R, Jenkinson M, Kuker W, Luppe S, Leite MI, et al. Distinction of seropositive NMO spectrum disorder and MS brain lesion distribution. Neurology. (2013) 80:1330–7. doi: 10.1212/WNL.0b013e3182887957

38. Jurynczyk M, Tackley G, Kong Y, Geraldes R, Matthews L, Woodhall M, et al. Brain lesion distribution criteria distinguish MS from AQP4-antibody NMOSD and MOG-antibody disease. J Neurol Neurosurg Psychiatry. (2017) 88:132–6. doi: 10.1136/jnnp-2016-314005

39. Yoo Y, Tang LYW, Kim S-H, Kim HJ, Lee LE, Li DKB, et al. Hierarchical multimodal fusion of deep-learned lesion and tissue integrity features in brain MRIs for distinguishing neuromyelitis optica from multiple sclerosis. In: Medical Image Computing and Computer Assisted Intervention (MICCAI 2017). Cham: Springer International Publishing (2017). doi: 10.1007/978-3-319-66179-7_55

40. Shen J, Zhang CJP, Jiang B, Chen J, Song J, Liu Z, et al. Artificial intelligence vs. clinicians in disease diagnosis: systematic review. JMIR Med Inform. (2019) 7:e10010. doi: 10.2196/10010

41. Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, et al. Artificial intelligence vs. clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. (2020) 368:m689. doi: 10.1136/bmj.m689

42. Kim SH, Mealy MA, Levy M, Schmidt F, Ruprecht K, Paul F, et al. Racial differences in neuromyelitis optica spectrum disorder. Neurology. (2018) 91:e2089–e99. doi: 10.1212/WNL.0000000000006574

43. Jarius S, Paul F, Franciotta D, Ruprecht K, Ringelstein M, Bergamaschi R, et al. Cerebrospinal fluid findings in aquaporin-4 antibody positive neuromyelitis optica: results from 211 lumbar punctures. J Neurol Sci. (2011) 306:82–90. doi: 10.1016/j.jns.2011.03.038

Keywords: brain magnetic resonance image (MRI), convolutional neural network (CNN), deep learning, multiple sclerosis, neuromyelitis optica spectrum disorder

Citation: Kim H, Lee Y, Kim Y-H, Lim Y-M, Lee JS, Woo J, Jang S-K, Oh YJ, Kim HW, Lee E-J, Kang D-W and Kim K-K (2020) Deep Learning-Based Method to Differentiate Neuromyelitis Optica Spectrum Disorder From Multiple Sclerosis. Front. Neurol. 11:599042. doi: 10.3389/fneur.2020.599042

Received: 26 August 2020; Accepted: 12 November 2020;
Published: 30 November 2020.

Edited by:

Fabienne Brilot, The University of Sydney, Australia

Reviewed by:

Friedemann Paul, Charité – Universitätsmedizin Berlin, Germany
Yang Zheng, Zhejiang University, China

Copyright © 2020 Kim, Lee, Kim, Lim, Lee, Woo, Jang, Oh, Kim, Lee, Kang and Kim. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dong-Wha Kang, dwkang@amc.seoul.kr; Kwang-Kuk Kim, kkkim@amc.seoul.kr

†These authors have contributed equally to this work.

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.