Accurate severity grading of lumbar spine disease by magnetic resonance images (MRIs) plays an important role in selecting appropriate treatment for the disease. However, interpreting these complex MRIs is a repetitive and time-consuming workload for clinicians, especially radiologists. Here, we aim to develop a multi-task classification model based on artificial intelligence for automated grading of lumbar disc herniation (LDH), lumbar central canal stenosis (LCCS) and lumbar nerve roots compression (LNRC) at lumbar axial MRIs.
Total 15254 lumbar axial T2W MRIs as the internal dataset obtained from the Fifth Affiliated Hospital of Sun Yat-sen University from January 2015 to May 2019 and 1273 axial T2W MRIs as the external test dataset obtained from the Third Affiliated Hospital of Southern Medical University from June 2016 to December 2017 were analyzed in this retrospective study. Two clinicians annotated and graded all MRIs using the three international classification systems. In agreement, these results served as the reference standard; In disagreement, outcomes were adjudicated by an expert surgeon to establish the reference standard. The internal dataset was randomly split into an internal training set (70%), validation set (15%) and test set (15%). The multi-task classification model based on ResNet-50 consists of a backbone network for feature extraction and three fully-connected (FC) networks for classification and performs the classification tasks of LDH, LCCS, and LNRC at lumbar MRIs. Precision, accuracy, sensitivity, specificity, F1 scores, confusion matrices, receiver-operating characteristics and interrater agreement (Gwet k) were utilized to assess the model’s performance on the internal test dataset and external test datasets.
A total of 1115 patients, including 1015 patients from the internal dataset and 100 patients from the external test dataset [mean age, 49 years ± 15 (standard deviation); 543 women], were evaluated in this study. The overall accuracies of grading for LDH, LCCS and LNRC were 84.17% (74.16%), 86.99% (79.65%) and 81.21% (74.16%) respectively on the internal (external) test dataset. Internal and external testing of three spinal diseases showed substantial to the almost perfect agreement (k, 0.67 - 0.85) for the multi-task classification model.
The multi-task classification model has achieved promising performance in the automated grading of LDH, LCCS and LNRC at lumbar axial T2W MRIs.