AUTHOR=Droppelmann Guillermo , Tello Manuel , García Nicolás , Greene Cristóbal , Jorquera Carlos , Feijoo Felipe TITLE=Lateral elbow tendinopathy and artificial intelligence: Binary and multilabel findings detection using machine learning algorithms JOURNAL=Frontiers in Medicine VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.945698 DOI=10.3389/fmed.2022.945698 ISSN=2296-858X ABSTRACT=Background

Ultrasound (US) is a valuable technique for detecting degenerative findings and intrasubstance tears in lateral elbow tendinopathy (LET). Machine learning methods can support this radiological diagnosis.

Aim

To assess multilabel classification with machine learning models for detecting degenerative findings and intrasubstance tears in US images of patients diagnosed with LET.

Materials and methods

A retrospective study was performed. US images and medical records of patients diagnosed with LET between January 1st, 2017, and December 30th, 2018, were selected. Datasets were built for training and testing the models. For image analysis, features describing texture characteristics, intensity distribution, pixel-pixel co-occurrence patterns, and scale granularity were extracted. Six different supervised learning models were implemented for binary and multilabel classification. All models were trained to classify four tendon findings (hypoechogenicity, neovascularity, enthesopathy, and intrasubstance tear). Accuracy indicators and their confidence intervals (CI) were obtained for all models using repeated K-fold cross-validation. Multilabel prediction was evaluated with multilabel accuracy, sensitivity, specificity, and receiver operating characteristic (ROC) analysis, each with 95% CI.
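The abstract lists pixel-pixel co-occurrence patterns among the extracted features but does not say how they were computed. One standard way to quantify such patterns is a gray-level co-occurrence matrix (GLCM) summarized by Haralick-style descriptors. The dependency-free Python sketch below illustrates that idea under stated assumptions: the function names, the 8-level quantization, and the single horizontal offset are hypothetical choices, not the study's settings.

```python
from collections import Counter

# Illustrative sketch only: quantization levels, offsets, and the feature
# subset are assumptions, not the parameters used in the study.

def quantize(image, levels=8, max_value=255):
    """Map raw 8-bit intensities onto `levels` gray levels (0..levels-1)."""
    return [[min(v * levels // (max_value + 1), levels - 1) for v in row] for row in image]

def glcm(image, dx=1, dy=0, levels=8):
    """Normalized, non-symmetric gray-level co-occurrence matrix.

    Counts how often gray level i at (r, c) is followed by gray level j at
    (r + dy, c + dx), then divides by the number of counted pixel pairs.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def glcm_features(p):
    """Three common texture descriptors derived from a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),  # angular second moment
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }
```

A typical call would be `glcm_features(glcm(quantize(roi), dx=1, dy=0))`, where `roi` is a hypothetical 2-D list of pixel intensities cropped around the tendon. In practice several offsets and angles are usually computed and averaged, and libraries such as scikit-image also offer a symmetric GLCM variant.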
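The methods mention repeated K-fold cross-validation with confidence intervals but give neither the fold count, the number of repeats, nor how the CI was formed. The Python sketch below assumes a normal-approximation 95% CI over per-fold scores and splits by patient, so images from one patient never appear in both training and test folds. Whether the study grouped by patient is not stated in the abstract; `k=5`, `repeats=3`, and the helper names are placeholders.

```python
import random
import statistics
from math import sqrt

def repeated_group_kfold(groups, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) index lists for repeated group K-fold CV.

    `groups[i]` is the (hypothetical) patient ID of sample i; all samples of
    one patient land in the same fold, so no patient is in both train and test.
    """
    rng = random.Random(seed)
    unique = sorted(set(groups))
    for _ in range(repeats):
        order = unique[:]
        rng.shuffle(order)  # new random patient-to-fold assignment per repeat
        fold_of = {g: pos % k for pos, g in enumerate(order)}
        for fold in range(k):
            test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
            train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
            yield train, test

def mean_ci95(scores):
    """Mean of per-fold scores with an assumed normal-approximation 95% CI."""
    m = statistics.mean(scores)
    half = 1.96 * statistics.stdev(scores) / sqrt(len(scores))
    return m, m - half, m + half
```

Each `(train, test)` pair indexes the image-level feature table: fit any classifier (for example, a random forest) on `train`, score it on `test`, and pass the collected scores to `mean_ci95`. Because repeated folds reuse the same data, the fold scores are not independent, so this interval tends to be narrower than the true uncertainty.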

Results

A total of 30,007 US images (4,324 exams, 2,917 patients) were included in the analysis. In binary classification, the random forest (RF) model achieved the highest mean area under the curve (AUC), sensitivity, and specificity for each degenerative finding. AUC and sensitivity were highest for intrasubstance tear, at 0.991 [95% CI, 0.99, 0.99] and 0.775 [95% CI, 0.77, 0.77], respectively, whereas specificity was highest for hypoechogenicity, at 0.821 [95% CI, 0.82, 0.82]. RF also performed best in multilabel classification, with an accuracy of 0.772 [95% CI, 0.771, 0.773], a macro-averaged AUC of 0.948 [95% CI, 0.94, 0.94], and a micro-averaged AUC of 0.962 [95% CI, 0.96, 0.96]. Diagnostic accuracy, sensitivity, and specificity with 95% CI were calculated.
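The results report both macro- and micro-averaged AUC for the multilabel classifier. Macro averaging computes one AUC per finding and takes their unweighted mean; micro averaging pools every (image, finding) score into a single binary problem, so more frequent findings carry more weight. The minimal Python sketch below computes both using the Mann-Whitney formulation of AUC (the probability that a random positive outscores a random negative, ties counting one half); the helper names and toy data are hypothetical.

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            wins += 1.0 if ps > ns else 0.5 if ps == ns else 0.0
    return wins / (len(pos) * len(neg))

def macro_micro_auc(Y_true, Y_score):
    """Y_true / Y_score: one row per image, one column per finding."""
    n_labels = len(Y_true[0])
    # Macro: one AUC per finding (column), then an unweighted mean.
    per_label = [
        roc_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    macro = sum(per_label) / n_labels
    # Micro: flatten all (image, finding) pairs into one binary problem.
    micro = roc_auc([y for row in Y_true for y in row],
                    [s for row in Y_score for s in row])
    return per_label, macro, micro

if __name__ == "__main__":
    # Toy data: 4 images, 2 hypothetical findings.
    Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    Y_score = [[0.9, 0.4], [0.2, 0.3], [0.6, 0.8], [0.7, 0.1]]
    print(macro_micro_auc(Y_true, Y_score))  # ([0.75, 0.75], 0.75, 0.8125)
```

The pairwise loop is O(P·N) and is meant only for illustration; for tens of thousands of images, a rank-based or library implementation (for example, scikit-learn's `roc_auc_score` with `average="macro"` or `average="micro"`) is the practical choice.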

Conclusion

Machine learning algorithms applied to US images of LET showed high diagnostic accuracy. The random forest model performed best in both binary and multilabel classification, particularly for intrasubstance tears.