AUTHOR=Lerch Luc, Huber Lukas S., Kamath Amith, Pöllinger Alexander, Pahud de Mortanges Aurélie, Obmann Verena C., Dammann Florian, Senn Walter, Reyes Mauricio TITLE=DreamOn: a data augmentation strategy to narrow the robustness gap between expert radiologists and deep learning classifiers JOURNAL=Frontiers in Radiology VOLUME=4 YEAR=2024 URL=https://www.frontiersin.org/journals/radiology/articles/10.3389/fradi.2024.1420545 DOI=10.3389/fradi.2024.1420545 ISSN=2673-8740 ABSTRACT=

Purpose

The performance of deep learning models for medical image analysis depends heavily on the quality of the images being analysed. Differences in imaging equipment and calibration, as well as patient-specific factors such as movement or biological variability (e.g., tissue density), lead to large variability in the quality of the medical images obtained. Consequently, robustness against noise is crucial for the application of deep learning models in clinical contexts.

Materials and methods

We evaluate the effect of various data augmentation strategies on the robustness of a ResNet-18 trained to classify breast ultrasound images and benchmark its performance against that of trained radiologists. Additionally, we introduce DreamOn, a novel, biologically inspired data augmentation strategy for medical image analysis. DreamOn uses a conditional generative adversarial network (GAN) to generate REM-dream-inspired interpolations of training images.
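
As a rough illustration of what "interpolations of training images" generated by a conditional GAN can look like, the sketch below interpolates linearly between two latent codes and decodes each step with a class-conditioned generator. The abstract does not specify the interpolation scheme or the network, so the linear blend, the function names, and `toy_generator` (a random, untrained stand-in for a real conditional generator) are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Linearly interpolate between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_a + a * z_b for a in alphas])

def toy_generator(z, label, weights):
    """Hypothetical stand-in for a trained conditional generator G(z, label)."""
    # One weight matrix per class; tanh keeps outputs in [-1, 1].
    return np.tanh(z @ weights[label])

rng = np.random.default_rng(0)
latent_dim, image_pixels, num_classes = 8, 16, 2

# Random weights only; a real generator would be trained on the image data.
weights = rng.normal(size=(num_classes, latent_dim, image_pixels))

# Assumed: latent codes associated with two training images of the same class.
z_a = rng.normal(size=latent_dim)
z_b = rng.normal(size=latent_dim)

latents = interpolate_latents(z_a, z_b, num_steps=5)
augmented = toy_generator(latents, label=1, weights=weights)
print(augmented.shape)  # one flattened synthetic image per interpolation step
```

In an augmentation pipeline of this kind, the decoded samples would be added to the training set with the label used for conditioning.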

Results

We find that while available data augmentation approaches substantially improve robustness compared to models trained without any data augmentation, radiologists outperform models on noisy images. Using DreamOn data augmentation, we obtain a substantial improvement in robustness in the high noise regime.
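
A robustness comparison of this kind is typically summarised as accuracy measured at increasing noise levels. The sketch below shows one way to compute such a curve. The additive Gaussian noise model, the noise levels, the synthetic data, and `toy_predict` are assumptions for illustration only; the abstract does not state which perturbations or levels were used.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the [0, 1] intensity range."""
    noisy = images + rng.normal(scale=sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def accuracy_under_noise(predict, images, labels, sigmas, seed=0):
    """Return a dict mapping each noise level to classification accuracy."""
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in sigmas:
        preds = predict(add_gaussian_noise(images, sigma, rng))
        results[sigma] = float(np.mean(preds == labels))
    return results

def toy_predict(images):
    """Hypothetical classifier: thresholds the mean intensity of each image."""
    return (images.mean(axis=(1, 2)) > 0.5).astype(int)

# Synthetic stand-in data: class 1 images are slightly brighter than class 0.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=200)
base = 0.4 + 0.2 * labels[:, None, None]
images = np.clip(base + rng.normal(scale=0.05, size=(200, 8, 8)), 0.0, 1.0)

curve = accuracy_under_noise(toy_predict, images, labels, sigmas=[0.0, 0.5, 2.0])
for sigma, acc in curve.items():
    print(f"sigma={sigma}: accuracy={acc:.3f}")
```

The same loop can be run for each augmentation strategy, and human reader accuracy at the same noise levels can be placed alongside the resulting curves.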

Conclusions

We show that REM-dream-inspired conditional GAN-based data augmentation is a promising approach to improving the robustness of deep learning models against noise perturbations in medical imaging. Additionally, we highlight a gap in robustness between deep learning models and human experts, underscoring the need for continued development of AI to match human diagnostic expertise.