AUTHOR=Hekler Achim , Kather Jakob N. , Krieghoff-Henning Eva , Utikal Jochen S. , Meier Friedegund , Gellrich Frank F. , Upmeier zu Belzen Julius , French Lars , Schlager Justin G. , Ghoreschi Kamran , Wilhelm Tabea , Kutzner Heinz , Berking Carola , Heppt Markus V. , Haferkamp Sebastian , Sondermann Wiebke , Schadendorf Dirk , Schilling Bastian , Izar Benjamin , Maron Roman , Schmitt Max , Fröhling Stefan , Lipka Daniel B. , Brinker Titus J. TITLE=Effects of Label Noise on Deep Learning-Based Skin Cancer Classification JOURNAL=Frontiers in Medicine VOLUME=7 YEAR=2020 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2020.00177 DOI=10.3389/fmed.2020.00177 ISSN=2296-858X ABSTRACT=
Recent studies have shown that deep learning can classify dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification use training images that are not biopsy-verified. This imperfect ground truth introduces a systematic error, but its effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. When training and test labels share the same ground truth, the CNNs achieve high accuracies of 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%,