AUTHOR=Zhao Jianhua , Lui Harvey , Kalia Sunil , Lee Tim K. , Zeng Haishan TITLE=Improving skin cancer detection by Raman spectroscopy using convolutional neural networks and data augmentation JOURNAL=Frontiers in Oncology VOLUME=14 YEAR=2024 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2024.1320220 DOI=10.3389/fonc.2024.1320220 ISSN=2234-943X ABSTRACT=Background

Our previous studies have demonstrated that Raman spectroscopy could be used for skin cancer detection with good sensitivity and specificity. The objective of this study is to determine if skin cancer detection can be further improved by combining deep neural networks and Raman spectroscopy.

Patients and methods

Raman spectra of 731 skin lesions were included in this study, containing 340 cancerous and precancerous lesions (melanoma, basal cell carcinoma, squamous cell carcinoma and actinic keratosis) and 391 benign lesions (melanocytic nevus and seborrheic keratosis). One-dimensional convolutional neural networks (1D-CNN) were developed for Raman spectral classification. The stratified samples were divided randomly into training (70%), validation (10%) and test set (20%), and were repeated 56 times using parallel computing. Different data augmentation strategies were implemented for the training dataset, including added random noise, spectral shift, spectral combination and artificially synthesized Raman spectra using one-dimensional generative adversarial networks (1D-GAN). The area under the receiver operating characteristic curve (ROC AUC) was used as a measure of the diagnostic performance. Conventional machine learning approaches, including partial least squares for discriminant analysis (PLS-DA), principal component and linear discriminant analysis (PC-LDA), support vector machine (SVM), and logistic regression (LR) were evaluated for comparison with the same data splitting scheme as the 1D-CNN.

Results

The ROC AUC of the test dataset based on the original training spectra were 0.886±0.022 (1D-CNN), 0.870±0.028 (PLS-DA), 0.875±0.033 (PC-LDA), 0.864±0.027 (SVM), and 0.525±0.045 (LR), which were improved to 0.909±0.021 (1D-CNN), 0.899±0.022 (PLS-DA), 0.895±0.022 (PC-LDA), 0.901±0.020 (SVM), and 0.897±0.021 (LR) respectively after augmentation of the training dataset (p<0.0001, Wilcoxon test). Paired analyses of 1D-CNN with conventional machine learning approaches showed that 1D-CNN had a 1–3% improvement (p<0.001, Wilcoxon test).

Conclusions

Data augmentation not only improved the performance of both deep neural networks and conventional machine learning techniques by 2–4%, but also improved the performance of the models on spectra with higher noise or spectral shifting. Convolutional neural networks slightly outperformed conventional machine learning approaches for skin cancer detection by Raman spectroscopy.