AUTHOR=Tang Jia-Wei , Liu Qing-Hua , Yin Xiao-Cong , Pan Ya-Cheng , Wen Peng-Bo , Liu Xin , Kang Xing-Xing , Gu Bing , Zhu Zuo-Bin , Wang Liang TITLE=Comparative Analysis of Machine Learning Algorithms on Surface Enhanced Raman Spectra of Clinical Staphylococcus Species JOURNAL=Frontiers in Microbiology VOLUME=12 YEAR=2021 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2021.696921 DOI=10.3389/fmicb.2021.696921 ISSN=1664-302X ABSTRACT=

Raman spectroscopy (RS) is a widely used analytical technique based on the detection of molecular vibrations in a defined system, which generates Raman spectra that contain unique and highly resolved fingerprints of the system. However, the low intensity of normal Raman scattering effect greatly hinders its application. Recently, the newly emerged surface enhanced Raman spectroscopy (SERS) technique overcomes the problem by mixing metal nanoparticles such as gold and silver with samples, which greatly enhances signal intensity of Raman effects by orders of magnitudes when compared with regular RS. In clinical and research laboratories, SERS provides a great potential for fast, sensitive, label-free, and non-destructive microbial detection and identification with the assistance of appropriate machine learning (ML) algorithms. However, choosing an appropriate algorithm for a specific group of bacterial species remains challenging, because with the large volumes of data generated during SERS analysis not all algorithms could achieve a relatively high accuracy. In this study, we compared three unsupervised machine learning methods and 10 supervised machine learning methods, respectively, on 2,752 SERS spectra from 117 Staphylococcus strains belonging to nine clinically important Staphylococcus species in order to test the capacity of different machine learning methods for bacterial rapid differentiation and accurate prediction. According to the results, density-based spatial clustering of applications with noise (DBSCAN) showed the best clustering capacity (Rand index 0.9733) while convolutional neural network (CNN) topped all other supervised machine learning methods as the best model for predicting Staphylococcus species via SERS spectra (ACC 98.21%, AUC 99.93%). Taken together, this study shows that machine learning methods are capable of distinguishing closely related Staphylococcus species and therefore have great application potentials for bacterial pathogen diagnosis in clinical settings.