AUTHOR=Kaya Deniz Ece , Ülgen Ege , Kocagöz Ayşe Sesin , Sezerman Osman Uğur TITLE=A comparison of various feature extraction and machine learning methods for antimicrobial resistance prediction in streptococcus pneumoniae JOURNAL=Frontiers in Antibiotics VOLUME=2 YEAR=2023 URL=https://www.frontiersin.org/journals/antibiotics/articles/10.3389/frabi.2023.1126468 DOI=10.3389/frabi.2023.1126468 ISSN=2813-2467 ABSTRACT=
Streptococcus pneumoniae is a major concern for clinicians and a global public health problem. This pathogen is associated with high morbidity and mortality rates and with antimicrobial resistance (AMR). In the last few years, reduced genome sequencing costs have made it possible to explore drug resistance in S. pneumoniae in greater depth, and machine learning (ML) has become a popular tool for understanding, diagnosing, treating, and predicting these phenotypes. Nucleotide k-mers, amino acid k-mers, single nucleotide polymorphisms (SNPs), and combinations of these features capture rich genetic information from whole-genome sequencing data. This study compares feature extraction methods and ML models for predicting the AMR phenotype of S. pneumoniae. We compared nucleotide k-mers, amino acid k-mers, SNPs, and their combinations as features for predicting resistance to three antibiotics: penicillin, erythromycin, and tetracycline. Data for 980 pneumococcal strains were downloaded from the European Nucleotide Archive (ENA). We trained and compared models using several ML methods, including random forests, support vector machines, stochastic gradient boosting, and extreme gradient boosting. We found that key features of the AMR prediction model setup and the choice of ML method affected the results. The approach described here can be applied in further studies to improve the accuracy and efficiency of AMR prediction.
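
To illustrate the nucleotide k-mer features named in the abstract, the following is a minimal sketch of how overlapping k-mers might be counted and arranged into a strain-by-feature matrix. It is not the authors' pipeline: the function names (`count_kmers`, `feature_matrix`), the toy sequences, the choice of k = 3, and the rule of skipping windows that contain ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping nucleotide k-mers in one sequence.

    Windows containing anything other than A, C, G, or T are skipped
    (an assumption of this sketch, not a detail from the abstract).
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_matrix(genomes, k):
    """Build a strain-by-k-mer count matrix.

    `genomes` maps a (hypothetical) strain name to its sequence.
    Returns the sorted k-mer vocabulary and one row of counts per strain,
    in the iteration order of `genomes`.
    """
    per_strain = {name: count_kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_strain.values()))
    rows = [[per_strain[name][kmer] for kmer in vocabulary] for name in genomes]
    return vocabulary, rows


# Toy sequences, far shorter than a real genome, used only to show the shape
# of the output.
toy = {"strain_a": "ACGTACGT", "strain_b": "ACGTNCGT"}
vocab, X = feature_matrix(toy, k=3)
print(vocab)
print(X)
```

A matrix of this shape, paired with resistant or susceptible labels for each antibiotic, is the kind of input a classifier such as a random forest would typically take. Real genomes would call for much larger k and a sparse representation.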