AUTHOR=Méndez-Cruz Carlos-Francisco , Rodríguez-Herrera Joel , Varela-Vega Alfredo , Mateo-Estrada Valeria , Castillo-Ramírez Santiago TITLE=Unsupervised learning and natural language processing highlight research trends in a superbug JOURNAL=Frontiers in Artificial Intelligence VOLUME=7 YEAR=2024 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1336071 DOI=10.3389/frai.2024.1336071 ISSN=2624-8212 ABSTRACT=Introduction

Antibiotic-resistant Acinetobacter baumannii is a very important nosocomial pathogen worldwide. Thousands of studies have been conducted about this pathogen. However, there has not been any attempt to use all this information to highlight the research trends concerning this pathogen.

Methods

Here we use unsupervised learning and natural language processing (NLP), two areas of Artificial Intelligence, to analyse the most extensive database of articles created (5,500+ articles, from 851 different journals, published over 3 decades).

Results

K-means clustering found 113 theme clusters and these were defined with representative terms automatically obtained with topic modelling, summarising different research areas. The biggest clusters, all with over 100 articles, are biased toward multidrug resistance, carbapenem resistance, clinical treatment, and nosocomial infections. However, we also found that some research areas, such as ecology and non-human infections, have received very little attention. This approach allowed us to study research themes over time unveiling those of recent interest, such as the use of Cefiderocol (a recently approved antibiotic) against A. baumannii.

Discussion

In a broader context, our results show that unsupervised learning, NLP and topic modelling can be used to describe and analyse the research themes for important infectious diseases. This strategy should be very useful to analyse other ESKAPE pathogens or any other pathogens relevant to Public Health.