ORIGINAL RESEARCH article
Front. Microbiol.
Sec. Infectious Agents and Disease
Volume 16 - 2025
doi: 10.3389/fmicb.2025.1549260
This article is part of the Research Topic Bacterial Pathogens and Virulence Factor Genes: Diversity and Evolution
Bioinformatics combined with machine learning unravels differences among environmental, seafood, and clinical isolates of Vibrio parahaemolyticus
Provisionally accepted
- 1 University of Maryland, College Park, College Park, United States
- 2 Human Foods Program, U.S. Food and Drug Administration, College Park, United States
Vibrio parahaemolyticus is the leading cause of illnesses and outbreaks linked to seafood consumption worldwide. Understanding how this pathogen may be adapted to persist along the farm-to-table supply chain can inform food safety efforts. This study used machine learning to develop robust models for classifying the genomic diversity of V. parahaemolyticus isolated from environmental (n=176), seafood (n=975), and clinical (n=865) sources. We constructed a pangenome from the respective genome assemblies and employed random forest algorithms to develop predictive models that identify gene clusters encoding metabolism, virulence, and antibiotic resistance functions associated with isolate source type. Models comparing the genomes of all seafood and clinical isolates showed high balanced accuracy (≥0.80) and area under the receiver operating characteristic curve (≥0.87) for all of these functional features. Major virulence factors, including tdh, trh, type III secretion system-related genes, and four alpha-hemolysin genes (hlyA, hlyB, hlyC, and hlyD), were identified as important differentiating factors in our seafood-clinical virulence model, underscoring the need for further investigation. Our model also revealed significant differences in antimicrobial resistance (AMR) gene patterns between seafood and clinical isolates; genes conferring resistance to tetracycline, elfamycin, and multiple drug classes (phenicol, diaminopyrimidine, and fluoroquinolone antibiotics) were identified as the top three key variables. These findings provide crucial insights for developing effective surveillance and management strategies to address the public health threats associated with V. parahaemolyticus.
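The general modeling idea described above (a random forest trained on pangenome gene presence/absence and scored with balanced accuracy and AUROC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the use of scikit-learn is an assumption, and the function names, sample sizes, gene counts, and hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def make_synthetic_presence_absence(n_seafood=200, n_clinical=200,
                                    n_genes=50, n_informative=5, seed=0):
    """Build a toy gene presence/absence matrix (rows: isolates, cols: genes).

    Hypothetical data: the first `n_informative` genes are enriched in
    clinical isolates; all other genes are random noise.
    """
    rng = np.random.default_rng(seed)
    n = n_seafood + n_clinical
    y = np.array([0] * n_seafood + [1] * n_clinical)  # 0 = seafood, 1 = clinical
    X = (rng.random((n, n_genes)) < 0.5).astype(int)  # noise genes
    # Informative genes: present in ~90% of clinical, ~10% of seafood isolates.
    p_present = np.where(y[:, None] == 1, 0.9, 0.1)
    X[:, :n_informative] = (rng.random((n, n_informative)) < p_present).astype(int)
    return X, y


def fit_and_evaluate(X, y, seed=0, top_k=5):
    """Train a random forest and report held-out metrics and top-ranked genes."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    model = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of "clinical"
    ranked = np.argsort(model.feature_importances_)[::-1]  # most important first
    return {
        "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "auroc": roc_auc_score(y_test, y_score),
        "top_genes": ranked[:top_k].tolist(),
    }


if __name__ == "__main__":
    X, y = make_synthetic_presence_absence()
    results = fit_and_evaluate(X, y)
    print(results)
```

In a real analysis, the matrix would come from a pangenome tool's gene presence/absence output, and the ranked feature importances would be mapped back to annotated gene clusters such as virulence or AMR genes.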
Keywords: comparative genomics, machine learning, Vibrio parahaemolyticus, virulence, antibiotic resistance
Received: 20 Dec 2024; Accepted: 03 Feb 2025.
Copyright: © 2025 Feng, Ramachandran, Blaustein and Pradhan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Abani Kumar Pradhan, University of Maryland, College Park, College Park, United States