ORIGINAL RESEARCH article

Front. Microbiol.
Sec. Food Microbiology
Volume 15 - 2024 | doi: 10.3389/fmicb.2024.1393824

Comparison of three source attribution methods applied to whole genome sequencing data of monophasic and biphasic Salmonella Typhimurium isolates from the British Isles and from Denmark

Provisionally accepted
  • 1 Animal and Plant Health Agency (United Kingdom), Addlestone, United Kingdom
  • 2 State Serum Institute (SSI), Copenhagen, Hovedstaden, Denmark
  • 3 Technical University of Denmark, Kongens Lyngby, Denmark

The final, formatted version of the article will be published soon.

    Methodologies for source attribution (SA) of foodborne illnesses comprise a rapidly expanding suite of techniques for estimating the most important source(s) of human infection. Recently, the increasing availability of whole genome sequencing (WGS) data for a wide range of bacterial strains has led to the development of novel SA methods. These techniques use the unique features of bacterial genomes adapted to different host types and hence offer increased resolution of the outputs. Comparative studies of different SA techniques reliant on WGS data are currently lacking. Here, we critically assessed and compared the outputs of three SA methods: a supervised classification random forest machine learning algorithm (RandomForest), an Accessory genes-Based Source Attribution method (AB_SA), and a Bayesian frequency matching method (Bayesian). Each technique was applied to the WGS data of a panel of 902 reservoir host and human monophasic and biphasic Salmonella Typhimurium isolates sampled between 2012 and 2016 in the British Isles (BI) and Denmark. Additionally, for RandomForest and Bayesian, we explored whether using accessory genome features as model inputs improved the attribution accuracy of these methods. Results indicated that this was the case for RandomForest, but for Bayesian the overall attribution estimates varied little whether or not the accessory genome features were included. All three methods attributed most human isolates to the Pigs primary source class, which was expected given the known high relative prevalence of monophasic and biphasic S. Typhimurium in pigs in the BI and Denmark, and hence the routes of infection into the human population. The accuracy of AB_SA was lower than that of RandomForest when attributing the primary source classes to the 120 animal test set isolates with known primary sources. A major advantage of both AB_SA and Bayesian was a much faster execution time than that of RandomForest.
Overall, the SA method comparison presented in this study describes the strengths and weaknesses of each of the three methods when applied to attributing potential monophasic and biphasic S. Typhimurium animal sources to human infections. This information could be valuable when deciding which SA methodology would be the most applicable to foodborne disease outbreak scenarios involving monophasic and biphasic S. Typhimurium.
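
To make the idea of frequency matching concrete, the following is a minimal sketch and not the Bayesian model used in the study. It assumes a uniform prior over sources and a simple pseudocount for smoothing, and it attributes each human isolate to sources in proportion to how often its subtype occurs among isolates from each source. The function name `attribute`, the subtype labels, and the toy isolate lists are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def attribute(source_isolates, human_types, pseudocount=1.0):
    """Expected number of human isolates attributed to each source.

    source_isolates: list of (source, subtype) pairs with known origin.
    human_types: list of subtypes observed among human isolates.
    pseudocount: smoothing constant (an assumption of this sketch).
    """
    # Count how often each subtype occurs within each source.
    counts = defaultdict(Counter)
    for source, subtype in source_isolates:
        counts[source][subtype] += 1

    subtypes = {t for _, t in source_isolates} | set(human_types)
    totals = {s: sum(c.values()) for s, c in counts.items()}

    expected = {s: 0.0 for s in counts}
    for t in human_types:
        # Smoothed relative frequency of subtype t in each source.
        freq = {
            s: (counts[s][t] + pseudocount)
            / (totals[s] + pseudocount * len(subtypes))
            for s in counts
        }
        norm = sum(freq.values())
        # With a uniform prior, the posterior is the normalised frequency.
        for s in counts:
            expected[s] += freq[s] / norm
    return expected


# Hypothetical toy data, not taken from the study.
source_isolates = [
    ("Pigs", "A"), ("Pigs", "A"), ("Pigs", "B"),
    ("Poultry", "C"), ("Poultry", "C"),
    ("Cattle", "B"),
]
human_types = ["A", "A", "B", "C"]

result = attribute(source_isolates, human_types)
for source, n in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{source}: {n:.2f} expected human isolates")
```

Because each human isolate contributes a total weight of one, the per-source values sum to the number of human isolates. A full Bayesian treatment would also estimate uncertainty and could include source-specific or subtype-specific parameters, which this sketch omits.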

    Keywords: source attribution, monophasic and biphasic Salmonella Typhimurium, machine learning, random forest, Bayesian modelling, accessory genes-based source attribution, bacterial genomics

    Received: 29 Feb 2024; Accepted: 29 Aug 2024.

    Copyright: © 2024 Guzinski, Arnold, Whiteley, Tang, Patel, Trew, Litrup, Hald, Smith and Petrovska. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Jaromir Guzinski, Animal and Plant Health Agency (United Kingdom), Addlestone, United Kingdom
    Liljana Petrovska, Animal and Plant Health Agency (United Kingdom), Addlestone, United Kingdom

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.