
ORIGINAL RESEARCH article

Front. Vet. Sci.
Sec. Veterinary Epidemiology and Economics
Volume 11 - 2024 | doi: 10.3389/fvets.2024.1358028
This article is part of the Research Topic Reviews in Veterinary Epidemiology and Economics

Predicting host species susceptibility to influenza viruses and coronaviruses using genome data and machine learning: a scoping review

Provisionally accepted
  • 1 Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Canada
  • 2 Center for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
  • 3 Centre for Advancing Responsible & Ethical Artificial Intelligence, University of Guelph, Guelph, Canada
  • 4 Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Ontario, Canada
  • 5 Department of Population Health, College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States

The final, formatted version of the article will be published soon.

    Predicting which species are susceptible to viruses (i.e., host range) is important for understanding viral outbreaks and for developing effective strategies to control them in both humans and animals. The use of machine learning and bioinformatic approaches to predict viral hosts has expanded with advances in in-silico techniques. We conducted a scoping review to identify the breadth of machine learning methods applied to influenza and coronavirus genome data for the identification of susceptible host species. The protocol for this scoping review is available at https://hdl.handle.net/10214/26112. Five online databases were searched, yielding 1217 citations published between January 2000 and May 2022. These were screened in duplicate for English-language, in-silico research that used machine learning to identify species susceptible to viruses. Fifty-three relevant publications were identified for data charting. The breadth of research was extensive, including 32 different machine learning algorithms used in combination with 29 different feature selection methods and 43 different genome data input formats. Authors used 20 different methods to assess accuracy. Authors mostly used influenza viruses (n = 31/53 publications, 58.5%); however, more recent publications focused on coronaviruses and other viruses in combination with influenza viruses (n = 22/53, 41.5%). The host groups most frequently used as susceptible species were humans (n = 57/77 analyses, 74.0%), avian species (n = 35/77, 45.5%), and swine (n = 28/77, 36.4%); these shares sum to more than 100% because a single analysis could include several host groups. In total, 53 different hosts were used, and most publications used data from multiple hosts. The main research gaps were a lack of standardized reporting of methodology and the use of broad host categories for classification. Overall, approaches to viral host identification using machine learning were diverse and extensive.
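    The proportions reported in the abstract can be re-derived from the raw counts. The sketch below is a minimal illustration only, not part of the authors' analysis. It assumes round-half-up to one decimal place as the rounding convention, and the helper name `percent` is hypothetical. It also shows why the host-group percentages add up to more than 100%.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(numerator: int, denominator: int) -> Decimal:
    """Return numerator/denominator as a percentage, rounded half-up to one decimal.

    Rounding convention is an assumption for this illustration.
    """
    value = Decimal(numerator) * 100 / Decimal(denominator)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Counts as reported in the abstract (publications out of 53, analyses out of 77).
publications = {
    "influenza viruses": (31, 53),
    "coronaviruses/other viruses with influenza": (22, 53),
}
host_groups = {
    "humans": (57, 77),
    "avian": (35, 77),
    "swine": (28, 77),
}

for label, (n, total) in {**publications, **host_groups}.items():
    print(f"{label}: {n}/{total} = {percent(n, total)}%")

# Host-group shares exceed 100% in total because one analysis may use several groups.
host_total = sum(percent(n, t) for n, t in host_groups.values())
print(f"host-group total: {host_total}%")
```

    The publication shares sum to 100% because each publication falls into exactly one of the two virus categories. The host-group shares do not, since the categories overlap.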

    Keywords: influenza A viruses, Coronaviruses, machine-learning, Genome, Scoping review, Interspecies transmission, spillover

    Received: 19 Dec 2023; Accepted: 28 Aug 2024.

    Copyright: © 2024 Alberts, Berke, Rocha, Keay, Maboni and Poljak. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Famke Alberts, Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Canada

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.