ORIGINAL RESEARCH article

Front. Microbiomes
Sec. Host and Microbe Associations
Volume 3 - 2024 | doi: 10.3389/frmbi.2024.1408203

Machine learning models can identify individuals based on a resident oral bacteriophage family

Provisionally accepted
  • 1 Stanford University, Stanford, United States
  • 2 Genentech Inc., San Francisco, California, United States
  • 3 University of California, San Francisco, San Francisco, California, United States
  • 4 WellStar Kennestone Hospital, Marietta, Georgia, United States
  • 5 University of Southern California, Los Angeles, California, United States
  • 6 Translationale Onkologie an der Universitätsmedizin der Johannes Gutenberg-Universität Mainz, Mainz, Rhineland-Palatinate, Germany
  • 7 California Institute of Technology, Pasadena, California, United States

The final, formatted version of the article will be published soon.

    Metagenomic studies have revolutionized the study of novel phages. However, these studies trade depth of coverage for breadth. We show that the targeted sequencing of a small region of a phage terminase family can provide sufficient sequence diversity to serve as an individual-specific barcode or a "phageprint", defined as the relative abundance profile of the variants within a terminase family. By collecting ~700 oral samples from ~100 individuals living on multiple continents, we found a consistent trend wherein each individual harbors one or two dominant variants that coexist with numerous low-abundance variants. By tracking phageprints over the span of a month across ten individuals, we observed that phageprints were generally stable, and found instances of concordant temporal fluctuations of variants shared between partners. To quantify these patterns further, we built machine learning models that, with high precision and recall, distinguished individuals even when we eliminated the most abundant variants and further downsampled phageprints to 2% of the remaining variants. Except between partners, phageprints are dissimilar between individuals, and neither country of residence, genetics, diet, nor cohabitation seems to play a role in the relatedness of phageprints across individuals. By sampling from six different oral sites, we were able to study the impact of millimeters to a few centimeters of separation on an individual's phageprint and found that such limited spatial separation results in site-specific phageprints.
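    As an illustration of the definition above, the following minimal sketch turns per-variant read counts into a relative abundance profile, drops the most abundant variants, and compares two profiles. It is not the authors' pipeline: the function names, the example counts, and the choice of Bray-Curtis dissimilarity are all assumptions made for illustration, and the abstract does not state which dissimilarity measure or model features were used.

```python
from collections import Counter


def phageprint(variant_counts):
    """Convert raw read counts per variant into relative abundances."""
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {variant: count / total for variant, count in variant_counts.items()}


def drop_dominant(variant_counts, n_top=1):
    """Remove the n_top most abundant variants from a count table."""
    top = {variant for variant, _ in Counter(variant_counts).most_common(n_top)}
    return {v: c for v, c in variant_counts.items() if v not in top}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two profiles (assumed metric)."""
    variants = set(p) | set(q)
    numerator = sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in variants)
    denominator = sum(p.get(v, 0.0) + q.get(v, 0.0) for v in variants)
    return numerator / denominator if denominator else 0.0


# Hypothetical read counts: one dominant variant plus low-abundance variants.
person_a = {"v1": 80, "v2": 15, "v3": 5}
person_b = {"v4": 90, "v2": 10}

print_a = phageprint(person_a)
print_b = phageprint(person_b)
dissimilarity = bray_curtis(print_a, print_b)

# Profile of person A after removing the single most abundant variant.
print_a_without_dominant = phageprint(drop_dominant(person_a, n_top=1))
```

    In this toy example the two profiles share only one low-abundance variant, so their dissimilarity is close to 1; a classifier such as those described in the abstract would take profiles like these as input features.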

    Keywords: human identification, forensics, virology, phages, machine learning

    Received: 27 Mar 2024; Accepted: 17 Jul 2024.

    Copyright: © 2024 Mahmoudabadi, Homyk, Catching, Mahmoudabadi, Foley, Tadmor and Phillips. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Gita Mahmoudabadi, Stanford University, Stanford, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.