TECHNOLOGY AND CODE article

Front. Mol. Biosci.
Sec. Molecular Evolution
Volume 11 - 2024 | doi: 10.3389/fmolb.2024.1432495
This article is part of the Research Topic Insights in Molecular Evolution: 2023

Spectral Cluster Supertree: Fast and Statistically Robust Merging of Rooted Phylogenetic Trees

Provisionally accepted
  • 1 Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
  • 2 School of Computing, Australian National University, Canberra, ACT, Australia
  • 3 School of Natural Sciences, University of Tasmania, Hobart, Tasmania, Australia

The final, formatted version of the article will be published soon.

    The algorithms for phylogenetic reconstruction are central to computational molecular evolution. The relentless pace of data acquisition has exposed their poor scalability and led to the conclusion that the conventional application of these methods is impractical and not justifiable from an energy usage perspective. Furthermore, the drive to improve the statistical performance of phylogenetic methods produces increasingly parameter-rich models of sequence evolution, which worsens the computational performance. Established theoretical and algorithmic results identify supertree methods as critical to divide-and-conquer strategies for improving the scalability of phylogenetic reconstruction. Of particular importance is the ability to explicitly accommodate rooted topologies. These can arise from the more biologically plausible non-stationary models of sequence evolution. We contribute to addressing this challenge with Spectral Cluster Supertree, a novel supertree method for merging a set of overlapping rooted phylogenetic trees. It offers significant improvements over Min-Cut supertree and previous state-of-the-art methods in terms of both time complexity and overall topological accuracy, particularly for large problems. We perform comparisons against Min-Cut supertree and Bad Clade Deletion. Using two tree topology distance metrics, we demonstrate that while Bad Clade Deletion generates more correct clades in its resulting supertree, the tree generated by Spectral Cluster Supertree is generally topologically closer to the true model tree. On large datasets containing 10,000 taxa and 500 source trees, where Bad Clade Deletion usually takes 2 hours to run, our method generates a supertree in 20 seconds on average. Spectral Cluster Supertree is released under an open source license and is available on the Python Package Index as sc-supertree.
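
The abstract does not name its two tree topology distance metrics, so the following is only a minimal sketch of what counting "correct clades" against a model tree could look like. It assumes rooted trees written as nested Python tuples with string leaf names; that representation and the names `clades` and `clade_scores` are hypothetical illustrations and are not part of sc-supertree.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    Assumption: a tree is a nested tuple and a leaf is a string.
    Each clade is returned as a frozenset of the leaf names below one
    internal node; the root clade (all leaves) is excluded.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # every tree on the same taxa shares the root clade
    return found


def clade_scores(true_tree, estimated_tree):
    """Count clades that the estimate gets right, invents, or misses."""
    true_c, est_c = clades(true_tree), clades(estimated_tree)
    return {
        "correct": len(true_c & est_c),
        "false_positive": len(est_c - true_c),
        "false_negative": len(true_c - est_c),
    }


# Toy example: the estimate misplaces taxon "b" within the (a, b, c) clade.
model = ((("a", "b"), "c"), ("d", "e"))
estimate = ((("a", "c"), "b"), ("d", "e"))
scores = clade_scores(model, estimate)
print(scores)
```

A count of this kind rewards each exactly recovered clade and ignores how near a wrong clade is to a true one. This is one reason a method can recover more correct clades yet score as less close to the model tree under a different distance.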

    Keywords: supertree, spectral clustering, rooted phylogenetic trees, phylogenetics, molecular evolution

    Received: 14 May 2024; Accepted: 24 Sep 2024.

    Copyright: © 2024 Mcarthur, Zehmakan, Charleston and Huttley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Robert N. Mcarthur, Ecology and Evolution, Research School of Biology, Australian National University, Canberra, 0200, ACT, Australia
    Gavin Huttley, Ecology and Evolution, Research School of Biology, Australian National University, Canberra, 0200, ACT, Australia

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.