AUTHOR=DiMucci Demetrius, Kon Mark, Segrè Daniel TITLE=BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes JOURNAL=Frontiers in Molecular Biosciences VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2021.663532 DOI=10.3389/fmolb.2021.663532 ISSN=2296-889X ABSTRACT=

Machine learning is aiding the interpretation of biological complexity by enabling the inference and classification of cellular, organismal, and ecological phenotypes from large datasets, e.g., those produced by genomic, transcriptomic, and metagenomic analyses. A number of available algorithms can search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While treating an algorithm as a black box is sufficient in many instances, a better understanding of how system variables contribute to a specific output can open an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that it accurately recovers the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
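
To make the idea of mining a tree ensemble for frequently co-used variables concrete, the following is a minimal sketch in Python. It is not the BowSaw algorithm itself, whose details are not given in this abstract. It assumes binary features and a tiny hand-built "forest" of three trees (`tree_a`, `tree_b`, `tree_c`) that all encode the simulated rule "phenotype = x0 AND x1", with feature 2 acting as noise. The node layout and the names `path_features` and `candidate_rules` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical node layout (an assumption, not BowSaw's data structure):
#   leaf:  {"label": 0 or 1}
#   split: {"feature": i, "low": node, "high": node}, go "high" when x[i] == 1

def path_features(tree, x):
    """Return (predicted label, features tested on the decision path of x)."""
    used = set()
    node = tree
    while "label" not in node:
        f = node["feature"]
        used.add(f)
        node = node["high"] if x[f] == 1 else node["low"]
    return node["label"], frozenset(used)

def candidate_rules(forest, samples, target=1, size=2):
    """Count feature combinations of a given size that appear together on
    decision paths ending in a prediction of the target class."""
    counts = Counter()
    for x in samples:
        for tree in forest:
            label, used = path_features(tree, x)
            if label == target:
                for combo in combinations(sorted(used), size):
                    counts[combo] += 1
    return counts

LEAF0, LEAF1 = {"label": 0}, {"label": 1}

# Toy trees, all implementing "x0 AND x1"; tree_c first splits on noise feature 2.
tree_a = {"feature": 0, "low": LEAF0,
          "high": {"feature": 1, "low": LEAF0, "high": LEAF1}}
tree_b = {"feature": 1, "low": LEAF0,
          "high": {"feature": 0, "low": LEAF0, "high": LEAF1}}
tree_c = {"feature": 2, "low": tree_a, "high": tree_a}

forest = [tree_a, tree_b, tree_c]
samples = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 1, 0)]

counts = candidate_rules(forest, samples)
print(counts.most_common())
```

In this toy setting the pair (0, 1) is counted on every positive path, while pairs involving the noise feature appear only in the one tree that happens to split on it. With a forest trained on real data, the same counting would be applied to the decision paths of a fitted model, and the highest-ranked combinations would be treated as hypotheses to test rather than as confirmed interactions.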