AUTHOR=Egenriether Sada , Sanford Robert , Yang Wendy H. , Kent Angela D. TITLE=Nitrogen Cycling Microbial Diversity and Operational Taxonomic Unit Clustering: When to Prioritize Accuracy Over Speed JOURNAL=Frontiers in Microbiology VOLUME=13 YEAR=2022 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2022.730340 DOI=10.3389/fmicb.2022.730340 ISSN=1664-302X ABSTRACT=Background

Assessments of the soil microbiome provide valuable insight to ecosystem function due to the integral role microorganisms play in biogeochemical cycling of carbon and nutrients. For example, treatment effects on nitrogen cycling functional groups are often presented alongside one another to demonstrate how agricultural management practices affect various nitrogen cycling processes. However, the functional groups commonly evaluated in nitrogen cycling microbiome studies range from phylogenetically narrow (e.g., N-fixation, nitrification) to broad [e.g., denitrification, dissimilatory nitrate reduction to ammonium (DNRA)]. The bioinformatics methods used in such studies were developed for 16S rRNA gene sequence data, and how these tools perform across functional genes of different phylogenetic diversity has not been established. For example, an OTU clustering method that can accurately characterize sequences harboring comparatively little diversity may not accurately resolve the diversity within a gene comprised of a large number of clades. This study uses two nitrogen cycling genes, nifH, a gene which segregates into only three distinct clades, and nrfA, a gene which is comprised of at least eighteen clades, to investigate differences which may arise when using heuristic OTU clustering (abundance-based greedy clustering, AGC) vs. true hierarchical OTU clustering (Matthews Correlation Coefficient optimizing algorithm, Opti-MCC). Detection of treatment differences for each gene were evaluated to demonstrate how conclusions drawn from a given dataset may differ depending on clustering method used.

Results

The heuristic and hierarchical methods performed comparably for the more conserved gene, nifH. The hierarchical method outperformed the heuristic method for the more diverse gene, nrfA; this included both the ability to detect treatment differences using PERMANOVA, as well as higher resolution in taxonomic classification. The difference in performance between the two methods may be traced to the AGC method’s preferential assignment of sequences to the most abundant OTUs: when analysis was limited to only the largest 100 OTUs, results from the AGC-assembled OTU table more closely resembled those of the Opti-MCC OTU table. Additionally, both AGC and Opti-MCC OTU tables detected comparable treatment differences using the rank-based ANOSIM test. This demonstrates that treatment differences were preserved using both clustering methods but were structured differently within the OTU tables produced using each method.

Conclusion

For questions which can be answered using tests agnostic to clustering method (e.g., ANOSIM), or for genes of relatively low phylogenetic diversity (e.g., nifH), most upstream processing methods should lead to similar conclusions from downstream analyses. For studies involving more diverse genes, however, care should be exercised to choose methods that ensure accurate clustering for all genes. This will mitigate the risk of introducing Type II errors by allowing for detection of comparable treatment differences for all genes assessed, rather than disproportionately detecting treatment differences in only low-diversity genes.