AUTHOR=Ma Zhanshan (Sam)
TITLE=Estimating the Optimum Coverage and Quality of Amplicon Sequencing With Taylor’s Power Law Extensions
JOURNAL=Frontiers in Bioengineering and Biotechnology
VOLUME=8
YEAR=2020
URL=https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2020.00372
DOI=10.3389/fbioe.2020.00372
ISSN=2296-4185
ABSTRACT=

The DNA sequencing coverage problem has been analyzed theoretically with complex mathematical models such as Lander–Waterman expectation theory and Stevens’ theorem for randomly covering a domain. In metagenomic sequencing, several approaches have been developed to estimate the coverage of whole-genome shotgun sequencing, but surprisingly few studies have addressed the coverage problem for marker-gene amplicon sequencing, for which arguably the biggest challenge is the complexity or heterogeneity of microbial communities. Overall, much of the practice still relies on speculation and on semi-empirical or ad hoc heuristic models. Conservatively raising coverage may ensure the success of a sequencing project, but often at undue cost. In this study, we borrow the principles and approaches of optimum sampling methodology, which originated in applied entomology, achieved equal success in plant pathology and parasitology, and has played a critical role in decision-making for global crop and forest protection against economic pests since the 1970s, when the pesticide crisis and food safety concerns forced reductions in pesticide usage, which in turn required reliable sampling techniques for monitoring pest populations. We realized that sequencing coverage is essentially an optimum sampling problem. Perhaps the only essential difference between sampling insects and sampling microbiomes is the “instrument” used. In traditional entomology, it is usually humans who visually count the insects, occasionally aided by a binocular microscope. In metagenomics research, it is the DNA sequencers that count the DNA reads.
Furthermore, a key theoretical foundation for sampling insect pest populations, i.e., Taylor’s power law, which has achieved the rare status of an ecological law and captures population aggregation, has recently been extended to the community level to describe community heterogeneity and stability, namely, as Taylor’s power law extensions (TPLEs). This theoretical advance enabled us to develop a novel approach to assessing the quality and determining the optimum number of reads (coverage) of amplicon sequencing operations. Specifically, two applications were developed. The first assesses, in hindsight, the quality of an amplicon sequencing operation in terms of its precision and confidence levels. The second determines, prior to sequencing, the minimum sequencing effort a project needs to achieve preset precision and confidence levels.
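
The two applications can be illustrated with the classical power-law sampling formulas from applied entomology. The sketch below is an assumption for illustration, not the TPLE-based procedure developed in the article: it takes Taylor's power law in its standard form, variance = a * mean^b, and requires the half-width of a normal-approximation confidence interval to be a fixed fraction D of the mean, which gives n = (z/D)^2 * a * mean^(b-2). The function names and the parameter values in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal quantile for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def tpl_variance(mean: float, a: float, b: float) -> float:
    """Taylor's power law in its standard form: variance = a * mean**b."""
    return a * mean ** b


def required_sample_size(mean: float, a: float, b: float,
                         precision: float, confidence: float = 0.95) -> int:
    """Prior to sampling: smallest n whose confidence-interval half-width
    is at most `precision` * mean (normal approximation, assumed here)."""
    z = z_value(confidence)
    n = (z / precision) ** 2 * a * mean ** (b - 2.0)
    return math.ceil(n)


def achieved_precision(mean: float, a: float, b: float,
                       n: int, confidence: float = 0.95) -> float:
    """In hindsight: confidence-interval half-width as a fraction of the
    mean, given that n sampling units were actually taken."""
    z = z_value(confidence)
    return z * math.sqrt(tpl_variance(mean, a, b) / n) / mean
```

For example, with hypothetical parameters a = 2.0 and b = 1.5 and a mean abundance of 10, `required_sample_size(10.0, 2.0, 1.5, precision=0.1)` returns the effort needed for a 10% precision at 95% confidence, and `achieved_precision(10.0, 2.0, 1.5, n)` reports the precision obtained for a given n. The two functions are inverses of each other up to the rounding of n.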