TECHNOLOGY AND CODE article
Front. Bioinform.
Sec. Integrative Bioinformatics
Volume 4 - 2024 |
doi: 10.3389/fbinf.2024.1483255
PRONAME: a user-friendly pipeline to process long-read Nanopore metabarcoding data by generating high-quality consensus sequences
Provisionally accepted
1 Walloon Agricultural Research Centre, Gembloux, Belgium
2 UCLouvain, Louvain-la-Neuve, Belgium
The study of sample taxonomic composition has evolved from direct observations and labor-intensive morphological studies to a range of DNA sequencing methodologies. Most of these studies leverage the metabarcoding approach, which involves amplifying a small, taxonomically informative portion of the genome and then sequencing it with high-throughput methods. Recent advances in sequencing technology from Oxford Nanopore Technologies have revolutionized the field by enabling portability, affordable cost and long-read sequencing, and therefore a significant increase in taxonomic resolution. However, Nanopore sequencing data exhibit a particular profile, with a higher error rate than Illumina sequencing. Accurate processing of these long reads requires specialized tools, yet existing bioinformatics pipelines for such data are scarce and often insufficient.

We present PRONAME (PROcessing NAnopore MEtabarcoding data), an open-source, user-friendly pipeline optimized for processing raw Nanopore sequencing data. PRONAME includes precompiled databases for complete 16S sequences (Silva138 and Greengenes2) and a newly developed and curated database dedicated to bacterial 16S-ITS-23S operon sequences. Users can also provide a custom database, enabling the analysis of metabarcoding data for any domain of life. The pipeline significantly improves sequence accuracy by implementing innovative error-correction strategies and by taking advantage of the new sequencing chemistry to produce high-quality duplex reads.
Evaluations using a mock community have shown that PRONAME delivers consensus sequences with at least 99.5% accuracy under standard settings (and up to 99.7%), making it a robust tool for the genomic analysis of complex multispecies communities.

Conclusions: PRONAME meets the challenges of long-read Nanopore data processing and offers greater accuracy and versatility than existing pipelines. By integrating Nanopore-specific quality filtering, clustering and error correction, PRONAME produces high-precision consensus sequences. This brings the accuracy of Nanopore sequencing close to that of Illumina sequencing while retaining the benefits of long-read technologies.
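To make the filtering, consensus and accuracy steps above more concrete, the following is a minimal illustrative sketch. It is not PRONAME's implementation, and it does not reproduce the tools or settings PRONAME uses, since these are not described here. All function names are hypothetical, and the quality threshold `min_q=15.0` is an assumed example value. The sketch assumes that reads have already been clustered and aligned to a common length, because clustering and alignment are out of scope. It also converts accuracy values such as 99.5% into their Phred-scaled equivalents, which is a common way to compare them with per-read quality scores.

```python
import math
from collections import Counter


def mean_phred(qualities):
    """Mean read quality computed in error-probability space (hypothetical helper).

    Each Phred score is converted to an error probability, the probabilities
    are averaged, and the average is converted back to the Phred scale.
    Averaging raw Phred scores directly would overestimate read quality.
    """
    probs = [10 ** (-q / 10) for q in qualities]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(qualities, min_q=15.0):
    """Keep a read only if its mean quality reaches min_q (assumed threshold)."""
    return mean_phred(qualities) >= min_q


def majority_consensus(aligned_reads):
    """Column-wise majority vote over reads already aligned to equal length.

    A simplification: real polishing tools use alignment-aware error models.
    Columns where the gap character '-' wins are dropped from the consensus.
    """
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def accuracy_to_phred(accuracy):
    """Convert an accuracy (e.g. 0.995) to its Phred-scaled equivalent."""
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACGTTA", "ACCT-A"]
    print(majority_consensus(reads))             # ACGTA
    print(round(accuracy_to_phred(0.995), 1))    # 23.0
    print(round(accuracy_to_phred(0.997), 1))    # 25.2
```

In this example, the three toy reads collapse to the consensus `ACGTA`. Accuracies of 99.5% and 99.7% correspond to roughly Q23 and Q25, that is, about 5 and 3 errors per 1,000 bases. Note also that a read with scores of Q10 and Q30 has a probability-space mean of about Q13, not Q20, which is why averaging in probability space matters when filtering error-prone reads.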
Keywords: Long-read high-throughput sequencing, accuracy, clustering, polishing, duplex reads, microbiome, database, ribosomal operon
Received: 19 Aug 2024; Accepted: 27 Nov 2024.
Copyright: © 2024 Dubois, Delitte, Lengrand, Bragard, Legrève and Debode. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Benjamin Dubois, Walloon Agricultural Research Centre, Gembloux, Belgium
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.