About this Research Topic
Besides scalability, the third revolution in sequencing technologies calls for a methodological rethinking of many classic sequencing-related problems (e.g. mapping, assembly, and error correction), because sequencing errors on longer reads are of a different nature. Even more computational challenges arise from the “conceptual shift” from the single-genome analysis paradigm to the many-genomes paradigm, which the huge number of available sequencing projects now makes possible. Moreover, depending on what we mean by “many”, we encounter different biological contexts, each with its associated computational problems: pangenomics, where several genomes from the same species are considered in the analysis; and metagenomics, where the genomes of several different species are present in the same sample.
Algorithmic work on these challenges has only just begun, starting with the proposal of graph-based representations and the exploitation of compressed indexes and learning algorithms. However, many open computational challenges must still be addressed to equip the many-genomes paradigm with processing capabilities tailored to specific biological questions.
The goal of this Research Topic is to generate a collection of high-quality papers describing a next generation of practical and scalable bioinformatics tools that are specifically designed to index and process several genomes at once, exploiting state-of-the-art research on compressed data structures, graph representations of genomes, combinatorial pattern matching, sketching techniques, artificial intelligence, and machine learning.
Designing and exploiting efficient data representations, together with tailored algorithms for multiple-genome analysis, will not only enable a more rational use of computational resources (and of the associated running costs), but will also allow researchers to infer more information from the comparative analysis of larger datasets, thus leading to higher-quality results in the associated biological analyses.
We are interested in submissions describing novel algorithms, data structures, and tools for processing genomic datasets that represent more than one genome, with applications to pangenome or metagenome analysis. Techniques developed for either short or long reads (or a combination of both) are welcome. Besides describing the methodological aspects, submissions should include experiments showing the practical applicability of the proposed methods. A limited number of perspective papers, or of reviews that include a comparative analysis of state-of-the-art tools, are also welcome.
Topics include, but are not limited to, computational approaches related to:
• Pangenome representations
• Genome graph applications: variant calling, alignment, visualization, etc.
• Metagenome binning, taxonomic classification, and assembly
• Metagenomic sample comparison
Keywords: bioinformatics algorithms, genome representation, sequence and graph comparison, analysis of read datasets, clustering, classification, alignment-free, assembly-free, compressed indexes, sketching
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.