From one genome to many genomes: the evolution of computational approaches for pangenomics and metagenomics analysis

15.7K views, 25 authors, 6 articles

Editors

3

University of Helsinki

Paola Bonizzoni

University of Milano-Bicocca

University of Padua

About

In the last few decades, the evolution of sequencing technologies has revolutionized genome analysis, driving the design of scalable computational approaches for analyzing the massive datasets produced by both popular short-read sequencers and more recent long-read technologies.

Besides scalability, the third revolution in sequencing technologies calls for a methodological rethinking of many classic sequencing-related problems (e.g. mapping, assembly, and error correction) because sequencing errors on longer reads have a different nature. Even more computational challenges arise from the “conceptual shift” from the single-genome analysis paradigm to the many-genomes paradigm, which the huge number of available sequencing projects now makes possible. Moreover, depending on what we mean by “many”, we encounter different biological contexts, each with its associated computational problems: pangenomics, where several genomes from the same species are considered in the analysis; and metagenomics, where the genomes of several different species are present in the same sample.

The algorithmic response to these challenges has only just begun, starting with the proposal of graph-based representations and the use of compressed indexes and learning algorithms. However, many open computational challenges remain before the many-genomes paradigm has the tailored processing capabilities needed to tackle specific biological questions.

The goal of this Research Topic is to generate a collection of high-quality papers describing a next generation of practical and scalable bioinformatics tools that are specifically designed to index and process several genomes at once, exploiting state-of-the-art research on compressed data structures, graph representations of genomes, combinatorial pattern matching, sketching techniques, artificial intelligence, and machine learning.

Designing and exploiting efficient data representations, together with tailored algorithms for multiple-genome analysis, will not only allow a more rational use of computational resources (and the associated running costs) but will also allow researchers to infer more information from the comparative analysis of larger datasets, leading to higher-quality results in the associated biological analyses.

We are interested in submissions describing novel algorithms, data structures, and tools for processing genomic datasets that represent more than one genome, with application to pangenome or metagenome analysis. Techniques developed for either short or long reads (or a combination of both) are welcome. Besides describing the methodological aspects, submissions should include experiments showing the practical applicability of the proposed methods. A limited number of perspective papers, or reviews that include a comparative analysis of state-of-the-art tools, are also welcome.

Topics include, but are not limited to, computational approaches related to:
• Pangenome representations
• Genome graph applications: variant calling, alignment, visualization, etc.
• Metagenome binning, taxonomic classification, and assembly
• Metagenomic sample comparison

Impact

15,681 Total views

11,579 Article views

2,982 Article downloads

1,120 Topic views

Published In

Frontiers in Bioinformatics

Genomic Analysis
