Comparative Genomics is the field of knowledge dedicated to the analysis and comparison of genes and genomes. The scientific areas within this field are diverse and include, among others: 1) the development of algorithms for the alignment of genes, whole genomes, and short and long sequencing reads, 2) the search for remote sequence similarity, 3) the discovery of motifs and sequence patterns, 4) the identification of gene families, 5) the detection of ortholog and paralog groups, 6) the reconstruction of the evolutionary history of genes, 7) the detection of signatures of selective forces acting on genes and genomes, 8) the reconstruction of ancestral DNA and genome sequences, 9) the detection and analysis of genome synteny, and 10) the inference of ancestral gene order. In addition, an important new sub-field of Comparative Genomics, referred to as Pangenomics, has emerged in the last decade. It provides improved tools to analyze the exponentially growing volume of genomic data that has accumulated since the development of Second- and Third-Generation Sequencing Technologies.
The first definition of Pangenome was synonymous with the entire repertoire of genes accessible to a particular species (the union of all genes). This initial concept was proposed when researchers tried to determine how many genome sequences are needed to be highly confident that all of the genes of a given species have been identified. Because the construction of this computational object is gene-oriented, the classical concept of Pangenome is also referred to as the Gene-based Pangenome. The gene presence/absence matrix associated with this object allows genes to be classified into three categories: Core Genes (encoded in more than 99% of the strains), Shell Genes (10%-99% of the strains) and Cloud Genes (less than 10% of the strains). The advent of Third-Generation Sequencing Technologies allowed the routine generation of end-to-end genome sequences. However, this advance in sequencing technologies also required the development of new data structures, algorithms and statistical methods to process, analyze and store the new genomic data, giving rise to Computational Pangenomics. This new sub-field is dedicated to the construction and analysis of Pangenome Graphs, computational objects consisting of bidirected graphs that allow an easy representation of both strands of DNA and of whole-genome variation resulting from recombination, gene reshuffling and other mutational events. Pangenome graphs are starting to be adopted as references for the mapping, alignment, simulation and genotyping of sequencing reads and, in the coming years, it is foreseeable that these graph-based structures will completely replace linear references. The advent of Computational Pangenomics also led to the proposal of an alternative, more contemporary definition of Pangenome, now synonymous with all genomic variation observed in the members of a given species or clade.
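As an illustration of the Core/Shell/Cloud classification described above, the following is a minimal Python sketch, not a reference implementation. It assumes the gene presence/absence information is available as a mapping from gene family to the set of strains encoding it. The function name classify_genes, the input layout and the example gene and strain names are hypothetical; the thresholds are the ones stated in the text (more than 99%, 10%-99%, less than 10%).

```python
from typing import Dict, Set


def classify_genes(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene family as 'core', 'shell' or 'cloud'.

    presence  : hypothetical input, gene family -> set of strains encoding it
    n_strains : total number of strains (genomes) in the pangenome
    """
    if n_strains <= 0:
        raise ValueError("n_strains must be positive")

    categories: Dict[str, str] = {}
    for gene, strains in presence.items():
        count = len(strains)
        # Integer arithmetic avoids floating-point issues at the boundaries.
        if 100 * count > 99 * n_strains:      # more than 99% of the strains
            categories[gene] = "core"
        elif 100 * count < 10 * n_strains:    # less than 10% of the strains
            categories[gene] = "cloud"
        else:                                 # 10%-99% of the strains
            categories[gene] = "shell"
    return categories


# Toy example with 20 hypothetical strains.
strains = [f"strain{i}" for i in range(20)]
example = {
    "geneA": set(strains),        # present in all 20 strains
    "geneB": set(strains[:10]),   # present in 10 of 20 strains
    "geneC": {"strain0"},         # present in 1 of 20 strains
}
print(classify_genes(example, n_strains=20))
```

In this sketch, a gene found in exactly 10% or exactly 99% of the strains is assigned to the Shell category, which is one reading of the ranges given above; tools that build gene-based pangenomes may place these boundaries differently.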
This research topic aims to publish state-of-the-art studies in Microbial Comparative Genomics and Pangenomics that report new insights into Gene and Genome Evolution. Studies involving the development of new algorithms and toolkits in these fields will also be strongly valued, given their increasing importance in experimental microbiology.