The exceptional growth of biological sequence databases marks a new era for biology. Yet large-scale analysis of these data poses a computational challenge. This Research Topic focuses on algorithmic advances that lead to faster and/or more accurate analysis of nucleic acid and protein sequences. Its scope includes, but is not limited to, problems of sequence alignment, search, genome assembly, and variant calling. It emphasizes algorithms that gain speed by exploiting the architecture of modern processors, as well as innovations that reduce time and memory complexity, such as algorithmic improvements to computationally intensive subproblems.
The scope also includes qualitative advances in sequence analysis. These include improvements in the sensitivity and specificity of homology searches, more accurate models of sequences and sequence families (profiles) for alignment, more accurate statistics, and algorithms for error detection and correction, among others. Approaches based on machine learning frameworks that increase the accuracy or speed of a particular aspect of sequence analysis also fall within the scope of this Research Topic.
In general, this Research Topic highlights algorithmic solutions important to studies of sequence evolution. These algorithms are therefore expected to facilitate the analysis of large numbers of individual sequences. Examples of such analyses include genome assembly, error correction, the detection of variants and structural variations in genomes, and protein sequence classification, clustering, phylogenetics, and annotation. Scalable algorithmic and methodological developments increase the capacity to process large datasets, so advances in computing performance facilitate sequence analysis on a large scale. Examples include sequence database indexing approaches that increase speed or reduce disk space, novel hash functions that reduce the number of collisions, time- and memory-efficient alignment and alignment-free algorithms, novel and efficient data structures, and algorithms for parallel computing, among others.