Functional annotation of an entire genome is critical to understanding any biological process and its role in biological pathways. Yet a large part of the human genome, and a much larger part of the genomes of non-model organisms, remains unannotated. Simple annotations based on sequence similarity have been found to be grossly inadequate for this purpose, and more sophisticated intelligent systems, such as machine learning, have frequently been employed. In their basic formulation, machine learning techniques found their way into the field of biological functional annotation quite early. Secondary structure prediction using machine learning was done as early as the mid-1980s, and many other areas of biological sequence, structure and/or function prediction have seen great advances in the complexity of techniques, feature engineering and other principles of data-driven analytics. Several computational techniques have been developed exclusively for solving functional annotation problems. However, most of the growth has come from applying emerging and established computational techniques. Machine learning software has often been used as a black-box tool, while researchers focus on the biological concept of the problem and its solution.
More recently, deep learning methods have made rapid progress and have shown particular success with problems involving large amounts of biological data. Particularly popular among them have been convolutional neural networks (CNNs), multi-layer feed-forward neural networks and long short-term memory (LSTM) networks, along with their variants.
In parallel with machine learning, the biological understanding of molecular function and the organization of knowledge on this subject have also advanced rapidly. Instead of scattered and ambiguous labelling of function, systematic annotations in terms of ontologies, in the form of hierarchical and nested labels, have made the task of annotation learning and prediction much more robust.
Much has been achieved on the biological and technical aspects of functional annotation, but many hurdles remain. Consequently, there are clear opportunities for researchers to fill the gaps.
This Research Topic invites submissions of original research or review papers within the framework outlined above, including but not limited to the following:
1) Intelligent systems and statistical or machine learning techniques for predicting biological function from sequence, structure or gene expression data.
2) Analysis of gene ontologies or specialized functions, such as protein-protein or protein-RNA interactions and disease associations.
3) Studies dealing with biological function as a single unit (for example, whether a protein is a kinase or a protease) as well as function within a pathway.
4) Broader biological function prediction or identification of genomic features, such as DNA methylation and other genome-wide functional patterns, at the individual or systems level, specifically addressing some aspect of the problem of characterizing genome function and annotation.
General theoretical methods of artificial intelligence and deep learning without a direct application to these biological problems are out of scope.
Papers must be written in language accessible to biologists. Mathematical expressions and technical terminology may be used, but they should be presented in a manner that life scientists can easily follow.