Biological macromolecules play a vital role in life activities. Biological sequences are commonly encoded as strings, and recent advances in sequencing technology have generated a large amount of sequence data. Analyzing the structures and functions of these sequences to obtain useful knowledge is an urgent and important problem. Machine learning, grounded in statistical theory and data mining frameworks, offers tools for data classification and prediction. However, more advanced algorithms and more powerful sequence analysis tools are still needed to provide in-depth structural and functional analysis, open new perspectives, and generate novel hypotheses about the biological functions of genes and proteins.
In this Research Topic, our main focus is on high-quality research on the classification of DNA, RNA, and amino acid sequences. We also aim to address two recurring problems in biological data analysis.
1. Data imbalance
Most machine learning algorithms assume that the training data are balanced, whereas real-world data are usually imbalanced. This mismatch largely affects the reliability and applicability of prediction tools. Methods that address the class imbalance problem should therefore be proposed and their effectiveness compared.
2. Feature embedding models
Sequence data representation is a major factor in model performance, so selecting a suitable embedding model is essential for model training.
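As a rough illustration of the two problems above, the following minimal sketch shows one simple sequence representation (k-mer counts) and one common imbalance heuristic (inverse-frequency class weights). It is not a method prescribed by this Research Topic. The function names `kmer_counts` and `balanced_class_weights`, the DNA alphabet, and the toy inputs are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=2, alphabet="ACGT"):
    """Encode a sequence as a fixed-length vector of k-mer counts.

    The vector follows the lexicographic order of all k-mers over the
    given alphabet (hypothetical default: DNA). K-mers that contain
    characters outside the alphabet are ignored.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(kmer, 0) for kmer in kmers]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a classifier that accepts
    per-class weights penalizes errors on them more heavily.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Toy example: three negative samples and one positive sample.
features = [kmer_counts(s) for s in ["ACGT", "AACC", "GGTT", "ACAC"]]
weights = balanced_class_weights([0, 0, 0, 1])
```

Learned embeddings and resampling strategies are alternatives to both choices. The sketch only makes the terms "representation" and "class imbalance" concrete.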
We welcome submissions in the research areas of bioinformatics and machine learning. Authors who focus on DNA, RNA, and protein classification, or on genome assembly, annotation, and functional analysis of next-generation sequencing data, are welcome.
The scope of the Research Topic includes, but is not limited to, the following topics:
1. Machine learning
2. Deep learning
3. Data imbalance
4. Sequence analysis
5. Sequence representation
6. Protein sequence embedding
7. Post-translational modifications
8. Sequence modification