About this Research Topic
One way to approach this is to learn a representation for proteins: an object (e.g., a vector) that is calculated from the protein’s sequence or structure. Such a succinct representation is useful for mapping the protein universe and gaining a bird’s-eye view of it. Alternatively, given a query protein, its representation can be calculated and compared to those of other proteins to identify relevant, well-studied ones. A succinct representation can be used in a wide variety of computational tasks, and indeed its value should be judged by the performance of the methods that use it on a given task.
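The query-and-compare idea above can be illustrated with a minimal sketch. It rests on loud assumptions: the amino-acid composition vector is only a toy stand-in for a learned representation (a protein language model embedding would take its place in practice), cosine similarity is one possible comparison among many, and the function names and sequences are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Toy representation (an assumption, not a learned embedding):
    the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, database, embed=composition_vector):
    """Return (name, score) pairs for every database entry, most similar first.

    `embed` can be swapped for any function mapping a sequence to a vector.
    """
    query_vec = embed(query)
    scored = [
        (name, cosine_similarity(query_vec, embed(seq)))
        for name, seq in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    database = {"protein_a": "AAAC", "protein_b": "WWWW"}
    for name, score in rank_by_similarity("AAAA", database):
        print(name, round(score, 3))
```

Because `embed` is a parameter, the same ranking code works unchanged with any other representation, which mirrors the point that a representation's value is judged by how well the methods built on it perform.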
The recent deep learning (DL) revolution has had an impact on this front as well: (trustworthy) datasets have expanded dramatically, and new tools based on DL technology have emerged (e.g., protein language models). Explainability methods can be used to highlight the parts of a protein that are important for its function. That the representation can be calculated for a protein sequence (e.g., using self-supervision in sequence space, or by training on structural data) suggests that an abstract view of protein space can be extended beyond solved structures to all proteins.
We propose a special issue focused on advances on these fronts. Topics of interest include:
• Abstract representations of the protein universe
• Using representations for mapping the protein universe
• Deep learning architectures to learn protein representation models (for sequences or structures)
• Self-supervision for learning protein representation
• Deep learning architectures to learn embeddings for disordered proteins
• Relevant downstream tasks for applications of learned representations for proteins
• Application of deep learning representations to downstream tasks
• Applications of deep learning representations for enhancing experimental methods
• Explainability of protein language models
• Datasets and their benchmarking
• Identifying relevant proteins for different applications (conservation of sequence, same function, same functional site)
• Using representations to better understand protein evolution
• Using representations to better understand protein biophysics
Keywords: Natural Language Processing (NLP), Deep learning, Structure prediction, Representation, Disordered proteins, Protein features, Gene ontology, Protein function prediction, Protein structure classification
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.