Designing meaningful abstractions of the protein universe remains a challenge, especially abstractions that cover the entire known protein universe rather than only the part with solved structures. Meaningful abstractions are loosely defined: they describe various ways to relate protein properties, such as sequence, structure, and function, to one another. Such abstractions allow us to formulate, quantify, and hold up to scrutiny observations about proteins. The hierarchical classifications of the protein space of known structure are the premier example of such an abstraction. Here, however, we focus on alternatives that offer complementary views, such as maps or network representations of protein space.
One way to approach this is to learn a representation for proteins: an object (e.g., a vector) calculated from the protein’s sequence or structure. Such a succinct representation is useful for mapping the protein universe, creating a bird’s-eye view of it. Alternatively, given a query protein, its representation can be calculated and compared to those of other proteins to identify relevant, well-studied ones. A succinct representation can be used in a wide variety of computational tasks, and indeed its value should be judged by the performance of the methods that use it on a given task.
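As a concrete illustration, the sketch below embeds protein sequences with a pretrained protein language model and ranks a small reference set by cosine similarity to a query. It assumes the publicly available fair-esm package and its ESM-2 model; any model that yields a fixed-length vector per protein would serve the same purpose, and the sequences shown are arbitrary examples.

import torch
import esm

# Load a pretrained protein language model (ESM-2, 650M parameters).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

def embed(sequences):
    """Mean-pool per-residue representations into one vector per protein."""
    data = [(f"seq{i}", s) for i, s in enumerate(sequences)]
    _, _, tokens = batch_converter(data)
    with torch.no_grad():
        out = model(tokens, repr_layers=[33])
    reps = out["representations"][33]
    # Average over residue positions, skipping the BOS/EOS special tokens.
    return torch.stack(
        [reps[i, 1 : len(s) + 1].mean(0) for i, s in enumerate(sequences)]
    )

# Compare a query protein to reference proteins by cosine similarity;
# a higher score suggests a more closely related, possibly well-studied match.
query = embed(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"])
refs = embed(["MKVLATTALLAAVSA", "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGAR"])
print(torch.nn.functional.cosine_similarity(query, refs))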
The recent deep learning (DL) revolution has had an impact on this front as well: (trustworthy) datasets have expanded dramatically, and new tools based on DL technology are emerging (e.g., protein language models). Explainability methods can highlight the parts of a protein that are important for its function. Because a representation can be calculated for any protein sequence (e.g., using self-supervision in sequence space, or by training on structural data), an abstract view of protein space can be extended beyond solved structures to all proteins.
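To make the self-supervision idea concrete, the following sketch implements a schematic masked-residue objective in PyTorch: residues are hidden at random and the model learns to recover them from their sequence context. The vocabulary size, 15% masking rate, and tiny Transformer encoder are illustrative assumptions, not any specific published model.

import torch
import torch.nn as nn

VOCAB = 25    # 20 amino acids plus special tokens (an illustrative choice)
MASK_ID = 24  # index reserved here for the mask token

token_embed = nn.Embedding(VOCAB, 128)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(128, VOCAB)

def masked_lm_step(tokens):
    """One self-supervised step: mask random residues, predict them back."""
    mask = torch.rand(tokens.shape) < 0.15       # hide roughly 15% of positions
    corrupted = tokens.masked_fill(mask, MASK_ID)
    hidden = encoder(token_embed(corrupted))     # contextual embeddings
    logits = lm_head(hidden)
    # The loss is computed only at the masked positions.
    return nn.functional.cross_entropy(logits[mask], tokens[mask])

batch = torch.randint(0, 20, (8, 100))  # a toy batch of 8 length-100 sequences
loss = masked_lm_step(batch)
loss.backward()  # gradients drive the representation learning
print(loss.item())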
We propose a special issue focused on advances on these fronts. Topics of interest include:
• Abstract representations of the protein universe
• Using representations for mapping the protein universe
• Deep learning architectures for learning protein representation models (for sequences or structures)
• Self-supervision for learning protein representations
• Deep learning architectures to learn embeddings for disordered proteins
• Relevant downstream tasks for learned protein representations
• Applications of deep learning representations to downstream tasks
• Applications of deep learning representations for enhancing experimental methods
• Explainability of protein language models
• Datasets and benchmarking
• Identifying relevant proteins for different applications (sequence conservation, same function, same functional site)
• Using representations to better understand protein evolution
• Using representations to better understand protein biophysics