AUTHOR=Zhang Liyuan, Deng Tingzhi, Pan Shuijing, Zhang Minghui, Zhang Yusen, Yang Chunhua, Yang Xiaoyong, Tian Geng, Mi Jia
TITLE=DeepO-GlcNAc: a web server for prediction of protein O-GlcNAcylation sites using deep learning combined with attention mechanism
JOURNAL=Frontiers in Cell and Developmental Biology
VOLUME=12
YEAR=2024
URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2024.1456728
DOI=10.3389/fcell.2024.1456728
ISSN=2296-634X
ABSTRACT=Introduction

Protein O-GlcNAcylation is a dynamic post-translational modification involved in major cellular processes and associated with many human diseases. Bioinformatic prediction of O-GlcNAc sites before experimental validation is a challenging task in O-GlcNAc research. Recent advances in deep learning algorithms and the availability of O-GlcNAc proteomics data present an opportunity to improve O-GlcNAc site prediction.

Objectives

This study aims to develop a deep learning-based tool to improve O-GlcNAcylation site prediction.

Methods

We construct an annotated, unbalanced O-GlcNAcylation dataset and propose a new deep learning framework, DeepO-GlcNAc, that combines Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) with an attention mechanism.
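
To make the role of the attention component concrete, the following is a minimal sketch of dot-product attention pooling over a sequence of per-residue feature vectors, such as those an LSTM or CNN layer might emit. It is an illustration under stated assumptions, not the DeepO-GlcNAc architecture: the function names, the toy feature values, and the query vector are all hypothetical, and in a trained model the query would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention pooling (illustrative, not the authors' code).

    hidden_states: one feature vector per sequence position.
    query: vector of the same dimension; assumed here, learned in practice.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each position by its dot product with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the position vectors.
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three positions, two features each.
hidden_states = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
query = [1.0, 0.5]
weights, context = attention_pool(hidden_states, query)
print("weights:", weights)
print("context:", context)
```

Positions whose features align most closely with the query receive the largest weights, so the pooled vector emphasizes the residues judged most informative for the downstream classifier.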

Results

The ablation study confirms that the additional model components in DeepO-GlcNAc, such as attention mechanisms and LSTM, improve prediction performance. The model is robust across five non-human cross-species datasets. We also compare our model with three external predictors on an independent dataset. DeepO-GlcNAc outperforms the external predictors, achieving an accuracy of 92%, an average precision of 72%, an MCC of 0.60, and an AUC of 92% in ROC analysis. Moreover, we have implemented DeepO-GlcNAc as a web server to facilitate further investigation and use by the scientific community.
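
The four reported metrics can be computed from true labels and predicted scores as in the sketch below. It uses only the Python standard library. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration, not data or settings from this study. On an unbalanced dataset, accuracy alone can look high when most sites are negative, which is why MCC and average precision are useful complements.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: 3 modified sites among 10 candidates.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.1, 0.7, 0.6, 0.3, 0.35, 0.05, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
mcc_value = mcc(y_true, y_pred)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"accuracy={acc:.3f} MCC={mcc_value:.3f} AUC={auc:.3f} AP={ap:.3f}")
```

Accuracy and MCC depend on the chosen threshold, whereas AUC and average precision are computed from the ranking of scores and are threshold-free.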

Conclusion

Our work demonstrates the feasibility of utilizing deep learning for O-GlcNAc site prediction and provides a novel tool for O-GlcNAc investigation.