AUTHOR=Zou Xidan , Ren Liping , Cai Peiling , Zhang Yang , Ding Hui , Deng Kejun , Yu Xiaolong , Lin Hao , Huang Chengbing TITLE=Accurately identifying hemagglutinin using sequence information and machine learning methods JOURNAL=Frontiers in Medicine VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2023.1281880 DOI=10.3389/fmed.2023.1281880 ISSN=2296-858X ABSTRACT=Introduction

Hemagglutinin (HA) facilitates viral entry and infection by promoting fusion between the host membrane and the virus. Given its significance in influenza virus infection, HA has attracted attention as a target for influenza drug and vaccine development. Accurately identifying HA is therefore crucial for developing targeted vaccines and drugs. However, in silico methods for identifying HA are still lacking. This study aims to design a computational model to identify HA.

Methods

In this study, a benchmark dataset comprising 106 HA and 106 non-HA sequences was obtained from UniProt. Various sequence-based features were used to represent the samples. After feature optimization, the selected features were fed into four kinds of machine learning methods, and an integrated classifier model was constructed using the stacking algorithm.
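
As an illustration of what a sequence-based feature can look like, the sketch below computes amino acid composition, the fraction of each of the 20 standard residues in a protein sequence. The abstract does not name the features actually used, so the choice of this descriptor, the function name `amino_acid_composition`, and the toy sequence are all assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy fragment for demonstration, not a real HA sequence.
features = amino_acid_composition("MKAKLLVLLC")
```

Each sequence becomes a fixed-length numeric vector of 20 values, which is the form that feature selection and downstream classifiers typically expect.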

Results and discussion

In 5-fold cross-validation, the model achieved an accuracy of 95.85% and an area under the receiver operating characteristic (ROC) curve of 0.9863. In the independent test, it achieved an accuracy of 93.18% and an area under the ROC curve of 0.9793. The code is available at https://github.com/Zouxidan/HA_predict.git. These results indicate that the proposed model predicts HA with high accuracy and should be a convenient tool for researchers studying HA.
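
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and the area under the ROC curve can be computed from true labels and predicted scores. It is not taken from the authors' repository; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = HA, 0 = non-HA (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why the two values are reported together.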