AUTHOR=Zhang Yanqin, Li Zhiyuan
TITLE=RF_phage virion: Classification of phage virion proteins with a random forest model
JOURNAL=Frontiers in Genetics
VOLUME=13
YEAR=2023
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1103783
DOI=10.3389/fgene.2022.1103783
ISSN=1664-8021
ABSTRACT=
Introduction: Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome are critical components of the assembled phage particle.
Methods: This study uses machine learning to classify phage virion proteins. We propose a novel approach, RF_phage virion, for classifying virion and non-virion proteins. The model uses four protein sequence encoding methods as features and a random forest algorithm as the classifier.
Results: The performance of the RF_phage virion model was evaluated by comparing it with classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.