Introduction

AUTHOR=Huang Ailan, Lu Fuping, Liu Fufeng

TITLE=Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor

JOURNAL=Frontiers in Microbiology

VOLUME=Volume 14 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1130594

DOI=10.3389/fmicb.2023.1130594

ISSN=1664-302X

ABSTRACT=<sec><title>Introduction</title><p>Psychrophilic enzymes are a class of macromolecules with high catalytic activity at low temperatures. Cold-active enzymes, being eco-friendly and cost-effective, have great potential for application in the detergent, textile, environmental remediation, pharmaceutical, and food industries. Compared with time-consuming and labor-intensive experiments, computational modeling, especially machine learning (ML), offers a high-throughput screening tool for identifying psychrophilic enzymes efficiently.</p></sec><sec><title>Methods</title><p>In this study, the influence of four ML methods (support vector machine, K-nearest neighbor, random forest, and naïve Bayes) and three descriptors, i.e., amino acid composition (AAC), dipeptide combinations (DPC), and AAC + DPC, on model performance was systematically analyzed.</p></sec><sec><title>Results and discussion</title><p>Among the four ML methods, the support vector machine model based on the AAC descriptor achieved the best prediction accuracy, 80.6%, under 5-fold cross-validation. The AAC descriptor outperformed the DPC and AAC + DPC descriptors regardless of the ML method used. In addition, a comparison of amino acid frequencies between psychrophilic and non-psychrophilic proteins revealed that higher frequencies of Ala, Gly, Ser, and Thr, and lower frequencies of Glu, Lys, Arg, Ile, Val, and Leu, could be related to protein psychrophilicity. Further, ternary models were developed that could classify psychrophilic, mesophilic, and thermophilic proteins effectively. The predictive accuracy of the ternary classification model using the AAC descriptor <italic>via</italic> the support vector machine algorithm was 75.8%. These findings should enhance our insight into the cold-adaptation mechanisms of psychrophilic proteins and aid in the design of engineered cold-active enzymes. Moreover, the proposed model could be used as a screening tool to identify novel cold-adapted proteins.</p></sec>