AUTHOR=Huang Ailan , Lu Fuping , Liu Fufeng TITLE=Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor JOURNAL=Frontiers in Microbiology VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2023.1130594 DOI=10.3389/fmicb.2023.1130594 ISSN=1664-302X ABSTRACT=Introduction

Psychrophilic enzymes are a class of macromolecules with high catalytic activity at low temperatures. Cold-active enzymes possessing eco-friendly and cost-effective properties, are of huge potential application in detergent, textiles, environmental remediation, pharmaceutical as well as food industry. Compared with the time-consuming and labor-intensive experiments, computational modeling especially the machine learning (ML) algorithm is a high-throughput screening tool to identify psychrophilic enzymes efficiently.

Methods

In this study, the influence of 4 ML methods (support vector machines, K-nearest neighbor, random forest, and naïve Bayes), and three descriptors, i.e., amino acid composition (AAC), dipeptide combinations (DPC), and AAC + DPC on the model performance were systematically analyzed.

Results and discussion

Among the 4 ML methods, the support vector machine model based on the AAC descriptor using 5-fold cross-validation achieved the best prediction accuracy with 80.6%. The AAC outperformed than the DPC and AAC + DPC descriptors regardless of the ML methods used. In addition, amino acid frequencies between psychrophilic and non-psychrophilic proteins revealed that higher frequencies of Ala, Gly, Ser, and Thr, and lower frequencies of Glu, Lys, Arg, Ile,Val, and Leu could be related to the protein psychrophilicity. Further, ternary models were also developed that could classify psychrophilic, mesophilic, and thermophilic proteins effectively. The predictive accuracy of the ternary classification model using AAC descriptor via the support vector machine algorithm was 75.8%. These findings would enhance our insight into the cold-adaption mechanisms of psychrophilic proteins and aid in the design of engineered cold-active enzymes. Moreover, the proposed model could be used as a screening tool to identify novel cold-adapted proteins.