- 1Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
- 2School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Introduction: Protein–protein interactions (PPIs) are critical for understanding the molecular mechanisms underlying various biological processes, particularly in microbes associated with cardiovascular disease. Traditional experimental methods for detecting PPIs are often time-consuming and costly, leading to an urgent need for reliable computational approaches.
Methods: In this study, we present a novel model, the deep denoising autoencoder for protein–protein interaction (DAEPPI), which leverages the denoising autoencoder and the CatBoost algorithm to predict PPIs from the evolutionary information of protein sequences.
Results: Our extensive experiments demonstrate the effectiveness of the DAEPPI model, achieving average prediction accuracies of 97.85% and 98.49% on yeast and human datasets, respectively. Comparative analyses with existing effective methods further validate the robustness and reliability of our model in predicting PPIs.
Discussion: Additionally, we explore the application of DAEPPI in the context of cardiovascular disease, showcasing its potential to uncover significant interactions that could contribute to the understanding of disease mechanisms. Our findings indicate that DAEPPI is a powerful tool for advancing research in proteomics and could play a pivotal role in the identification of novel therapeutic targets in cardiovascular disease.
1 Introduction
Cardiovascular diseases (CVDs) are a major global health challenge, accounting for a significant proportion of morbidity and mortality. These diseases encompass a range of conditions affecting the heart and blood vessels, including coronary artery disease, heart failure, and stroke (Tada et al., 2022; Gaidai et al., 2023). The multifactorial nature of CVDs is influenced by various elements, such as genetic predispositions, lifestyle factors, and environmental exposures (Dale et al., 2023; Ojeda-Granados et al., 2024). At the molecular level, proteins play a pivotal role in the pathophysiology of CVDs as they are integral to processes such as inflammation, lipid metabolism, and vascular function (Frąk et al., 2022). Understanding the intricate relationships between these proteins and their interactions is crucial for unraveling the mechanisms underlying cardiovascular pathologies. Thus, the study of protein–protein interactions (PPIs) holds significant promise for advancing our knowledge of CVDs and identifying potential therapeutic targets (Greenblatt et al., 2024).
The importance of PPIs has garnered increasing attention within the scientific community, positioning them as a focal point in biological research. Experimental methods for large-scale PPI identification, such as yeast two-hybrid screening (Wong et al., 2017) and affinity purification coupled with mass spectrometry (Zhong et al., 2023), have been developed to explore these interactions. However, these biological techniques often suffer from limitations, including high costs, time demands, and the potential for false-positive or false-negative results. Consequently, there has been a growing need for effective computational methods to predict PPIs, offering a complementary approach that can enhance the accuracy and efficiency of PPI identification (Xian and Wang, 2024).
Numerous computational strategies have emerged (Guo et al., 2024; Wei et al., 2024), leveraging a wide variety of data types to predict PPIs. For example, You et al. (2013) proposed a PCA-EELM model, which extracts features from protein sequences by autocovariance scores, conjoint triad scores, local descriptor scores, and autocorrelation scores and then combines PCA and ensemble ELM to predict PPIs. Li et al. (2020) adopted a weighted ELM integrated with a scale-invariant feature transform algorithm for accurate prediction. Zhang et al. (2019) implemented an ensemble deep neural network model that leveraged three representative protein feature extraction descriptors combined with a separate deep neural network, achieving high accuracy in predicting PPIs. Hashemifar et al. (2018) presented a sequence-based deep learning framework, DPPI, which combines convolutional and random projection modules in order to enhance the prediction accuracy. Halder et al. (2020) developed a new method called JUPPI, which takes the sequence, GO, and structural domain information of proteins as input features; constructs a high-quality negative PPI dataset by the proposed three-stage filtering strategy; and combines it with a random forest classifier to achieve remarkable results in detecting PPIs of the human proteome. Ma et al. (2024) employed CollaPPI, a collaborative learning framework with information sharing, to predict PPIs, which showed the cross-domain knowledge complementarity of collaborative learning methods in this task. Despite the significant advancements in these computational methods, challenges remain in improving the efficiency and accuracy of PPI predictions. As the volume of protein sequence data continues to increase exponentially, there is an increasing need for effective models that can operate solely on sequence information. This emphasizes the importance of ongoing research to refine and develop innovative computational techniques for PPI prediction, ensuring they can meet the demands of the scientific community.
In this study, we introduce a novel computational model, the deep denoising autoencoder for protein–protein interaction (DAEPPI), which is designed to predict PPIs by utilizing evolutionary information extracted from amino acid sequences. We provide a detailed breakdown of the model architecture and its components, highlighting how each aspect contributes to its predictive capability. Our extensive experimental validation showcases the model’s effectiveness in predicting PPIs, with impressive performance results on both yeast and human datasets. Furthermore, we present comparative experiments that underscore the reliability of the DAEPPI model. Notably, we explore its application in the context of cardiovascular diseases, demonstrating its potential implications for understanding protein interactions that contribute to these conditions. In conclusion, through this research, we aim to further elucidate the role of PPIs in cardiovascular disease mechanisms and their significance for future therapeutic strategies.
2 Materials and methods
2.1 Data sources
In this study, we constructed two comprehensive PPI datasets from distinct organisms: Saccharomyces cerevisiae (yeast) and Homo sapiens (human). The yeast dataset was sourced from the Database of Interacting Proteins (DIP) (Salwinski et al., 2004), where we carefully filtered out protein pairs containing fewer than 50 residues or exhibiting over 40% sequence identity. This rigorous selection process resulted in a positive dataset comprising 5,594 interacting protein pairs, complemented by an equal number of non-interacting pairs derived from proteins with differing subcellular localizations, culminating in a total of 11,188 protein pairs.
For the human dataset, we utilized the Human Protein Reference Database (HPRD). After excluding pairs with more than 25% sequence identity, we identified 3,899 verified interacting pairs among 2,502 unique human proteins. Additionally, we created a negative dataset consisting of 4,262 pairs from 661 distinct proteins, ensuring that these non-interacting proteins were sourced from various subcellular compartments (You et al., 2014). Ultimately, the human dataset comprised a total of 8,161 protein pairs. Together, these datasets provide a robust foundation for evaluating our DAEPPI method in predicting protein interactions across different biological contexts.
2.2 Position-specific scoring matrix
The position-specific scoring matrix (PSSM) (Liu and Steinegger, 2023) is a powerful tool widely used for identifying distantly homologous proteins, as well as for sequence-level studies such as sequence alignment and conservation analysis. In our research, we employed PSSM to enhance the prediction of PPIs. Each protein sequence was converted into a PSSM using the position-specific iterated basic local alignment search tool (PSI-BLAST) (Altschul et al., 1997; Jin et al., 2021). The PSSM is structured as an
where each element
2.3 Deep denoising autoencoder
After obtaining the PSSM of a protein sequence, we need to perform feature extraction on it to represent each protein effectively. However, since the length of protein sequences is not fixed, the size of the constructed PSSM matrices will be different, which makes it impossible to directly input them into the deep learning model. To solve this problem, we adopt a uniform-sized PSSM matrix
Next, in this study, we employed the denoising autoencoder (DAE) algorithm (Vincent et al., 2008; Peng et al., 2023) to extract deeper feature representations of protein sequences from the generated
From a mathematical point of view, the process of encoding can be defined as follows:
Here,
where the encoder compresses the input
The DAE model is particularly effective for this purpose as it enhances the robustness of the feature representation by reconstructing inputs that have been intentionally corrupted with noise. This process also involves two primary phases: encoding and decoding. Formally, the DAE first adds noise to the original data
Then, as in the case of the autoencoder, the corrupted input
Consequently, we use the decoder
where
After obtaining the uncorrupted input
Ultimately, by taking the PSSM of protein sequences as input and then performing encoding–decoding operations, DAE ensures that the learned features are not only lower-dimensional but also more representative of the underlying biological information. Thus, we can enhance our DAEPPI computational model to fully utilize the evolutionary insights embedded within the PSSM.
2.4 Categorical boosting (CatBoost)
In our classification task for predicting protein interactions, we used the CatBoost algorithm, a powerful gradient-boosting framework that excels in handling categorical features. CatBoost builds on the traditional gradient boosting decision tree (GBDT) method, employing oblivious trees as base learners to enhance the accuracy and generalization while effectively addressing issues such as gradient bias and prediction shift (Dorogush et al., 2018; Zhang et al., 2023). CatBoost handles categorical features well and allows training on the entire original dataset. The CatBoost algorithm computes the categorical features by first randomly permuting the given raw data into
where
To classify the features extracted from the DAE, we first prepared our dataset by combining the DAE-derived feature representations with the corresponding labels. These features, which capture the essential characteristics of the proteins, were then input into the CatBoost classifier. The model was trained several times to learn the relationship between features and their labels to further optimize the hyperparameters of the task. One of the significant advantages of CatBoost is its ability to process categorical variables directly during the training phase, which eliminates the need for extensive preprocessing. By employing target statistics and ordered boosting, CatBoost minimizes information loss, while enriching the feature space, allowing for a more nuanced understanding of the data. This robust classification strategy enabled us to accurately predict protein interactions based on the rich feature representations learned from the DAE, ultimately improving the reliability of our PPI predictions.
2.5 Evaluation measures
To verify the validity of the constructed DAEPPI model, we used four classical evaluation criteria for measurement (Liang et al., 2025). They are accuracy (ACC), precision (PE), sensitivity (SN), and Matthews correlation coefficient (MCC), which are defined as follows:
Here,
3 Results and discussion
3.1 Assessment of prediction
To evaluate the predictive performance of our proposed DAEPPI model, which integrates PSSM, DAE, and CatBoost, we conducted extensive assessments on the Yeast and Human datasets using five-fold cross-validation. This method allowed us to partition each dataset into five distinct subsets, ensuring that our model was tested across multiple scenarios for robustness. The results demonstrated that the DAEPPI model achieved impressive performance metrics, as shown in Tables 1–3. For the yeast dataset, we recorded an average ACC of 0.9785, along with PE and SN values of 0.9707 and 0.9870, respectively. The AUC reached 0.9985, indicating a strong predictive capability. Similarly, on the Human dataset, the model maintained high performance, achieving an average ACC of 0.9849, a PE of 0.9929, and an AUC of 0.9989. The standard deviations for these metrics were notably low, reflecting the stability and reliability of our model across different training and testing folds, which further confirms the model’s effectiveness in distinguishing between positive and negative PPIs. The ROC curves of the DAEPPI model on the Yeast and Human datasets are shown in Figures 2, 3. Overall, these results underscore the robustness and accuracy of the DAEPPI model, attributing its success to the effective feature extraction capabilities of the DAE and the powerful classification strength of CatBoost. The integration of evolutionary information through PSSM, combined with advanced machine learning techniques, positions our model as a valuable tool for predicting protein interactions.
3.2 Comparison with different PSSM transformations
In our study, the DAEPPI model employs an equal-sized PSSM transformation, specifically the transposition of PSSM against itself. This approach has demonstrated robust predictive performance on both the yeast and human datasets. To further validate the effectiveness of this equal-sized PSSM transformation, we conducted a comparative analysis with a padded PSSM transformation method. Specifically, we adopt the zero padding method (He and Wang, 2022) to transform the PSSM, i.e., when the length of the protein sequence exceeds the number of amino acids, i.e., 20, we truncate the PSSM to construct a
3.3 Comparison with different feature extraction
In this experiment, the DAEPPI model utilizes a DAE for in-depth feature extraction from equal-sized PSSM transforms. This approach has shown impressive predictive performance on both yeast and human datasets. To assess the effectiveness of DAE in feature extraction, we compared our model with different feature extraction methods on the same datasets. Specifically, we employed histogram of oriented gradient (HOG) (Admass et al., 2024) for feature extraction from the equal-sized PSSM, while still utilizing CatBoost for predicting PPIs. HOG is a feature descriptor that captures the distribution of gradients in localized portions of an image, making it effective for edge detection and object recognition. The prediction performance of the HOG-based feature extraction model on the benchmark dataset is shown in Table 4. It achieves an average ACC and AUC of 89.22% and 95.87%, respectively, on the Yeast dataset, while attaining 95.31% and 98.87% on the Human dataset, respectively. The ROC curves of the HOG-based feature extraction method on the Yeast and Human datasets are shown in Figures 6, 7. These comparisons suggest that the HOG-based feature extraction model’s performance in capturing intricate evolutionary information in biological data is notably inferior to that of the DAE approach employed in the model. In conclusion, the comparative analysis highlights the superior effectiveness of the DAE-based feature extraction approach in our DAEPPI model. This reinforces the importance of selecting appropriate feature extraction techniques in enhancing the model performance for predicting protein interactions.
3.4 Comparison with different classifiers
In our DAEPPI model, we have leveraged CatBoost to predict the intricate features extracted by the DAE, resulting in enhanced performance in predicting PPIs on two benchmark datasets. To validate the effectiveness of CatBoost as our chosen classifier, we conducted comparative experiments on the Yeast dataset, substituting CatBoost with several other classification algorithms. Specifically, we replaced CatBoost with naive Bayes (NB), linear discriminant analysis (LDA), support vector machine (SVM), decision tree (DT), and k-nearest neighbors (KNN), while maintaining all other parameters constant. Our comparison results are shown in Table 5. As can be seen, the DAEPPI model with CatBoost outperformed all other classifiers on the Yeast dataset, demonstrating superior predictive accuracy and reliability for PPIs. This highlights the significant advantage of using CatBoost within our framework, reinforcing its effectiveness over traditional classifiers such as NB, LDA, SVM, DT, and KNN in the context of PPI prediction. In summary, the findings from this analysis validate the choice of CatBoost as a classifier in the DAEPPI model, showcasing its capability to yield better performance in predicting protein interactions compared to other established algorithms.
3.5 Comparison with other methods
The DAEPPI model leverages a DAE to extract evolutionary features from the PSSM of protein sequences, combined with CatBoost to enhance the prediction of PPIs. To validate the effectiveness of our model, we conducted a comparative analysis with existing methods on both the Yeast and Human datasets, employing five-fold cross-validation for consistency. Numerous computational methods have been developed for PPI detection, particularly utilizing machine learning algorithms that have garnered significant attention. In this section, we present a comparison between our proposed method and several established models to assess their predictive capabilities. Tables 6, 7 summarize the results obtained from different methods on the Yeast and Human datasets, respectively. From Table 6, it is evident that the accuracy of the existing methods ranges from 90.07% to 96.03%, with sensitivity values between 88.82% and 93.51%, and MCC spanning from 82.10% to 91.83%. In contrast, our DAEPPI model achieved impressive metrics, with an average accuracy of 97.85%, an average sensitivity of 98.70%, and an average MCC of 95.73% on the Yeast dataset. Similarly, the results on the Human dataset, detailed in Table 7, reflect the superiority of our method. The DAEPPI model attained an average accuracy of 98.49%, a precision of 99.29%, and an MCC of 96.99%. This indicates a significant improvement over the existing models. In conclusion, the comparative analysis demonstrates that the DAEPPI model is a robust and reliable approach for predicting PPIs. Its ability to effectively capture evolutionary features through DAE and utilize CatBoost for prediction contributes to its outstanding performance on both datasets, establishing it as a leading method in the field of PPI detection.
3.6 Case studies
CVDs are a leading cause of morbidity and mortality worldwide, characterized by a range of disorders affecting the heart and blood vessels. The complexity of these diseases necessitates a deeper understanding of the molecular interactions that underpin their development and progression (Nasirian and Menichetti, 2023). Previous studies have increasingly highlighted the critical role of PPIs in cardiovascular health, showcasing how alterations in these interactions can contribute to various cardiovascular conditions (Singh et al., 2019). Numerous studies have identified various molecules associated with diseases, including non-coding RNAs (Wang et al., 2023) and protein molecules. In particular, these studies have revealed a strong connection between specific PPIs and cardiovascular disease. For example, interactions involving proteins such as VEGF and its receptors have been implicated in angiogenesis (Li Q. et al., 2021), while proteins involved in lipid metabolism, such as apolipoproteins, are crucial for maintaining vascular health (Mehta and Shapiro, 2022). These insights underscore the importance of exploring the intricate network of PPIs to identify potential biomarkers and therapeutic targets for CVDs.
To further investigate the relevance of the DAEPPI model’s predictions in the context of CVDs, we conducted case study experiments in this section. Our approach involved training the DAEPPI model on the Yeast dataset and subsequently utilizing it to predict PPIs within the Human dataset. The results of the case studies are shown in Table 8, where it can be observed that 11 of the top 15 PPIs predicted by DAEPPI to be associated with CVD have been confirmed by biological experiments. The results revealed several significant PPIs that are closely associated with CVDs, indicating a promising avenue for further research. Through this case study, we aim to demonstrate not only the predictive power of the DAEPPI model but also its potential utility in uncovering critical biological insights that may lead to advancements in CVD prevention and treatment.
4 Conclusion
Machine learning algorithms have become indispensable in the field of proteomics, particularly for predicting PPIs. These computational methods not only enhance accuracy but also streamline the analysis process, saving both time and resources. In this study, we have introduced the DAEPPI model, which leverages a DAE to learn deep evolutionary features from protein sequences represented in PSSM. By integrating these features with the CatBoost algorithm, our model significantly improves the prediction of PPIs. Our extensive experiments demonstrated the effectiveness of the DAEPPI model, showing robust predictive performance on both Yeast and Human datasets. The results indicate that our method consistently outperforms existing techniques, confirming its reliability and accuracy in PPI prediction. Additionally, we conducted comparative experiments and case studies that further validated the DAEPPI model’s effectiveness. Notably, our analysis identified several human PPIs that are closely associated with CVDs, suggesting that the DAEPPI model can not only predict interactions but also uncover potential links to significant health conditions. In conclusion, the DAEPPI model combines deep learning with effective classification techniques to achieve reliable results. We believe that this model will serve as a valuable tool for researchers aiming to explore the intricate networks of protein interactions and their implications in various biological processes.
Data availability statement
The datasets can be obtained from the corresponding author upon reasonable request.
Author contributions
SZ: writing–original draft. JL: data curation and writing–original draft. MT: methodology and writing–original draft. CL: validation and writing–original draft. YL: formal analysis and writing–review and editing. WH: visualization and writing–review and editing.
Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the National Science Foundation of China under Grant 62273284.
Acknowledgments
The authors would like to thank all the editors and reviewers for their constructive advices.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no generative AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Admass, W. S., Munaye, Y. Y., and Bogale, G. A. (2024). Convolutional neural networks and histogram-oriented gradients: a hybrid approach for automatic mango disease detection and classification. Int. J. Inf. Technol. 16 (2), 817–829. doi:10.1007/s41870-023-01605-z
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids Res. 25 (17), 3389–3402. doi:10.1093/nar/25.17.3389
An, J.-Y., You, Z.-H., Zhou, Y., and Wang, D.-F. (2019). Sequence-based prediction of protein-protein interactions using gray wolf optimizer–based relevance vector machine. Evol. Bioinforma. 15, 1176934319844522. doi:10.1177/1176934319844522
Dale, C. E., Takhar, R., Carragher, R., Katsoulis, M., Torabi, F., Duffield, S., et al. (2023). The impact of the COVID-19 pandemic on cardiovascular disease prevention and management. Nat. Med. 29 (1), 219–225. doi:10.1038/s41591-022-02158-7
Dorogush, A. V., Ershov, V., and Gulin, A. (2018). CatBoost: gradient boosting with categorical features support. arXiv, 11363.
Du, X., Sun, S., Hu, C., Yao, Y., Yan, Y., and Zhang, Y. (2017). DeepPPI: boosting prediction of protein–protein interactions with deep neural networks. J. Chem. Inf. Model. 57 (6), 1499–1510. doi:10.1021/acs.jcim.7b00028
Frąk, W., Wojtasińska, A., Lisińska, W., Młynarska, E., Franczyk, B., and Rysz, J. (2022). Pathophysiology of cardiovascular diseases: new insights into molecular mechanisms of atherosclerosis, arterial hypertension, and coronary artery disease. Biomedicines 10 (8), 1938. doi:10.3390/biomedicines10081938
Gaidai, O., Cao, Y., and Loginov, S. (2023). Global cardiovascular diseases death rate prediction. Curr. problems Cardiol. 48 (5), 101622. doi:10.1016/j.cpcardiol.2023.101622
Greenblatt, J. F., Alberts, B. M., and Krogan, N. J. (2024). Discovery and significance of protein-protein interactions in health and disease. Cell 187 (23), 6501–6517. doi:10.1016/j.cell.2024.10.038
Guo, L.-X., Wang, L., You, Z.-H., Yu, C.-Q., Hu, M.-L., Zhao, B.-W., et al. (2024). Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA–miRNA associations. Briefings Bioinforma. 25 (2), bbae020. doi:10.1093/bib/bbae020
Halder, A. K., Bandyopadhyay, S. S., Chatterjee, P., Nasipuri, M., Plewczynski, D., and Basu, S. (2020). JUPPI: a multi-level feature based method for PPI prediction and a refined strategy for performance assessment. IEEE/ACM Trans. Comput. Biol. Bioinforma. 19 (1), 531–542. doi:10.1109/TCBB.2020.3004970
Hashemifar, S., Neyshabur, B., Khan, A. A., and Xu, J. (2018). Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34 (17), i802–i810. doi:10.1093/bioinformatics/bty573
He, Y., and Wang, S. (2022). SE-BLTCNN: a channel attention adapted deep learning model based on PSSM for membrane protein classification. Comput. Biol. Chem. 98, 107680. doi:10.1016/j.compbiolchem.2022.107680
Huang, Y.-A., You, Z.-H., Chen, X., and Yan, G.-Y. (2016). Improved protein-protein interactions prediction via weighted sparse representation model combining continuous wavelet descriptor and PseAA composition. BMC Syst. Biol. 10, 120–494. doi:10.1186/s12918-016-0360-6
Jin, X., Liao, Q., Wei, H., Zhang, J., and Liu, B. (2021). SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein remote homology detection. Bioinformatics 37 (7), 913–920. doi:10.1093/bioinformatics/btaa772
Li, J., Shi, X., You, Z.-H., Yi, H.-C., Chen, Z., Lin, Q., et al. (2020). Using weighted extreme learning machine combined with scale-invariant feature transform to predict protein-protein interactions from protein evolutionary information. IEEE/ACM Trans. Comput. Biol. Bioinforma. 17 (5), 1546–1554. doi:10.1109/TCBB.2020.2965919
Li, Q., Zeng, C., Liu, H., Yung, K. W. Y., Chen, C., Xie, Q., et al. (2021a). Protein-protein interaction inhibitor of SRPKs alters the splicing isoforms of VEGF and inhibits angiogenesis. Iscience 24 (5), 102423. doi:10.1016/j.isci.2021.102423
Li, X., Han, P., Chen, W., Gao, C., Wang, S., Song, T., et al. (2023). MARPPI: boosting prediction of protein–protein interactions with multi-scale architecture residual network. Briefings Bioinforma. 24 (1), bbac524. doi:10.1093/bib/bbac524
Li, Y., Hu, X.-G., Li, P.-P., Wang, L., and You, Z.-H. (2022a). “Predicting circRNA-disease associations using similarity assessing graph convolution from multi-source information networks,” in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 06–08 December, 2022 (IEEE), 94–101.
Li, Y., Wang, Z., Li, L.-P., You, Z.-H., Huang, W.-Z., Zhan, X.-K., et al. (2021b). Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information. Sci. Rep. 11 (1), 16910. doi:10.1038/s41598-021-96265-z
Li, Y., Wang, Z., You, Z.-H., Li, L.-P., and Hu, X. (2022b). Predicting protein-protein interactions via random ferns with evolutionary matrix representation. Comput. Math. Methods Med. 2022 (1), 7191684. doi:10.1155/2022/7191684
Liang, S.-Z., Wang, L., You, Z.-H., Yu, C.-Q., Wei, M.-M., Wei, Y., et al. (2025). Predicting circRNA–disease associations through multisource domain-aware embeddings and feature projection networks. J. Chem. Inf. Model. 65, 1666–1676. doi:10.1021/acs.jcim.4c02250
Liu, D., and Steinegger, M. (2023). Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices. Bioinformatics 39 (8), btad487. doi:10.1093/bioinformatics/btad487
Ma, W., Bi, X., Jiang, H., Zhang, S., and Wei, Z. (2024). CollaPPI: a collaborative learning framework for predicting protein-protein interactions. IEEE J. Biomed. Health Inf. 28, 3167–3177. doi:10.1109/JBHI.2024.3375621
Mehta, A., and Shapiro, M. D. (2022). Apolipoproteins in vascular biology and atherosclerotic disease. Nat. Rev. Cardiol. 19 (3), 168–179. doi:10.1038/s41569-021-00613-5
Nasirian, F., and Menichetti, G. (2023). Molecular interaction networks and cardiovascular disease risk: the role of food bioactive small molecules. Arteriosclerosis, thrombosis, Vasc. Biol. 43 (6), 813–823. doi:10.1161/ATVBAHA.122.318332
Ojeda-Granados, C., Campisi, E., Barchitta, M., and Agodi, A. (2024). Genetic, lifestyle and metabolic factors contributing to cardiovascular disease in the Italian population: a literature review. Front. Nutr. 11, 1379785. doi:10.3389/fnut.2024.1379785
Peng, L., He, X., Peng, X., Li, Z., and Zhang, L. (2023). Stgnnks: identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput. Biol. Med. 166, 107440. doi:10.1016/j.compbiomed.2023.107440
Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., and Eisenberg, D. (2004). The database of interacting proteins: 2004 update. Nucleic acids Res. 32 (Suppl. l_1), D449–D451. doi:10.1093/nar/gkh086
Singh, N., Bhatia, V., Singh, S., and Bhatnagar, S. (2019). MorCVD: a unified database for host-pathogen protein-protein interactions of cardiovascular diseases related to microbes. Sci. Rep. 9 (1), 4039. doi:10.1038/s41598-019-40704-5
Song, X.-Y., Chen, Z.-H., Sun, X.-Y., You, Z.-H., Li, L.-P., and Zhao, Y. (2018). An ensemble classifier with random projection for predicting protein–protein interactions using sequence and evolutionary information. Appl. Sci. 8 (1), 89. doi:10.3390/app8010089
Tada, H., Takamura, M., and Kawashiri, M.-a. (2022). The effect of diet on cardiovascular disease, heart disease, and blood vessels. Nutrients. 14 (2), 246. doi:10.3390/nu14020246
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th international conference on Machine learning, 1096–1103.
Wang, L., Wong, L., You, Z.-H., and Huang, D.-S. (2023). AMDECDA: attention mechanism combined with data ensemble strategy for predicting CircRNA-disease association. IEEE Trans. Big Data 10, 320–329. doi:10.1109/tbdata.2023.3334673
Wei, M., Wang, L., Li, Y., Li, Z., Zhao, B., Su, X., et al. (2024). BioKG-CMI: a multi-source feature fusion model based on biological knowledge graph for predicting circRNA-miRNA interactions. Sci. China Inf. Sci. 67 (8), 189104. doi:10.1007/s11432-024-4098-3
Wong, J. H., Alfatah, M., Sin, M. F., Sim, H. M., Verma, C. S., Lane, D. P., et al. (2017). A yeast two-hybrid system for the screening and characterization of small-molecule inhibitors of protein–protein interactions identifies a novel putative Mdm2-binding site in p53. BMC Biol. 15, 108–117. doi:10.1186/s12915-017-0446-7
Wong, L., You, Z.-H., Ming, Z., Li, J., Chen, X., and Huang, Y.-A. (2015). Detection of interactions between proteins through rotation forest and local phase quantization descriptors. Int. J. Mol. Sci. 17 (1), 21. doi:10.3390/ijms17010021
Xian, L., and Wang, Y. (2024). Advances in computational methods for protein–protein interaction prediction. Electronics 13 (6), 1059. doi:10.3390/electronics13061059
Yao, Y., Du, X., Diao, Y., and Zhu, H. (2019). An integration of deep learning with feature embedding for protein–protein interaction prediction. PeerJ 7, e7126. doi:10.7717/peerj.7126
You, Z.-H., Lei, Y.-K., Zhu, L., Xia, J., and Wang, B. (2013). “Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis,” in BMC bioinformatics (Springer), 1–11.
You, Z.-H., Yu, J.-Z., Zhu, L., Li, S., and Wen, Z.-K. (2014). A MapReduce based parallel SVM for large-scale predicting protein–protein interactions. Neurocomputing 145, 37–43. doi:10.1016/j.neucom.2014.05.072
Zhang, L., Yu, G., Xia, D., and Wang, J. (2019). Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 324, 10–19. doi:10.1016/j.neucom.2018.02.097
Zhang, S., Lu, X., and Lu, Z. (2023). Improved CNN-based CatBoost model for license plate remote sensing image classification. Signal Process. 213, 109196. doi:10.1016/j.sigpro.2023.109196
Keywords: protein–protein interactions, cardiovascular disease, deep denoising autoencoder, CatBoost, evolutionary information
Citation: Zhou S, Luo J, Tang M, Li C, Li Y and He W (2025) Predicting protein–protein interactions in microbes associated with cardiovascular diseases using deep denoising autoencoders and evolutionary information. Front. Pharmacol. 16:1565860. doi: 10.3389/fphar.2025.1565860
Received: 23 January 2025; Accepted: 17 February 2025;
Published: 11 March 2025.
Edited by:
Lei Wang, Chinese Academy of Sciences (CAS), ChinaCopyright © 2025 Zhou, Luo, Tang, Li, Li and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Wenhua He, wenhuahe@foxmail.com; Yang Li, 2021010123@mail.hfut.edu.cn