EDITORIAL article
Front. Mol. Biosci.
Sec. Biological Modeling and Simulation
Volume 12 - 2025
doi: 10.3389/fmolb.2025.1568437
This article is part of the Research Topic Machine Learning in Computer-Aided Drug Design
Editorial: Machine Learning in Computer-Aided Drug Design
Provisionally accepted
- 1 Big Blue Genomics / Redesign Science (New York, US), Belgrade, Serbia
- 2 Icahn School of Medicine at Mount Sinai, New York, New York, United States
- 3 Illumina (United States), San Diego, California, United States
Drug discovery is a long and arduous process with a high risk of failure (Wong, Siah, and Lo, 2019; Dowden and Munro, 2019). Bringing a single drug to market takes more than a decade and costs more than a billion dollars, and the total cost to pharmaceutical companies is even higher once failed candidates are taken into account. Over the last couple of decades, the chance that a compound will enter the preclinical stage and eventually be approved by the FDA has been 1 in 20,000 to 30,000 (Yamaguchi et al., 2021). The cost and complexity of drug research have led major pharmaceutical companies to reduce their involvement in certain disease areas, such as cardiovascular and neurological diseases (Dowden and Munro, 2019). Alternatively, they have abandoned early-stage research and instead rely on acquiring smaller biotech companies that have drugs in preclinical or early clinical development.

All these challenges have pushed the pharmaceutical industry to embrace in silico methods as a means of reducing costs and accelerating development. Classical tools such as molecular dynamics (MD) offer high accuracy and detailed insight into protein behavior. However, they are too computationally expensive for high-throughput studies and are therefore used only for evaluating targets and small numbers of compounds. These limitations opened the door to applying machine learning (ML) in drug development. ML has been used in academia for decades, with occasional forays into industry. In recent years, it has come into the spotlight thanks to advances in large language models (LLMs) and denoising diffusion probabilistic models and their use in computational structural biology. Then came the successes of AlphaFold and RoseTTAFold and the 2024 Nobel Prize in Chemistry, awarded to Demis Hassabis and John M. Jumper for protein structure prediction and to David Baker for computational protein design. Together, these led many to believe that most problems in structural biology and, by extension, drug design would be easily solved.
These were high hopes, because ML, although powerful, has limitations. One of the most significant is the poor generalization of ML models beyond their training space, which makes them strongly dependent on the composition of the training set. Another issue is, paradoxically, how simple it has become to implement ML models. Modern ML libraries such as PyTorch and TensorFlow make models easy to deploy, often without delving into the details of the biological phenomena being analyzed. This can lead to a superficial understanding of the results an ML model produces. Furthermore, the "black-box" nature of ML models often hinders their adoption in medical applications and drug discovery. Interpretable models are needed to bridge the gap between computational power and complex biological systems.

With all this in mind, we conceived this special issue to present research that uses ML protocols and architectures while offering a detailed and comprehensive interpretation of the observed phenomena.

The first paper in this issue, by Chen, Min, and Ning (2021), deals with the detection of peptides that can bind major histocompatibility complex (MHC) class I proteins. The authors designed two convolutional neural network (CNN)-based methods, ConvM and SpConvM, to tackle the binding prediction problem and conducted a thorough bioinformatics analysis of the results. They show that their approach outperforms the current state-of-the-art allele-specific method in prioritizing and identifying the most likely binding peptides.

Huang et al. (2021) used ML to detect hydration sites in proteins and predict the positions of water molecules. This is an important problem in drug design, as prior analyses indicate that most ligand-binding sites in protein-ligand structures contain at least one bridging water molecule at the interface.
The authors' two-component (scoring and sampling) model outperformed alternative approaches by a large margin.

The next paper also deals with peptide classification. Khabaz, Rahimi-Nasrabadi, and Homayoun Keihan (2023) developed a hierarchical ML model for classifying peptides with antimicrobial activity against Staphylococcus aureus. The first level of their two-level model classifies peptides as antimicrobial peptides (AMPs) or non-AMPs. The second level then classifies AMPs as active or inactive against S. aureus. The model uses linguistic and physicochemical properties, the most important of which were identified through cross-validation-based feature selection. It can be applied in drug discovery, peptide design, and functional annotation of peptides.

Faris et al. (2023) developed a method for discovering selective inhibitors of JAK1 and JAK3. The method uses quantitative structure-activity relationship (QSAR) models built with multiple linear regression (MLR) and artificial neural networks (ANNs). It enabled the identification of optimal compounds that exhibited both favorable affinity and stability over a 100 ns MD trajectory. The ANN-based approach demonstrated its ability to predict both biological activity and stability.

Chomicz et al. (2024) used clustering and ML protocols to develop a method for grouping antibodies based on clonotype, sequence, paratope prediction, structure prediction, and embedding information. The authors used fast sequence-clustering methods and applied language models to cluster paratopes. For structure clustering, they used an adaptation of AlphaFold2 to model antibodies and a fast, greedy-algorithm-based tool for similarity estimation. The final layer of their framework is a self-supervised, embedding-based language model, which they use to cluster antibody sequences in latent space.
Their results indicate that novel ML-based methods offer no advantage over standard sequence-based tools for probe-based binder mining. However, they found that the advanced ML methods are useful for epitope binning. The authors therefore conclude that advanced methods are better suited to partitioning a given dataset than to data-mining experiments.

Ahmadi, Gupta, Menon, and Baudry (2025) developed an ML protocol that uses pharmacophore features to distinguish true binding ligands from decoys for four protein targets. They first used MD simulations to generate pharmacophore feature sets from conformations of protein-ligand complexes. They then applied ML algorithms to reduce the full set of features to a much smaller subset. They showed that this protocol is effective at predicting true binders while remaining medicinal chemistry-friendly.

The papers published in this special issue focus on leveraging ML to analyze biological models, predict molecular behavior, and aid drug discovery. They apply ML to diverse problems, such as peptide-MHC binding prediction, protein-ligand interaction prediction, antimicrobial peptide classification, and antibody clustering. They also demonstrate how these protocols can identify both small-molecule and antibody binders and provide meaningful biological insights.
Keywords: machine learning (ML), computer-aided drug design, protein binding, hydration sites, antimicrobial activity, QSAR, antibody grouping, pharmacophores
Received: 29 Jan 2025; Accepted: 03 Feb 2025.
Copyright: © 2025 Perisic, Sevim Bayrak and Gunady. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Ognjen Perisic, Big Blue Genomics / Redesign Science (New York, US), Belgrade, Serbia
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.