Skip to main content

ORIGINAL RESEARCH article

Front. Chem., 17 January 2022
Sec. Theoretical and Computational Chemistry
This article is part of the Research Topic In Silico Study - Repurposing of Drugs View all 4 articles

Accurate Prediction of Inhibitor Binding to HIV-1 Protease Using CANDOCK

  • 1Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, State University of New York, Buffalo, NY, United States
  • 2Department of Chemistry, Purdue University, West Lafayette, IN, United States
  • 3Purdue Institute for Drug Discovery, West Lafayette, IN, United States
  • 4Purdue Center for Cancer Research, West Lafayette, IN, United States
  • 5Purdue Institute for Inflammation, Immunology and Infectious Disease, West Lafayette, IN, United States
  • 6Purdue Institute for Integrative Neuroscience, West Lafayette, IN, United States

The human immunodeficiency virus 1 (HIV-1) protease is an important target for treating HIV infection. Our goal was to benchmark a novel molecular docking protocol and determine its effectiveness as a therapeutic repurposing tool by predicting inhibitor potency to this target. To accomplish this, we predicted the relative binding scores of various inhibitors of the protease using CANDOCK, a hierarchical fragment-based docking protocol with a knowledge-based scoring function. We first used a set of 30 HIV-1 protease complexes as an initial benchmark to optimize the parameters for CANDOCK. We then compared the results from CANDOCK to two other popular molecular docking protocols Autodock Vina and Smina. Our results showed that CANDOCK is superior to both of these protocols in terms of correlating predicted binding scores to experimental binding affinities with a Pearson coefficient of 0.62 compared to 0.48 and 0.49 for Vina and Smina, respectively. We further leveraged the Database of Useful Decoys: Enhanced (DUD-E) HIV protease set to ascertain the effectiveness of each protocol in discriminating active versus decoy ligands for proteases. CANDOCK again displayed better efficacy over the other commonly used molecular docking protocols with area under the receiver operating characteristic curve (AUROC) of 0.94 compared to 0.71 and 0.74 for Vina and Smina. These findings support the utility of CANDOCK to help discover novel therapeutics that effectively inhibit HIV-1 and possibly other retroviral proteases.

1 Introduction

Inhibition of the Human immunodeficiency virus (HIV) protease blocks viral maturation and replication, making inhibitors of this vital enzyme an important class of drugs for the treatment of HIV infection (Wlodawer et al., 1989; Wlodawer and Vondrasek, 1998). The introduction of HIV protease inhibitors reduced the mortality rate of infected patients in the US significantly, from about 50 thousand deaths per year in 1995, down to 20 thousand by 2000 (Centers for Disease Control and Prevention, 2001a; Centers for Disease Control and Prevention, 2001b; Quinn, 2008). However, escape mutations within the viral protease have resulted in HIV strains that are resistant to these inhibitors, presenting a challenge to identify which protease inhibitors are effective against specific mutants or discover and design broad spectrum inhibitors (Ohtaka and Freire, 2005). For these reasons, accurate prediction of inhibitor efficacy against protease mutants that arise during infection has been a focus of HIV drug discovery for decades.

Previous efforts to predict inhibitor activity against human immunodeficiency virus 1 (HIV-1) proteases include rule-based methods (Shafer et al., 1999; Kantor et al., 2001), support vector machine (SVM) models (Beerenwinkel et al., 2002; Cai et al., 2003), chemical shape and features (Yadav et al., 2012; Pandit et al., 2006; DesJarlais and Dixon, 1994; Wei et al., 2015), various docking protocols (Chang et al., 2007, 2010), and molecular dynamics (MD) simulations (Rick et al., 1998; Wang and Kollman, 2001; Wang K. et al., 2004; Jenwitheesuk and Samudrala, 2003; Jenwitheesuk et al., 2004; Jenwitheesuk and Samudrala, 2005a; Jenwitheesuk and Samudrala, 2005b). These different approaches have displayed varying results. Chang et al. compared the efficacy of Autodock 4 and Autodock Vina in predicting active versus inactive compounds using the National Cancer Institute Diversity II compound sets and showed that both protocols perform better than random [area under the curve (AUC) of 0.69 and 0.68, respectively] on this diverse compound set (Huang et al., 2006; Chang et al., 2010). Pandit et al. generated a pharmacophore model using Molecular Operating Environment (MOE) software to predict activity from a set of known protease inhibitors and non-inhibitors, correctly identifying 65 of the 75 protease inhibitors and incorrectly classifying 11 out of 75 non-inhibitors. When volume exclusion was incorporated into the model, the authors were able to decrease the number of false positives to 5 out of 75 while reducing the true positives to 60 out of 75. (Pandit et al., 2006). These results display the tradeoff between sensitivity and specificity and the limitations of this model.

The problem is more difficult in a de novo molecular docking scenario, when the x-ray diffraction structure of a protease mutant (or homology model) is docked to an inhibitor and the necessary rotations and transformations are calculated without a priori knowledge (Razzaghi-Asl et al., 2015). In contrast, approaches that use x-ray diffraction structure poses for the protease and inhibitor, combined with the features of the compound and/or MD, to predict binding affinities have met with some success (Jenwitheesuk et al., 2004; Jenwitheesuk and Samudrala, 2005a; Jenwitheesuk and Samudrala, 2005b). Jenwitheesuk and Samudrala obtained a peak correlation of 0.87 between predicted binding energies and experimental binding affinities through the use of MD simulations (Jenwitheesuk and Samudrala, 2003). These MD approaches have the advantages of allowing for the exploration of the inhibitor within the binding site and using scoring functions that are sensitive to amino acid mutations in the protease structure. However, MD requires an accurate 3D complex structure of the bound pose between ligand and protein as a starting point, thus limiting this method by requiring solved or modeled structure(s) and/or a docking protocol. Jenwitheesuk and Samudrala showed that calculated versus experimental binding correlation was 0.38 with the docking protocol Autodock alone, illustrating the benefit MD provides (Jenwitheesuk and Samudrala, 2003). Leveraging docking protocols with more sophisticated and robust scoring functions can be used to identify the pose of the inhibitors with respect to the protease as well as accurately predict the corresponding binding scores.

Here we used a hierarchical fragment-based docking based dynamics protocol implemented in the CANDOCK protocol (Fine and Chopra, 2018; Fine et al., 2020), in conjunction with its all-atom knowledge-based scoring function (Bernard and Samudrala, 2009), to predict the binding scores of inhibitors to HIV-1 protease. The knowledge-based scoring function calculates and optimizes atomic interactions in the binding pocket to sample biologically relevant ligand conformations giving us an ability to identify specific interactions (beyond hydrogen bonding or pi-stacking, etc.) in protein binding sites. We first optimized the parameters for the knowledge-based scoring function used in CANDOCK using a set of HIV-1 protease–inhibitor complex structures with known binding affinities from BindingMOAD (Smith et al., 2019). We then used CANDOCK to predict actives versus decoys from the Directory of Useful Decoys Enhanced (DUD-E) HIV protease subset (Mysinger et al., 2012) to affirm the discriminatory abilities of the improved protocol and compared it to two popular molecular docking methods, AutoDock Vina and Smina. Our research showed a strong correlation between experimental binding affinities and predicted binding scores for CANDOCK which resulted in a 0.62 Pearson coefficient compared to the 0.48 and 0.49 for Vina and Smina, respectively. In addition, the performance of CANDOCK on the DUD-E HIV protease set exceeded Vina and Smina with an area under the receiver operating characteristic curve (AUROC) of 0.94 for CANDOCK compared to 0.71 for Vina and 0.74 for Smina. These results demonstrate the predictive power of CANDOCK for the specific case of assessing inhibitor potency against HIV-1 protease.

2 Materials and Methods

2.1 Curation of Human Immunodeficiency Virus 1 Protease–Inhibitor Sets for Benchmarking

We compiled a set of 30 HIV-1 protease–inhibitor complex structures (Table 1) extracted from the Protein Data Bank (PDB), all of which were solved using x-ray diffraction with a resolution of 2.5 Å or lower and have experimentally determined binding affinities between the protease and given inhibitor (Berman et al., 2000). To use these structures in our docking simulations, we had to separate and process the protease and inhibitor in each complex. First, the proteases were processed using biopython and OpenMM to remove any co-crystallized ligands so the HIV-1 protease dimer was all that remained to be used as the receptor in each molecular docking simulation (Cock et al., 2009; Eastman et al., 2017). The inhibitor was also separated from the corresponding complex in each case and converted to a Mol2 file format using OpenBabel for compatibility with the CANDOCK protocol (O'Boyle et al., 2011, 2008). Both the protease and inhibitor files for each complex were also converted to the PDBQT file format using AutoDocktools for compatibility with the AutoDock Vina protocol (Morris et al., 2009). The binding sites for each protease were defined using the coordinates of the native ligand in the PDB structure. Experimentally observed inhibition constants, Ki, for the bound ligands in all 30 complexes were obtained from BindingMOAD (Smith et al., 2019).

TABLE 1
www.frontiersin.org

TABLE 1. List of HIV-1 protease structures extracted from Protein DataBase (PDB) with experimentally determined inhibition constants (Ki) extracted from BindingMOAD.

In addition to the 30 HIV-1 protease–inhibitor complexes, we also extracted a set of active and decoy compounds for the HIV protease from the DUD-E which consisted of 1,395 actives and 36,278 decoys against the macromolecule dimer (Mysinger et al., 2012).

Lastly, a set of 14 compounds were extracted from PubChem, seven of which were experimentally determined to be active HIV-1 protease inhibitors and the other seven were determined to be inactive compounds. The list of compounds and their corresponding activity against HIV-1 protease is given in Table 2.

TABLE 2
www.frontiersin.org

TABLE 2. Compounds extracted from PubChem with experimentally determined inhibition activity against HIV-1 protease.

2.2 Molecular Docking Protocols

Three different molecular docking protocols: CANDOCK (Fine and Chopra, 2018; Fine et al., 2020), Autodock Vina (Trott and Olson, 2010), and Smina (Koes et al., 2013) were used herein to predict the binding affinity between the HIV-1 protease inhibitors and the protease macromolecule binding sites.

2.2.1 CANDOCK

CANDOCK is a hierarchical fragment-based docking with dynamics protocol to “grow” the ligand in the binding pocket by leveraging a generalized knowledge-based statistical scoring function to identify docked poses (Bernard and Samudrala, 2009; Fine and Chopra, 2018). The fragment-based approach identifies and breaks rotatable bonds in the ligand to reduce the ligand down to rigid subunits, which are then individually docked in the binding pocket and relinked to build the complete original ligand in the best pose. This method enables comprehensive search of both the binding pocket and the conformational space of ligand. The scoring parameters for the knowledge-based scoring function were varied during this study and are discussed in Section 2.3. The protease–inhibitor binding site was defined as a sum of centroids, with 4.5 Å radii, at each atom of the known bound ligand for each complex. We also used the Generalized Amber ForceField (GAFF) implemented in CANDOCK as a control for the knowledge-based scoring function.

2.2.2 AutoDock Vina

AutoDock Vina, referred to here as Vina, is a well-known molecular docking protocol that uses a physics-based forcefield, similar to X-Score, that is tuned on the experimental data in PDBBind (Wang et al., 2002; Trott and Olson, 2010; Su et al., 2018). In this study, Vina was used with default parameters with the following exceptions: the exhaustiveness and num_modes parameters set to 8 and 9, respectively. The binding box center was placed at the geometric center of the known bound ligand for each corresponding crystal structure. The length of all sides of the binding box were defined as two times the radius of gyration of the compound plus 9.0 Å, to ensure a large enough search space while simultaneously mimicking the binding site centroids used in CANDOCK.

2.2.3 Smina

Smina is a forked version of Vina with expanded functionality and enabled user-defined scoring functions (Koes et al., 2013). In addition, Koes et al. modified the existing Vina potential by optimizing additional energetic terms found in the original code that were not used; i.e., the coefficients were set to 0. The potential terms included an electrostatic term, a desolvation term, and a non-hydrophobic contact term, among others that were all parameterized by training on the CSAR (Community Structure-Activity Resource) 2010 dataset (Dunbar Jr et al., 2011). The parameters used for Vina, described above, were also used here. The binding box was also defined identically.

2.3 Selector and Ranker

The CANDOCK protocol generates hundreds to thousands of binding modes for a protein–ligand pair that may then be scored. To obtain a single score, the protocol orders all of these modes by score using a knowledge-based forcefield (KBF) and chooses the top result as the best binding pose and corresponding binding score. We refer to this scoring and sorting as the selector. Once the top binding pose is chosen, we rescore this pose using the same or different variation of the KBF, by modifying one or more of the terms. The new score is then used in the calculation of the correlation between known and predicted binding affinities. We call the parameter set used for rescoring the ranker (Fine et al., 2020).

The KBF is an atomic level forcefield that is generalized to all intermolecular binding interactions, e.g., protein–small molecule, protein–DNA, etc. (Bernard and Samudrala, 2009). The equation to calculate the interaction score between two molecules, Eq. 1, is analogous to the net potential of mean force:

Srabij=lnPrabij|CPrij(1)

This equation calculates the probability of two intermolecular atoms being within a given distance (P(rabij|C)) with respect to the probability of any two atoms being within the same distance (P(rij)), where rabij is the distance between atom i of type a and j of type b. For this function the distances, r, are discretized into distinct spherical shells. The KBF has four terms that affect the function and are optimized herein. The four terms are functional, reference, composition, and cutoff.

The probability distributions P(rabij|C) and P(rij) are generated by one of two functionals: a normalized frequency distribution function (f), or a radial distribution function (r). The former assesses how many atoms are within a given shell, whereas the latter divides the number of atoms in the shell by the volume of the spherical shell.

The reference probability is calculated one of two ways: cumulative (c) or mean (m), which are the sum over all atom pairs or averaged over the number of atom pairs, respectively.

The last two terms, composition and cutoff, refer to the atom type pairs and the maximum distance (r) considered for calculations. The two options for composition are complete (c), which enables the use of all atom type pairs, and reduced (r), which limits the summation to only atom types a and b found within the given intermolecular complex. Cutoff can range from 4 to 15 Å.

The probabilities P(rabij|C) and P(rij) were generated using the experimental data found in the Cambridge Structural Database (CSD) for both the protein–ligand and protein–DNA interactions (Allen, 2002).

The “default” parameter set for CANDOCK was defined as rmr6–rmr6, which equates to radial functional, mean reference, reduced composition, and 6 Å cutoff for both the selector and ranker.

3 Results and Discussion

3.1 Comprehensive Analysis of Parameters for the CANDOCK Knowledge-Based Scoring Function

The CANDOCK protocol uses a generalized statistical scoring function for scoring molecular interactions. This KBF is implemented with four different parameters: functional, reference, composition, and cutoff (Section 2.3). Each of the first three parameters have two options, and cutoff can range from 4 to 15 Å, resulting in 96 variations of the KBF. We generated the top poses for each of the 30 HIV-1 protease–inhibitor complexes using the 96 KBF variations for both the selector and ranker to ascertain the optimal combination.

We calculated top poses and corresponding binding scores for the 30 HIV-1 protease–inhibitor complexes using all 96 variations of the KBF (selector) and subsequently rescored the top pose for all complexes with each of the 96 KBF variations (ranker), resulting in 9,216 different selector–ranker combinations. For each selector–ranker pair, we calculated the Pearson and Spearman correlation between the known Ki from BindingMOAD and the predicted binding scores for all 30 protease–inhibitor pairs. The correlations calculated showed us how accurately each selector–ranker parameter set predicted binding scores when compared to the experimental binding affinities. We populated a heatmap with all of the calculated correlations presented in Figure 1. This visualization showed us that rmr and fmr with all cutoffs were very high-performing rankers, where variation of the selector shows negligible effect on the correlations. These results were interesting from a high-level perspective by showing the efficacy of the fmr and rmr rankers, but we wanted to assess specific parameter sets to determine which would be the most accurate for the HIV-1 protease–inhibitor complexes.

FIGURE 1
www.frontiersin.org

FIGURE 1. Heatmaps of Pearson and Spearman correlations between predicted binding scores and experimental binding affinities for all selector and ranker parameter sets in CANDOCK. The Pearson (A) and Spearman (B) correlations were calculated using 30 HIV-1 protease–inhibitor complexes where each top pose was scored using a specific selector and then rescored using a specific ranker. This analysis was performed for all 96 variations of each ranker/selector pair resulting in 9,216 correlations plotted in each heatmap—lighter orange color indicating stronger positive correlation. The functional, reference, and composition parameters are denoted on the major ticks of the horizontal and vertical axes. The minor ticks for each parameter set account for the 12 different cutoffs that can be used (4–15 Å). For both the Pearson and Spearman heatmaps, fmr and rmr rankers result in high correlations regardless of the selector used, demonstrating that the ranker chosen is more important than the selector used. The heatmaps also show that the rmr parameter set, which is the default for CANDOCK, is a strong performing ranker.

The CANDOCK protocol was previously parameterized on the complete CASF-2016, containing 285 protein–ligand complexes across 57 proteins (Fine and Chopra, 2018; Su et al., 2018; Fine et al., 2020). Results from this analysis determined rmr6 as selector and rmc15 as ranker were the best performing parameter sets, and varying the selector did not have a major impact on the resulting correlations between binding affinity and binding score. The authors also commented on the results for the subset of HIV protease–inhibitor complexes in CASF-2016 stating that rmc15–rmr6, the reverse of the previously stated selector–ranker, was the best performing set. We used the knowledge obtained in this earlier study along with our own analysis of the heatmap of all correlations (Figure 1) to justify rmr6–rmr6 as our “default” selector–ranker because it is the default values set in the CANDOCK program, the rmr6 ranker was previously shown to be the best for the HIV protease set from CASF-2016, and the ranker parameters are more important than the selector parameters for the accuracy of the method.

We analyzed our HIV-1 protease results by comparing the results of the default parameter set, rmr6–rmr6, to the best performing parameter set, fmr12–rmc5. Figure 2 shows that the best performing parameter set, fmr12–rmc5, yielded Pearson and Spearman correlations of 0.71 (p-value < 0.0001) and 0.67 (p-value < 0.0001), respectively, for the 30 HIV-1 protease–inhibitor complexes. The best performing parameter set marginally outperforms the default parameter set for CANDOCK, rmr6–rmr6, with Pearson and Spearman correlations of 0.62 (p-value < 0.001) and 0.50 (p-value < 0.01), respectively.

FIGURE 2
www.frontiersin.org

FIGURE 2. Distributions of Pearson (A) and Spearman (B) correlation coefficients between predicted and experimental binding affinities across all variations of the knowledge-based forcefield used by CANDOCK. The frequencies of the Pearson (A) and Spearman (B) correlation coefficients are plotted with the default and best performing parameter set coefficients denoted by blue and orange lines, respectively. The default parameter set (rmr6–rmr6) results in coefficients comparable to those resulting from the optimal parameter set (fmr12–rmc5) especially with regard to the Pearson correlation. These data support the use of the default and unbiased parameter set for the specific case of HIV-1 protease inhibitor binding prediction.

Both CANDOCK parameter sets were in the top 1% of Pearson scores, suggesting that the previously established default parameter set for assessing the ability of CANDOCK to accurately calculate the relative binding strength of HIV-1 protease–inhibitor complexes performs near optimally (Figure 2). This led us to continue further analysis with this unbiased parameter set rather than using the best one, thereby eliminating the risk of any overtraining.

3.2 Comparison to Other Docking Methods and Forcefields

We compared CANDOCK to two other well-established molecular docking protocols to determine their relative utility (Figure 3). For this comparison, we used protocols based on the Vina and Smina software (Morris et al., 2009; Trott and Olson, 2010). In addition, we also ran the predictions made by CANDOCK using a physics-based forcefield (CANDOCK-physics), GAFF, as a control (Wang et al., 2004a; Wang et al., 2006).

FIGURE 3
www.frontiersin.org

FIGURE 3. Comparison of predicted binding scores and known constants for 30 HIV-1 protease–inhibitor complex structures using four different docking protocols. Each panel plots the protein–ligand predicted binding scores from a particular docking protocol against the known binding constant [ln(Ki)] for each HIV-1 protease–inhibitor complex with the linear regression line and the 95% confidence interval shaded. CANDOCK with the default parameters for the knowledge-based forcefield (blue) produced a Pearson correlation of 0.62 (p-value <0.001), whereas CANDOCK with physics-based potential (orange), AutoDock Vina (green), and Smina (red) all had lower correlations of 0.07 (p-value = 0.7365), 0.48 (p-value = 0.0076), and 0.49 (p-value = 0.0061), respectively. These results illustrate the higher utility of CANDOCK with default parameters for predicting HIV-1 protease–inhibitor binding with respect to the other docking protocols and scoring functions.

Figure 3 shows that Vina, which uses a scoring function involving knowledge-based and empirical components, generated Pearson and Spearman correlations of 0.48 and 0.50, respectively. Smina, a modified Vina with a custom scoring potential, performed similarly with correlations of 0.49 and 0.49 for Pearson and Spearman, respectively. Recall that the respective correlations were 0.62 and 0.50 for CANDOCK using the default KBF parameter set and 0.71 and 0.67 using the best performing one. Lastly, CANDOCK-physics was a control that resulted in 0.06 Pearson and 0.09 Spearman correlations. The physics-based potentials used to predict protein–ligand interactions showed little to no correlation to known binding affinities for the HIV-1 protease–inhibitor set.

These results combined indicate that CANDOCK-default is able to accurately predict the relative binding affinities with greater confidence than other docking protocols.

3.3 Discrimination of Active Versus Decoy Human Immunodeficiency Virus 1 Protease Inhibitors

We next assessed the discriminatory ability of CANDOCK to effectively identify active inhibitors of HIV-1 protease over decoys. We again compared the results of CANDOCK with default KBF parameters to Vina, Smina, and CANDOCK-physics. To accomplish this, we ran docking simulations using all four protocols on a set of active and decoy HIV-1 protease inhibitors from DUD-E (Mysinger et al., 2012).

For each protocol, the receiver operating characteristic (ROC) curve was generated based upon the sorted binding scores the protocol calculated for each active/decoy–protein pair; additionally, the AUROC was calculated for each protocol (Figure 4). The resulting AUROC values for all protocols were above 0.5, meaning they all performed better than random. CANDOCK with default KBF parameters performed the best with an AUROC of 0.94. The other three methods had much lower AUROC values with CANDOCK-physics at 0.67, Vina at 0.71, and Smina at 0.74. While all protocols performed well, CANDOCK-default displayed its superiority for binding score prediction.

FIGURE 4
www.frontiersin.org

FIGURE 4. Receiver operating characteristic (ROC) and precision–recall (PR) curves for the HIV protease DUD-E set using four different docking protocols. The resulting area under the ROC (AUROC) curve values on the (A) are 0.94 for CANDOCK with the default KBF parameters (blue), 0.67 for CANDOCK with physics-based potential (orange), 0.71 for Vina (green), and 0.74 for Smina (red). The resulting area under the precision–recall curves (AUPRC) values on the (B) are 0.41 for CANDOCK-default, 0.18 for CANDOCK-physics, 0.08 for Vina, and 0.09 for Smina. These results provide further evidence that CANDOCK with its default KBF parameters outperforms the other docking protocols and scoring functions for prediction of binding affinity and activity of HIV-1 protease inhibitors.

We plotted the precision–recall curves for each protocol and calculated the area under these curves (AUPRC) to more thoroughly assess the protocols in discriminating active from decoy inhibitors (Figure 4). All four protocols scored under 0.5, which is common when there are imbalanced class sizes like in the DUD-E set. CANDOCK-default still outperformed the other three protocols with a AUPRC of 0.41. The remaining three protocols, CANDOCK-physics, Vina, and Smina, had AUPRCs of 0.18, 0.08, and 0.09, respectively.

These results show that in addition to CANDOCK predicting binding scores that correlate well with known binding affinities, the protocol is also capable of effectively choosing active inhibitors over decoys.

3.4 Discrimination of Known Active Versus Inactive Human Immunodeficiency Virus 1 Protease Inhibitors

To further investigate the efficacy of the CANDOCK protocol, we assessed its discrimination capability between known active HIV-1 protease inhibitors and known inactive compounds. This benchmark is similar to the DUD-E set; however, the inactive compounds are experimentally confirmed, as opposed to decoys that are generated based on chemical similarity to the active compounds. This provides a more robust test for CANDOCK to be compared to the other methods in their ability to identify HIV-1 protease inhibitors.

For each protocol, we compared the resulting binding scores for each active and inactive compound (Figure 5) and visualized their separation based upon the strength of binding (active compounds should be in the lower left of the plot and the inactives should be in the upper right). CANDOCK-default and Vina distinguish the actives versus inactives clearly, while they group more closely for CANDOCK-physics and Smina.

FIGURE 5
www.frontiersin.org

FIGURE 5. Predicted binding scores for HIV-1 protease inhibitors and inactive compounds using four different docking protocols. Each panel plots the protein–ligand predicted binding scores from a particular docking protocol for each of the seven active (blue) and seven inactive (orange) compounds. These results depict a distinct separation in binding scores for actives versus inactives using CANDOCK with default parameters and Vina.

To more explicitly determine the discriminatory ability, we subsequently generated the ROC curve based on the sorted binding scores the protocol calculated for each active/inactive–protein pair; additionally, the AUROC was calculated for each protocol (Figure 6). CANDOCK with default KBF parameters and Vina both performed the best with an AUROC of 1.00. The other two methods had lower AUROC values with CANDOCK-physics at 0.98 and Smina at 0.92. While all protocols performed well, CANDOCK-default and Vina displayed comparable efficacy for active versus inactive discrimination.

FIGURE 6
www.frontiersin.org

FIGURE 6. Receiver operating characteristic (ROC) and precision–recall (PR) curves for the HIV-1 protease actives and inactives (seven each). The resulting area under the ROC (AUROC) curve values on the (A) are 1.00 for CANDOCK with the default KBF parameters (blue), 0.98 for CANDOCK with physics-based potential (orange), 1.00 for Vina (green), and 0.92 for Smina (red). The resulting area under the precision–recall curves (AUPRC) values on the (B) are 1.00 for CANDOCK-default, 0.98 for CANDOCK-physics, 1.00 for Vina, and 0.95 for Smina. These results provide further evidence that CANDOCK with its default KBF parameters is an effective docking protocol and scoring function for prediction of binding affinity and activity of HIV-1 protease inhibitors.

We plotted the precision–recall curves for each protocol and calculated the AUPRC to more thoroughly assess the protocols in discriminating actives from inactives (Figure 6). CANDOCK-default and Vina still outperformed the other two protocols with a AUPRC of 1.00, compared to AUPRC of 0.98 for CANDOCK-physics and 0.95 for Smina.

Overall, CANDOCK-default is effective at predicting active inhibitors for HIV-1 protease from a set of actives and inactives. Vina showed comparable results for this test, and both slightly outperformed Smina and CANDOCK-physics.

3.5 Comparison of Docking Simulation Times

As another point of comparison for all four protocols, we assessed the average time it takes to generate the top pose for the HIV-1 protease–inhibitor complex. We ran the 30 protease–inhibitor complexes 30 times resulting in 900 simulations for each protocol. We then averaged the time of simulation, in seconds, for all 900 simulations and calculated the averages for each protocol (Figure 7). The results show that CANDOCK takes much longer to complete a single simulation, despite the scoring function with CANDOCK-default averaging 3,751 s and CANDOCK-physics averaging 4,055 s. Vina and Smina are comparable to each other and much faster than CANDOCK, with an average time for simulation of 169 and 87 s, respectively.

FIGURE 7
www.frontiersin.org

FIGURE 7. Average simulation time for four different docking protocols. The average times for the four protocols over 900 simulations are as follows: CANDOCK-default (blue) = 3,751 s, CANDOCK-physics (orange) = 4,055 s, Vina (green) = 169 s, and Smina (red) = 87 s. CANDOCK is slower than Vina and Smina, resulting in a tradeoff between accuracy and computational cost.

Balancing the computational cost with the accuracy of the protocol is important to consider, especially if the compound sets being tested are very large (>1,000) or the available computing power is limited.

3.6 Limitations and Future Work

The size of the set of HIV-1 protease–inhibitor complexes used for the parameterization and correlation calculations is limited to only 30 members. A larger set would enable better parameterization and assessment of the protocols by providing more variability in the chemical space of the ligands. Moreover, a larger set of known active and inactive compounds would allow for a much more rigorous comparison of these methods in their ability to discriminate active inhibitors from inactive compounds.

The datasets used herein all focused on the wildtype HIV-1 protease, but the prevalence of mutations that confer drug resistance is of great concern in HIV treatment. Future work will assess the sensitivity of CANDOCK and the other protocols to inhibitors of HIV-1 protease mutations. Since CANDOCK has already been shown to be effective in predicting and designing specific inhibitors, demonstrating the efficacy of the CANDOCK protocol in the prediction of binding affinities to any protease mutant will greatly aid in future drug design efforts for effective, broad-spectrum protease inhibitors to replace the cocktails that are currently used (Larocque et al., 2017; Ma et al., 2017). Moreover, if our results were generalizable to predict accurate binding between HIV-1 protease mutants and their inhibitors, it would allow for a precision medicine approach to protease inhibitor efficacy prediction (Jenwitheesuk and Samudrala, 2003; Jenwitheesuk et al., 2004; Wang et al., 2004b; Jenwitheesuk and Samudrala, 2005a;2005b; Jenwitheesuk et al., 2005; Jenwitheesuk and Samudrala, 2007; Jenwitheesuk et al., 2008).

Leveraging some of our recently developed programs, based on machine learning and graph neural networks, we can iteratively select synthetically feasible bioactive protease inhibitors based on bioactivity data and CANDOCK-generated pose of molecules (Majumder et al., 2018; Wijewardhane et al., 2020). Exploration of CANDOCK efficacy on other HIV-1 targets, such as reverse transcriptase, would enable proteomic-based drug discovery, which we have shown to be useful for drug repurposing, and could lead to more potent HIV-1 therapeutics (Chopra et al., 2016; Chopra and Samudrala, 2016; Hernandez-Perez et al., 2017; Majumder et al., 2017; Fine et al., 2019; Mangione et al., 2020; Robertson et al., 2020).

4 Conclusion

We evaluated four different docking protocols for their effectiveness in predicting HIV-1 protease–inhibitor binding affinities. We assessed these protocols by correlation to known binding affinities and by ability to discriminate between known active and decoy inhibitors. The results from both these computational experiments showed that the CANDOCK protocol with its all-atom knowledge-based forcefield was superior to these other protocols.

Overall, we show that CANDOCK accurately predicts the relative binding affinities for HIV-1 protease inhibitors when compared to and greatly outperforms popular publicly available molecular docking protocols. The efficacy demonstrated by CANDOCK in our study indicates that it will be very useful for the repurposing, discovery, and design of novel HIV-1 protease inhibitors.

Data Availability Statement

The data generated for this study can be found at http://compbio.buffalo.edu/data/in_hiv_candock.

Author Contributions

ZF and RS conceived the research design and methods for this study. ZF implemented the methods, executed the research design, and drafted the manuscript. RS, GC, and JF edited and refined the manuscript. RS supervised the overall study. JF, GC, and RS are the co-creators of the CANDOCK protocol. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by a 2010 NIH Director's Pioneer Award (1DP1OD006779), NIH Clinical and Translational Sciences Award (UL1TR001412), NLM T15 Award (T15LM012495), NCI/VA BD-STEP Fellowship in Big Data Sciences, and startup funds from the Department of Biomedical Informatics at the University at Buffalo. This work was also supported by 2019 NIH NCATS ASPIRE Challenge Awards and 2020 NIH NCATS ASPIRE Reduction-to-Practice Awards. Additional support in part by an NCATS Clinical and Translational Sciences Award from the Indiana Clinical and Translational Sciences Institute (UL1TR002529) and the Purdue University Center for Cancer Research NIH grant P30 (CA023168) is acknowledged.

Conflict of Interest

GC is the Director of Merck-Purdue Center for Measurement Science funded by Merck Sharp Dohme Corp., a subsidiary of Merck Co. Inc., Kenilworth, NJ, United States.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We are grateful for the computational support provided by the University at Buffalo's Center for Computational Research.

References

Allen, F. H. (2002). The Cambridge Structural Database: a Quarter of a Million crystal Structures and Rising. Acta Crystallogr. Sect B 58, 380–388. doi:10.1107/s0108768102003890

PubMed Abstract | CrossRef Full Text | Google Scholar

Beerenwinkel, N., Schmidt, B., Walter, H., Kaiser, R., Lengauer, T., Hoffmann, D., et al. (2002). Diversity and Complexity of Hiv-1 Drug Resistance: a Bioinformatics Approach to Predicting Phenotype from Genotype. Proc. Natl. Acad. Sci. 99, 8271–8276. doi:10.1073/pnas.112177799

PubMed Abstract | CrossRef Full Text | Google Scholar

Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., et al. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242. doi:10.1093/nar/28.1.235

PubMed Abstract | CrossRef Full Text | Google Scholar

Bernard, B., and Samudrala, R. (2009). A Generalized Knowledge-Based Discriminatory Function for Biomolecular Interactions. Proteins: Struct. Funct. Bioinf. 76, 115–128. doi:10.1002/prot.22323

PubMed Abstract | CrossRef Full Text | Google Scholar

Cai, C., Han, L., Ji, Z. L., Chen, X., and Chen, Y. Z. (2003). Svm-prot: Web-Based Support Vector Machine Software for Functional Classification of a Protein from its Primary Sequence. Nucleic Acids Res. 31, 3692–3697. doi:10.1093/nar/gkg600

PubMed Abstract | CrossRef Full Text | Google Scholar

Centers for Disease Control and Prevention (2001a). HIV and AIDS--United States, 1981-2000. MMWR Morb Mortal Wkly Rep. 50, 430–434.

PubMed Abstract | Google Scholar

Centers for Disease Control and Prevention (2001b). Hiv/aids Surveillance—General Epidemiology. Available at: https://www.cdc.gov/hiv/pdf/library/reports/surveillance/cdc-hiv-surveillance-report-2001-vol-13-2.pdf.

Google Scholar

Chang, M. W., Lindstrom, W., Olson, A. J., and Belew, R. K. (2007). Analysis of Hiv Wild-type and Mutant Structures via In Silico Docking against Diverse Ligand Libraries. J. Chem. Inf. Model. 47, 1258–1262. doi:10.1021/ci700044s

CrossRef Full Text | Google Scholar

Chang, M. W., Ayeni, C., Breuer, S., and Torbett, B. E. (2010). Virtual Screening for Hiv Protease Inhibitors: a Comparison of Autodock 4 and Vina. PloS one 5, e11955. doi:10.1371/journal.pone.0011955

PubMed Abstract | CrossRef Full Text | Google Scholar

Chopra, G., and Samudrala, R. (2016). Exploring Polypharmacology in Drug Discovery and Repurposing Using the cando Platform. Curr. Pharm. Des. 22, 3109–3123. doi:10.2174/1381612822666160325121943

PubMed Abstract | CrossRef Full Text | Google Scholar

Chopra, G., Kaushik, S., Elkin, P. L., and Samudrala, R. (2016). Combating Ebola with Repurposed Therapeutics Using the cando Platform. Molecules 21, 1537. doi:10.3390/molecules21121537

PubMed Abstract | CrossRef Full Text | Google Scholar

Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., et al. (2009). Biopython: Freely Available python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics 25, 1422–1423. doi:10.1093/bioinformatics/btp163

PubMed Abstract | CrossRef Full Text | Google Scholar

DesJarlais, R. L., and Dixon, J. S. (1994). A Shape-And Chemistry-Based Docking Method and its Use in the Design of Hiv-1 Protease Inhibitors. J. computer-aided Mol. Des. 8, 231–242. doi:10.1007/bf00126742

CrossRef Full Text | Google Scholar

Dunbar, J. B., Smith, R. D., Yang, C.-Y., Ung, P. M.-U., Lexa, K. W., Khazanov, N. A., et al. (2011). Csar Benchmark Exercise of 2010: Selection of the Protein–Ligand Complexes. J. Chem. Inf. Model. 51, 2036–2046. doi:10.1021/ci200082t

CrossRef Full Text | Google Scholar

Eastman, P., Swails, J., Chodera, J. D., McGibbon, R. T., Zhao, Y., Beauchamp, K. A., et al. (2017). Openmm 7: Rapid Development of High Performance Algorithms for Molecular Dynamics. PLoS Comput. Biol. 13, e1005659. doi:10.1371/journal.pcbi.1005659

PubMed Abstract | CrossRef Full Text | Google Scholar

Fine, J. A., and Chopra, G. (2018). Candock: Conformational Entropy Driven Analytics for Class-specific Proteome-wide Docking. Biophys. J. 114, 57a. doi:10.1016/j.bpj.2017.11.363

CrossRef Full Text | Google Scholar

Fine, J., Lackner, R., Samudrala, R., and Chopra, G. (2019). Computational Chemoproteomics to Understand the Role of Selected Psychoactives in Treating Mental Health Indications. Sci. Rep. 9, 1–15. doi:10.1016/b978-0-12-812962-3.00001-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Fine, J., Konc, J., Samudrala, R., and Chopra, G. (2020). Candock: Chemical Atomic Network-Based Hierarchical Flexible Docking Algorithm Using Generalized Statistical Potentials. J. Chem. Inf. Model. 60, 1509–1527. doi:10.1021/acs.jcim.9b00686

PubMed Abstract | CrossRef Full Text | Google Scholar

Hernandez-Perez, M., Chopra, G., Fine, J., Conteh, A. M., Anderson, R. M., Linnemann, A. K., et al. (2017). Inhibition of 12/15-lipoxygenase Protects against β-cell Oxidative Stress and Glycemic Deterioration in Mouse Models of Type 1 Diabetes. Diabetes 66, 2875–2887. doi:10.2337/db17-0215

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, N., Shoichet, B. K., and Irwin, J. J. (2006). Benchmarking Sets for Molecular Docking. J. Med. Chem. 49, 6789–6801. doi:10.1021/jm0608356

CrossRef Full Text | Google Scholar

Jenwitheesuk, E., and Samudrala, R. (2003). Improved Prediction of Hiv-1 Protease-Inhibitor Binding Energies by Molecular Dynamics Simulations. BMC Struct. Biol. 3, 2. doi:10.1186/1472-6807-3-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., and Samudrala, R. (2005a). Prediction of Hiv-1 Protease Inhibitor Resistance Using a Protein-Inhibitor Flexible Docking Approach. Antivir. Ther. 10, 157–166. doi:10.1097/01.aids.0000162343.96674.4c

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., and Samudrala, R. (2005b). Virtual Screening of Hiv-1 Protease Inhibitors against Human Cytomegalovirus Protease Using Docking and Molecular Dynamics. Aids 19, 529–531. doi:10.1097/01.aids.0000162343.96674.4c

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., and Samudrala, R. (2007). Identification of Potential Hiv-1 Targets of Minocycline. Bioinformatics 23, 2797–2799. doi:10.1093/bioinformatics/btm424

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., Wang, K., Mittler, J. E., and Samudrala, R. (2004). Improved Accuracy of Hiv-1 Genotypic Susceptibility Interpretation Using a Consensus Approach. Aids 18, 1858–1859. doi:10.1097/00002030-200409030-00020

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., Wang, K., Mittler, J. E., and Samudrala, R. (2005). Pirspred: a Web Server for Reliable Hiv-1 Protein-Inhibitor Resistance/susceptibility Prediction. Trends Microbiol. 13, 150–151. doi:10.1016/j.tim.2005.02.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Jenwitheesuk, E., Horst, J. A., Rivas, K. L., Van Voorhis, W. C., and Samudrala, R. (2008). Novel Paradigms for Drug Discovery: Computational Multitarget Screening. Trends Pharmacol. Sci. 29, 62–71. doi:10.1016/j.tips.2007.11.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Kantor, R., Machekano, R., Gonzales, M. J., Dupnik, K., Schapiro, J. M., and Shafer, R. W. (2001). Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an Expanded Data Model Integrating Natural Language Text and Sequence Analysis Programs. Nucleic Acids Res. 29, 296–299. doi:10.1093/nar/29.1.296

PubMed Abstract | CrossRef Full Text | Google Scholar

Koes, D. R., Baumgartner, M. P., and Camacho, C. J. (2013). Lessons Learned in Empirical Scoring with Smina from the Csar 2011 Benchmarking Exercise. J. Chem. Inf. Model. 53, 1893–1904. doi:10.1021/ci300604z

CrossRef Full Text | Google Scholar

Larocque, E., Naganna, N., Ma, X., Opoku-Temeng, C., Carter-Cooper, B., Chopra, G., et al. (2017). Aminoisoquinoline Benzamides, Flt3 and Src-Family Kinase Inhibitors, Potently Inhibit Proliferation of Acute Myeloid Leukemia Cell Lines. Future Med. Chem. 9, 1213–1225. doi:10.4155/fmc-2017-0067

PubMed Abstract | CrossRef Full Text | Google Scholar

Ma, X., Zhou, J., Wang, C., Carter-Cooper, B., Yang, F., Larocque, E., et al. (2017). Identification of New Flt3 Inhibitors that Potently Inhibit Aml Cell Lines via an Azo Click-It/staple-It Approach. ACS Med. Chem. Lett. 8, 492–497. doi:10.1021/acsmedchemlett.6b00468

PubMed Abstract | CrossRef Full Text | Google Scholar

Majumder, J., Lantz, T., Fine, J., and Chopra, G. (2017). Abstract A35: Drug Repurposing for Castration Resistant Prostate Cancer Based on Disease-Disease Relationships. Cancer Res.. doi:10.1158/1538-7445.Transcontrol16-A35

CrossRef Full Text | Google Scholar

Majumder, J., Fine, J. A., Lantz, T. C., Conder, C. J., and Chopra, G. (2018). Abstract Lb-076: Cancer Cell Specific Lethality by Degrading Specific Protein Target Network Identified Using a Chemical Screening Based Machine Learning MethodCancer Res.. doi:10.1158/1538-7445.AM2018-LB-076

CrossRef Full Text | Google Scholar

Mangione, W., Falls, Z., Melendy, T., Chopra, G., and Samudrala, R. (2020). Shotgun Drug Repurposing Biotechnology to Tackle Epidemics and Pandemics. Drug Discov. Today 25, 1126. doi:10.1016/j.drudis.2020.05.002

PubMed Abstract | CrossRef Full Text | Google Scholar

Morris, G. M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S., et al. (2009). Autodock4 and Autodocktools4: Automated Docking with Selective Receptor Flexibility. J. Comput. Chem. 30, 2785–2791. doi:10.1002/jcc.21256

CrossRef Full Text | Google Scholar

Mysinger, M. M., Carchia, M., Irwin, J. J., and Shoichet, B. K. (2012). Directory of Useful Decoys, Enhanced (Dud-e): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 55, 6582–6594. doi:10.1021/jm300687e

CrossRef Full Text | Google Scholar

O'Boyle, N. M., Morley, C., and Hutchison, G. R. (2008). Pybel: a Python Wrapper for the Openbabel Cheminformatics Toolkit. Chem. Cent. J. 2, 1–7. doi:10.1186/1752-153X-2-5

PubMed Abstract | CrossRef Full Text | Google Scholar

O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011). Open Babel: An Open Chemical Toolbox. J. Cheminf. 3, 33. doi:10.1186/1758-2946-3-33

CrossRef Full Text | Google Scholar

Ohtaka, H., and Freire, E. (2005). Adaptive Inhibitors of the Hiv-1 Protease. Prog. Biophys. Mol. Biol. 88, 193–208. doi:10.1016/j.pbiomolbio.2004.07.005

PubMed Abstract | CrossRef Full Text | Google Scholar

Pandit, D., So, S.-S., and Sun, H. (2006). Enhancing Specificity and Sensitivity of Pharmacophore-Based Virtual Screening by Incorporating Chemical and Shape Features - a Case Study of Hiv Protease Inhibitors. J. Chem. Inf. Model. 46, 1236–1244. doi:10.1021/ci050511a

CrossRef Full Text | Google Scholar

Quinn, T. C. (2008). Hiv Epidemiology and the Effects of Antiviral Therapy on Long-Term Consequences. AIDS (London, England) 22, S7. doi:10.1097/01.aids.0000327510.68503.e8

PubMed Abstract | CrossRef Full Text | Google Scholar

Razzaghi-Asl, N., Sepehri, S., Ebadi, A., Miri, R., and Shahabipour, S. (2015). Effect of Biomolecular Conformation on Docking Simulation: a Case Study on a Potent Hiv-1 Protease Inhibitor. Iran J. Pharm. Res. 14, 785–802.

PubMed Abstract | Google Scholar

Rick, S. W., Topol, I. A., Burt, S. K., and Erickson, J. W. (1998). Molecular Mechanisms of Resistance: Free Energy Calculations of Mutation Effects on Inhibitor Binding to Hiv-1 Protease. Protein Sci. 7, 1750–1756. doi:10.1002/pro.5560070809

PubMed Abstract | CrossRef Full Text | Google Scholar

Robertson, M. A., Padgett, L. R., Fine, J. A., Chopra, G., and Mastracci, T. L. (2020). Targeting Polyamine Biosynthesis to Stimulate Beta Cell Regeneration in Zebrafish. Islets 12, 99–107. doi:10.1080/19382014.2020.1791530

PubMed Abstract | CrossRef Full Text | Google Scholar

Shafer, R. W., Stevenson, D., and Chan, B. (1999). Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database. Nucleic Acids Res. 27, 348–352. doi:10.1093/nar/27.1.348

PubMed Abstract | CrossRef Full Text | Google Scholar

Smith, R. D., Clark, J. J., Ahmed, A., Orban, Z. J., Dunbar, J. B., and Carlson, H. A. (2019). Updates to Binding Moad (Mother of All Databases): Polypharmacology Tools and Their Utility in Drug Repurposing. J. Mol. Biol. 431, 2423–2433. doi:10.1016/j.jmb.2019.05.024

PubMed Abstract | CrossRef Full Text | Google Scholar

Su, M., Yang, Q., Du, Y., Feng, G., Liu, Z., Li, Y., et al. (2018). Comparative Assessment of Scoring Functions: The Casf-2016 Update. J. Chem. Inf. Model. 59, 895–913. doi:10.1021/acs.jcim.8b00545

PubMed Abstract | CrossRef Full Text | Google Scholar

Trott, O., and Olson, A. J. (2010). Autodock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. J. Comput. Chem. 31, 455–461. doi:10.1002/jcc.21334

CrossRef Full Text | Google Scholar

Wang, W., and Kollman, P. A. (2001). Computational Study of Protein Specificity: the Molecular Basis of Hiv-1 Protease Drug Resistance. Proc. Natl. Acad. Sci. 98, 14937–14942. doi:10.1073/pnas.251265598

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, R., Lai, L., and Wang, S. (2002). Further Development and Validation of Empirical Scoring Functions for Structure-Based Binding Affinity Prediction. J. Comput.-Aided Mol. Des. 16, 11–26. doi:10.1023/a:1016357811882

CrossRef Full Text | Google Scholar

Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004a). Development and Testing of a General Amber Force Field. J. Comput. Chem. 25, 1157–1174. doi:10.1002/jcc.20035

CrossRef Full Text | Google Scholar

Wang, K., Jenwitheesuk, E., Samudrala, R., and Mittler, J. E. (2004b). Simple Linear Model Provides Highly Accurate Genotypic Predictions of Hiv-1 Drug Resistance. Antivir. Ther. 9, 343–352.

PubMed Abstract | Google Scholar

Wang, J., Wang, W., Kollman, P. A., and Case, D. A. (2006). Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations. J. Mol. Graphics Model. 25, 247–260. doi:10.1016/j.jmgm.2005.12.005

CrossRef Full Text | Google Scholar

Wei, Y., Li, J., Chen, Z., Wang, F., Huang, W., Hong, Z., et al. (2015). Multistage Virtual Screening and Identification of Novel Hiv-1 Protease Inhibitors by Integrating Svm, Shape, Pharmacophore and Docking Methods. Eur. J. Med. Chem. 101, 409–418. doi:10.1016/j.ejmech.2015.06.054

CrossRef Full Text | Google Scholar

Wijewardhane, P. R., Jethava, K. P., Fine, J. A., and Chopra, G. (2020). Combined Molecular Graph Neural Network and Structural Docking Selects Potent Programmable Cell Death Protein 1/programmable Death-Ligand 1 (Pd-1/pd-l1) Small Molecule Inhibitors. ChemRxiv. doi:10.26434/chemrxiv.12083907.v1

CrossRef Full Text | Google Scholar

Wlodawer, A., and Vondrasek, J. (1998). Inhibitors of Hiv-1 Protease: a Major success of Structure-Assisted Drug Design. Annu. Rev. Biophys. Biomol. Struct. 27, 249–284. doi:10.1146/annurev.biophys.27.1.249

PubMed Abstract | CrossRef Full Text | Google Scholar

Wlodawer, A., Miller, M., Jaskolski, M., Sathyanarayana, B. K., Baldwin, E., Weber, I. T., et al. (1989). Conserved Folding in Retroviral Proteases: crystal Structure of a Synthetic Hiv-1 Protease. Science 245, 616–621. doi:10.1126/science.2548279

PubMed Abstract | CrossRef Full Text | Google Scholar

Yadav, D., Paliwal, S., Yadav, R., Pal, M., and Pandey, A. (2012). Identification of Novel Hiv 1-protease Inhibitors: Application of Ligand and Structure Based Pharmacophore Mapping and Virtual Screening. PloS one 7, e48942. doi:10.1371/journal.pone.0048942

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: molecular docking, inhibitor prediction, protein–ligand interaction, HIV-1 protease, knowledge-based force field, CANDOCK

Citation: Falls Z, Fine J, Chopra G and Samudrala R (2022) Accurate Prediction of Inhibitor Binding to HIV-1 Protease Using CANDOCK. Front. Chem. 9:775513. doi: 10.3389/fchem.2021.775513

Received: 14 September 2021; Accepted: 25 November 2021;
Published: 17 January 2022.

Edited by:

Renjith Thomas, St Berchmans College, Mahatma Gandhi University, India

Reviewed by:

Kenny Lischer, University of Indonesia, Indonesia
Sixue Zhang, Southern Research Institute, United States

Copyright © 2022 Falls, Fine, Chopra and Samudrala. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zackary Falls, em1mYWxsc0BidWZmYWxvLmVkdQ==; Gaurav Chopra, Z2Nob3ByYUBwdXJkdWUuZWR1; Ram Samudrala, cmFtQGNvbXBiaW8ub3Jn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.