A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

Tufféry, Pierre; Derreumaux, Philippe

doi:10.3389/fbinf.2023.1113928

ORIGINAL RESEARCH article

Front. Bioinform., 16 January 2023

Sec. Protein Bioinformatics

Volume 3 - 2023 | https://doi.org/10.3389/fbinf.2023.1113928

This article is part of the Research TopicRecent Advances in Peptide Informatics: Challenges and OpportunitiesView all 8 articles

A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

Pierre Tufféry¹*

Philippe Derreumaux^2,3

¹Université Paris Cité, CNRS UMR 8251, INSERM U1133, Paris, France
²Université Paris Cité, CNRS UPR9080, Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, Paris, France
³Institut Universitaire de France (IUF), Paris, France

Introduction: Peptides carry out diverse biological functions and the knowledge of the conformational ensemble of polypeptides in various experimental conditions is important for biological applications. All fast dedicated softwares perform well in aqueous solution at neutral pH.

Methods: In this study, we go one step beyond by combining the Debye-Hückel formalism for charged-charged amino acid interactions and a coarse-grained potential of the amino acids to treat pH and salt variations.

Results: Using the PEP-FOLD framework, we show that our approach performs as well as the machine-leaning AlphaFold2 and TrRosetta methods for 15 well-structured sequences, but shows significant improvement in structure prediction of six poly-charged amino acids and two sequences that have no homologous in the Protein Data Bank, expanding the range of possibilities for the understanding of peptide biological roles and the design of candidate therapeutic peptides.

1 Introduction

Peptides of less than 40 amino acids have diverse biological functions, acting as signaling entities in all domains of life, and targeting receptors or interfering with molecular interactions. Hormones and their bacterial mimetics (Fetissov et al., 2019), neuropeptides and their roles in neurodegenerative diseases (Ben-Shushan and Miller, 2021), antimicrobial peptides contribution to host defence (Mookherjee et al., 2020), and immunomodulatory peptides in the perspective of vaccine design (Pavlicevic et al., 2022) are some current directions motivating their study at a fundamental level. Due to their specific features, peptides have also gained interest as therapeutical agents (Muttenthaler et al., 2021), particularly to target protein-protein interactions (Cabri et al., 2021). They are also considered as having interest in the development of new functional biomimetic materials (Levin et al., 2020). Peptides have limitations though, as they can be highly flexible (Apostolopoulos et al., 2021), which motivate efforts to understand and predict their conformational energy landscapes.

Structure prediction of polypeptides with amino acid lengths up to 40 amino acids in aqueous solution can be performed by a series of methods including machine-learning approaches such as AlphaFold2 (Jumper et al., 2021), TrRosetta (Du et al., 2021), and APPTEST (Timmons and Hewage, 2021). Looking at AlphaFold2, which revolutionized structure prediction of single folded domain to a root-mean-square deviation (RMSD) accuracy of 0.2 nm, its capability lies on machine learning based on protein data bank (PDB) (Rose et al., 2012) templates, multiple sequence alignments, co-evolution rules and sophisticated algorithms to predict local backbone and side conformations, and side chain - side chain contact probability within distances bins. AlphaFold2 builds the protein by energy minimization using a protein-specific energy potential.

TrRosetta is basically similar to AlphaFold2. It builds the protein structure based on direct energy minimizations with a restrained Rosetta. The restraints include inter-residue distance and orientation distributions predicted by a deep neural network. Homologous templates are included in the network prediction to improve the accuracy further.

APPTEST also uses machine learning on the PDB structures with a chain length varying between 5 and 40 amino acids. APPTEST derives Cα-Cα and Cβ-Cβ distance restraints, and backbone dihedral restraints that are input of simulated annealing and energy minimization.

Other methods which are accessible by WEB-servers or can be downloaded include Rosetta (Bonneau et al., 2001), I-TASSER (Zhang, 2008), PepStrMod (Singh et al., 2015) and PEPFOLD (Shen et al., 2014; Lamiable et al., 2016). Rosetta is a fragment-assembly approach based on Monte Carlo simulation, a library of predicted nine and then three residues, and a coarse-grained model, followed by all-atom refinement. I-TASSER is a hierarchical approach that identifies structural templates from the PDB by multiple threading approaches, with full-length atomic models constructed by iterative template-based fragment assembly simulations.

The PEPstrMOD server predicts the tertiary structure of small peptides with sequence length varying between 7 and 25 residues. The prediction strategy is based on the realization that β-turn is an important feature of small peptides. Thus, the method uses both the regular secondary structure information predicted from PSIPRED and the β-turns information predicted from BetaTurns. The structure is further refined with energy minimization and molecular dynamic simulations.

PEP-FOLD2 is a fast accurate structure peptide approach based on the prediction of a profile of the structural alphabet of four amino acid lengths along the sequence, and a chain growth method based on the coarse-grained sOPEP2 model followed by Monte Carlo steps. It should be noted that PEP-FOLD2 is not free of learning as it uses an Support Vector Machine predictor relying on multiple sequence alignment. Of practical interest, during the time of this study, we could not access the APPTEST and PepStrMod servers. Also, TrRosetta cannot be applied to sequences with $<$ 10 amino acids.

Overall, all these methods generate good models for well-structured peptides at pH 7 in aqueous solution because most structures deposited in the PDB from nuclear magnetic resonance (NMR) and X-ray diffraction experiments were determined at neutral pH, and the PDB contains close to 200,000 structures as of 30 October 2022.

These methods face, however, two current limitations: correct conformational ensemble sampling of intrinsically disordered peptides or proteins (IDPs) which lack stable secondary and tertiary structures, and accurate conformational ensemble prediction of peptides as a function of pH and salt conditions. The first issue has motivated the development of new force fields, such as CHARMM36m-TIP3P modified (Huang et al., 2017), AMBER99SB-DISP (Robustelli et al., 2018) and many others (Nguyen and Derreumaux, 2020). The current approach to address the impact of pH variations is to perform your own extensive molecular dynamics and replica exchange molecular dynamics simulations at your desired pH. Alternatively one can use pH-replica exchange molecular dynamics using a discrete protonation method (Sabri Dashti et al., 2012) or all-atom and coarse-grained continuous constant pH molecular dynamics (CpHMD) methods (Barroso da Silva et al., 2016; Huang et al., 2016; Aho et al., 2022). Accurate and fast peptide structure predictions at different pH and salt conditions are the objectives of the present study.

The organization of this paper is as follows. In section 2, we present an extension of the coarse-grained, sOPEP2, force field to integrate Debye-Hückel charge interactions as a function of pH and salt concentrations. Next, we present the TrRosetta, AlphaFold2 and PEP-FOLD with and without Debye-Hückel protocols and the analysis methods. In section 3, we present the results of structure predictions of six poly-charged peptides as a function of pH and compare them to experimental circular dichroism (CD) data, and the predicted models obtained by TrRosetta and AlphaFold2. The charged polypeptides are particularly interesting to assemble the sOPEP2 interactions and the Debye-Hückel charge interactions. This is followed by the prediction of 15 ordered peptides, which have NMR structures resolved at a pH varying from 2 to 8. We finish this section on the prediction of four peptides for which low-resolution experimental data and topological descriptions are available. Finally section 4 summarizes our findings.

2 Methods

2.1 The sOPEP2 force field

The sOPEP2 potential, to be used in a discrete space, originates from the OPEP potential which uses an explicit representation of the backbone (N, H, Cα, N and H atoms) and one bead for each side chain, whose position to Cα and Van der Walls radius depend on the amino acid type (Maupetit et al., 2007; Sterpone et al., 2014). The sOPEP2 is expressed as a sum of local, non-bonded and hydrogen-bond (H-bond) terms, with all parameters described in (Binette et al., 2022).

E = E_{local} + E_{nonbonded} + E_{H - b o n d} (1)

Since the geometry in PEP-FOLD is mainly imposed by the superimposition of the discrete structural alphabet (SA) letters, the local contributions are restricted to a simple flat-bottomed quadratic potential to described the energy associated with dihedral angles ϕ and ψ, described by:

E (ϕ_{i}) = ϵ_{ϕ} {(ϕ_{i} - ϕ_{0_s c_i})}^{2} (2)

where ϕ_{0_sc_i} = ϕ within the interval $[ϕ_{l o w_s c_i}, ϕ_{h i g h_s c_i}]$ and ϕ_{0_sc_i} = min(ϕ − ϕ_{low_sc_i}, ϕ − ϕ_{high_sc_i}) outside of the interval ϕ_{low_sc_i} and ϕ_{high_sc_i} are specific to each amino acid type (Binette et al., 2022).

Non-bonded interactions corresponding to repulsion/attraction effects are described using the Mie potential (Mie, 1903) given by:

E_{v d w_i j} = ϵ_{i j} \times [\frac{m}{n - m} {(\frac{r_{i j}^{0}}{r_{i j}})}^{n} - \frac{n}{n - m} {(\frac{r_{i j}^{0}}{r_{i j}})}^{m}] (3)

where ϵ_ij is the potential depth and $r_{i j}^{0}$ is the position of the potential minimum function of atomic types for i and j. The combination of exponents, n and m, gives the relationship between the position of the potential minimum (r⁰) and the position where it is zero (gR0):

g R 0 = {(\frac{m}{n})}^{\frac{1}{n - m}} r_{0} (4)

Hydrogen bonds are considered explicitly, using a combination of two types of contributions:

E_{H - b o n d} = E_{HBpairwise} + E_{H B c o o p} (5)

where $E_{H B_{pairwise}}$ corresponds to the two-body contributions of all hydrogen bonds between residue (i) and residue (j), characterized by the hydrogen/acceptor distance r_ij and the donor/hydrogen/acceptor angle α_ij:

E_{HBpairwise} (r_{i j}, α_{i j}) = ϵ_{α}^{H B} \sum_{i j, j = i + 4} μ (r_{i j}) \cdot ν (α_{i j}) + ϵ_{β}^{H B} \sum_{i j, j > 4} μ (r_{i j}) \cdot ν (α_{i j}) (6)

μ (r_{i j}) = ϵ_{i j} \cdot [5 {(\frac{σ}{r_{i j}})}^{12} - 6 {(\frac{σ}{r_{i j}})}^{10}] (7)

ν (α_{i j}) = \{\begin{cases} \cos (α_{i j}) & if α_{i j} > 90 ° \\ 0 & otherwise \end{cases} (8)

where σ = 0.18 nm is the position of the potential minimum and ϵ is the potential depth. We distinguish between α-helix-like hydrogen bonds defined by O(i)-H(i+4) and other hydrogen bonds. Hydrogen bonds between a pair of residues separated by less than four amino acids are not considered.

$E_{H B_{coop}}$ involves four-body interactions involving pairs of hydrogen bonds (between residues (i) and (j) and residues (k) and (l)), so as to stabilize secondary structure motifs. The cooperation energy is given by the following equations:

E_{HBcoop} (r_{i j}, r_{k l}) = ϵ_{α}^{coop} \sum C (r_{i j}, r_{k l}) \times Δ (i j k l) + ϵ_{β}^{coop} \sum C (r_{i j}, r_{k l}) \times Δ^{'} (i j k l) (9)

C (r_{i j}, r_{k l}) = \exp (- 0.5 {(r_{i j} - σ)}^{2}) \cdot \exp (- 0.5 {(r_{k l} - σ)}^{2}) (10)

Δ (i j k l) = \{\begin{cases} 1 & if (k, l) = (i + 1, j + 1) \\ 0 & otherwise \end{cases} (11)

Δ^{'} (i j k l) = \{\begin{cases} 1 & if (k, l) = (i + 2, j - 2) \\ or (k, l) = (i + 2, j + 2) \\ 0 & otherwise \end{cases} (12)

2.2 Debye-Hückel charge interactions

The new sOPEP version introduces the possibility to consider pH-dependent charge interactions, using the Debye-Hückel formulation (Debye and Hückel, 1923).

E_{D H_{i j}} = (q_{i} * q_{j} * e^{- r_{i j} / l_{D H}}) / (ϵ (r_{i j}) * r_{i j}) (13)

where q_i and q_j correspond to the charge of particles i and j, j > i + 1, respectively, r_ij is the distance between the particles, l_DH is the Debye length that depends of the ionic strength of the solvent, and ϵ(r_ij) is the dielectric constant that depends on the distance between the charges:

ϵ (r) = D_{w} - (D_{w} - D_{p}) (s^{2} r^{2} + D_{p} s r + D_{p}) e^{- s r} / D_{p} (14)

where D_w is the dielectric constant of water, D_p is the dielectric constant inside a protein, and s is the slope of the sigmoidal function. In practice, we used values of 78, 2 and 0.6 for D_w, D_p and s, respectively, as stated in (Iwaoka et al., 2020).

Since the sOPEP representation does not include all-atom side chains, but charges associated with particles of heterogeneous sizes, it is necessary to shift the energy curve to have energy values compatible with those of the Mie formulation. For each pair of particle, we shifted the distance using:

$r_{i j}^{S H} = r_{i j} + s h i f t_{i j}$ and we evaluated $E_{D H_{i j}}$ using $r_{i j}^{S H}$ except for ϵ(r), where the unshifted distance is used.

Shift values were adjusted for r such as E_{vdw_r} = k, $E_{D H_{r}} = E_{v d w_r}$ , as illustrated in Figure 1. In practice, we found that values of k on the order of 4 kcal/mol are convenient, and the Debye-Hückel energy was truncated to $E_{D H_{r}}$ to avoid redundancy with the Mie potential. Also, as sOPEP2 side-chain side-chain potential already includes some of the interactions between the charged residues, the Mie potential is restricted to only the repulsive part for charged side chains.

FIGURE 1

FIGURE 1. Fitting Debye-Hückel (DH) energy to Mie potential. The shift of the unshifted DH potential (red) is set so that the Mie (black) and shifted DH potential (green) cross for some energy threshold (4 kcal/mol in this case).

Charges were assigned to particles depending on the pH using pKa values of 3.9, 4.2, 6.0, 10.5 and 12.5 for ASP, GLU, HIS, LYS and ARG side chains, respectively, and 9.0 and 2.0 for the N-terminal α-ammonium and C-terminal α-carboxyl groups, respectively. Note that it is possible to consider blocking the extremities using acetyl and N-methyl on the N-terminus and C-terminus groups, respectively, in which case no charge is assigned to the extremity.

Finally, we have considered weighting differently the electrostatic contributions depending on the separation of the amino acids in the sequence. In our experience, best results were obtained using a weight of 10 for residue separation of less than 7 amino acids and a weight of 2, otherwise.

2.3 PEP-FOLD, TrRosetta and AlphaFold2 protocols and analysis

Our validation test set includes a total of 25 peptides as described in Section 3. For each peptide, we performed two PEP-FOLD simulations, one TrRosetta simulation which uses PDB templates and homologous sequences, and one AlphaFold2 simulation in its standard version using three recycles, template information, and AMBER refinement. Both TrRosetta and AlphaFold2 simulations return five models that we considered equiprobable. The PEP-FOLD simulations without Debye-Hückel (referred to as PF-noDH), and with Debye-Hückel (PF-DH) generated 200 models each. We selected the representative models of the five best clusters identified among the 200 generated models based on their rankings using sOPEP2 energies - i.e. the standard PEP-FOLD model selection - for PF-noDH and the sum of sOPEP2 and Debye-Hückel energies for PF-DH.

We have considered 15 peptides for which a PDB structure is available. These correspond to peptides previously studied during PEP-FOLD development and new peptides with their structures released after 1 September 2019, and determined in pure aqueous environment. The predicted models of the 15 peptides were evaluated by computing the CAD-score (Olechnovič et al., 2013). The reported CAD-score corresponds to the largest value of the cross CAD-scores between the five predicted models and all NMR structures. Following our previous work, if the CAD-score calculated on the backbone atoms is $>$ 0.60, the model is associated with largely correct secondary structure prediction, otherwise if it is $>$ 0.65 the model is correct in terms of secondary and tertiary structures. For the poly-charged peptides, we also computed their secondary structure contents using STRIDE program (Frishman and Argos, 1995). For the four sequences free of any NMR structure, we compared their predicted and experimental topologies.

3 Results and discussion

3.1 Predicted models of poly-charged peptides

For the simulations of the six poly-charged peptides, namely (EK)15, (EK)5, (H)30, (E)15, (K)15 and (R)25, we calculated the alpha-helix, coil and turn contents averaged over the five models of each method and compared with circular dichroism (CD) experiments. It is to be noted that by default TrRosetta, AlphaFold2 and PF-noDH only perform simulations at neutral pH. CD values are not available at all pH varying from 3 to 13. We report, however, on the pH-dependent conformations using PF-DH. Results are summarized in Table 1.

TABLE 1

TABLE 1. Structural impact of pH variation on poly-charged peptides.

TrRosetta, AlphaFold2 and PF-noDH have a very high propensity to report alpha-helical conformations for the six polypeptides at pH 7.4, the exception being (H)30, with alpha-content varying from 54% to 97%, while CD displays only coil or beta-turn signals. For instance, for (EK)5, TrRosetta reports 68% helix and 26% coil, AlphaFold2 reports 54% helix and 10% coil and PF-noDH reports 90% helix and 10% coil. Only PF-DH is able to predict the CD coil character of (EK)5 with 68% coil and 32% turn.

PF-DH is the single method to predict 85% coil and 15% turn consistent at pH 7.4 with the beta-turn CD signal of (EK)15 (Smith et al., 2020), and PF-DH predicts 100% coil at pH 5 consistent with the coil CD signal of (H)30 (Jesus, 2020). There is strong experimental evidence that (H)30 polymerizes at pH 7.4 forming beta-sheets. At this pH, PF-noDH and TrRosetta predict strong helical conformations, while PF-DH and AlphaFold2 predict a random coil, with contents of 95% and 81%, respectively.

The polypeptides (K)15 and (E)15 are particularly interesting because the alpha-helix content changes inversely with the pH. As observed by CD, the helical content of (K)15 increases with pH, while the helical content of (E)15 decreases with pH (Batys et al., 2020). (K)15 have 0% helix at pH 3.6 and 83.7% helix at pH 11–13 by CD. PF-DH finds 0% helix at pH 3.6 and 93% helix at pH 11–13 (Figure 2). In contrast to (K)15, (E)15 (Figure 3) have 42% helix at pH 3.6 and 0% helix at pH 11–13 by CD. PF-DH finds 93% helix at pH 3.6 and 100% coil at pH 11–13.

FIGURE 2

FIGURE 2. PF-DH Conformational ensemble of (K)15 as a function of pH. (A) pH 7.4 (B) pH 13. Only the lowest energy model (rank 1) is depicted.

FIGURE 3

FIGURE 3. Conformational ensemble of (E)15 at pH7.4. (A) PF-noDH, AlphaFold2 and TrRosetta (B) PF-DH. Only the lowest energy (rank 1) model is depicted.

The conformation ensemble of (R)25 is predicted to have 50% coil and 31% beta-sheet at pH 5.7 and have 51% coil and 21% beta-sheet at pH 11.3 by CD (Morga et al., 2022). PF-DH predicts 100% coil, independently of the pH values. Its performance is however much better than those of PF-noDH, AlphaFold2 and TrRosetta which predict a high helical signal varying from 77% to 96%.

Overall, the structure predictions of the six polypeptides at neutral pH (7.4) give quite different contents of the secondary structure using PF-noDH and PF-DH, with PF-noDH behaving and failing like AlphaFold2 and TrRosetta predictions. This result emphasizes the role of the Debye-Hückel charged-charged interactions when treating poly-charged peptides. The results also demonstrate that the learning stage of the local conformations in PEP-FOLD performed from structures at neutral pH can be counterbalanced by the force field, making possible to explore new conformations depending on the pH. In contrast, AlphaFold2 and TrRosetta rely on homologous structures and multiple sequence alignements. Since neither is available for poly-charged peptides, it is normal for both predictors to fail. But surprisingly, the LDDT (local distance difference test) metric predicted by both methods is, on average, very high ( $>$ 80%) for all amino acids of the six poly-charged peptides.

It is important to emphasize that in this study, we assume the standard pka values of charged amino acids irrespective of the amino acid composition of the peptides and the conformations of the peptides. This is a strong limitation of our current approach. Determining the pka values of charged amino acids in protein structures has motivated the development of many theoretical methods (Sabri Dashti et al., 2012; Barroso da Silva et al., 2016; Huang et al., 2016; Aho et al., 2022). To illustrate the variation of the pka values, we used the H++ server which is based on classical continuum electrostatics and basic statistical mechanisms (Anandakrishnan et al., 2012). Using (K)15, we found pka values ranging from 10.1 to 9.4 (versus 10.5 in our model); using (R)25, we found pka ranging from 9.6 to 11.6 in one conformation, and from 10.9 to 11.7 in another conformation (versus 12.5 in our model), and using (H)30, we found pka variations from 4.7 to 6.3 (versus 6.0 in our model). Clearly this change of pka of the amino acids will impact the equilibrium conformations of PF-DH.

3.2 Predicted models of polypeptides with NMR structures

The experimental information of each well-ordered peptide, given in Table 2, includes the amino acid length varying from 8 to 35 amino acids, the number of NMR models, the WDC (well-defined rigid core) according to the PDB, the topology, the pH varying from 4.3 to 7, the ionic strength varying from 0 to 150 mM NaCl, the blocking of the extremities and the amino acid sequence.

TABLE 2

TABLE 2. Peptide set.

Table 3 reports on the CAD-scores using the full sequences and the rigid cores of each of the 15 peptides using the four methods. Note we give the results of PF-noDH, AlphaFold2 and TrRosetta, because these methods which are pH independent are used irrespective of the experimental pH conditions.

TABLE 3

TABLE 3. Performance prediction for structured peptides.

The first striking result is that (the mean, standard deviation and median) values of the CAD-scores averaged over the 15 peptides are nearly identical for the four methods using both the full sequences or the rigid cores. They reach (0.73, 0.07, 0.75) for PF-noDH, (0.74, 0.10, 0.74) for AlphaFold2, (0.74, 0.06, 0.74) for PF-DH and (0.76, 0.08, 0.77) for TrRosetta using the full sequences. Similar trends are observed considering the well-defined rigid cores, the average CAD-scores being of 0.75, 0.76, 0.76 and 0.78 for PF-noDH, PF-DH, AlphaFold2 and TrRosetta, respectively.

The second result is that PF-noDH and PF-DH do not predict any low quality models (CAD-score $<$ 0.6), while AlphaFold2 produces CAD-scores of 0.59, 0.55 and 0.6 for the three targets 6nm3,1egs and 7li2 (Figures 4–6, respectively). It has to be noted that the structures of these three peptides were solved at pH 5.8, 6.5 and 7. This low score results in differences between the experimental and predicted topologies. Experimentally, 6nm3 adopts a helical-like conformation, 1egs adopts a beta2-like conformation and 7li2 adopts a beta-2 like conformation. Of note, a beta-2 like conformation has the topology of a beta-hairpin but lacks the H-bond network.

FIGURE 4

FIGURE 4. Conformational ensemble of 6 nm3. (A) PF-noDH, (B) PF-DH at pH 4.3, (C) AlphaFold2, (D) NMR structure. For A, B and C, the 5 predicted models are depicted. For D, all models provided in the PDB are depicted.

FIGURE 5

FIGURE 5. Conformational ensemble of 1egs. (A) PF-noDH, (B) PF-DH at pH 4.3, (C) AlphaFold2, (D) NMR structure. For A, B and C, the 5 predicted models are depicted. For D, all models provided in the PDB are depicted.

FIGURE 6

FIGURE 6. Conformational ensemble of 7li2. (A) PF-noDH, (B) PF-DH at pH 4.3, (C) AlphaFold2, (D) TrRosetta, (E) NMR structure. For A, B, C and D the 5 predicted models are depicted. For E, all models provided in the PDB are depicted.

For these three systems, AlphaFold2 predicts an extended-unstructured conformation. The 7li2 target is also problematic for TrRosetta, as it is the single system with a CAD-score $<$ 0.6, namely 0.58 leading to an extended-unstructured conformation. Inversely, TrRosetta is the best method to predict the beta-hairpin of 1pgbF (Barducci et al., 2011) with a CAD-score of 0.91 versus 0.83 with AlphaFold2 and 0.79 with PF-DH.

The third result is related to the performance of PF-DH with respect to PF-noDH, which provides evidence that the weights of the Debye-Huckel salt bridge interactions are consistent with the weights of sOPEP2 interactions. It was far from being evident that the addition of charges at extremities and charged amino acids in the core of the sequences would not change the quality of the models for pH varying between 2.9 and 7. The number of titratable amino acids varies from 1–2 (1le1, 1egs - 6nm3, 6evq), 4 for 1j4m, 5 for 6j9p and 1pgbF, 6 for 6mi9, 7 for 1wbr and 7li2, 9 for 6r2x, 10 for 6svc and 7b2f, to 12 for 1fsd. The results also show that the pH-independent PEP-FOLD version and the pH-dependent PEP-FOLD version perform similarly for peptides containing charged, hydrophilic and hydrophobic amino acids.

Finally, using NMR structures as a reference, a very recent study benchmarked the accuracy of AlphaFold2 in predicting 588 peptide structures between 10 and 40 amino acids, including soluble peptides, membrane-associated peptides, and disulfide-rich peptides (McDonald et al., 2023). Although the study ignores pH conditions and the presence of the membrane, AlphaFold2 can be used for the modeling of peptide structures anticipated to have a well-defined secondary structure. AlphaFold2 is particularly successful in the prediction of alpha-helical membrane-associated peptides and disulfide-rich peptides, but also shows some shortcomings in predicting phi and psi angles. It was found that AlphaFold2 performs at least as well if not better than TrRosetta and PEP-FOLD using our 2016 set of parameters.

3.3 Predicted models of polypeptides without any NMR structures

The last four peptides have been discussed in literature in terms of topological features without delivering any NMR structure. Their sequences are given at the bottom of Table 2.

Two peptides are rather well described by all four methods. Pep17 has been shown as a stable monomeric helix at pH two using CD and NMR experiments (Bradley et al., 1990). PF-noDH, PH-DH at pH 2 and AlphaFold2 predict a helical conformation with a frayed N-terminus, while TrRosetta predicts a full helical conformation (Figure 7). Pep38 determined experimentally as a helix-turn-helix at pH 3.6 (Fezoui et al., 1997) is also well reproduced by the four methods.

FIGURE 7

FIGURE 7. Conformational ensemble of pep17. (A) PF-noDH, (B) PF-DH—pH 2, (C) AlphaFold2 and (D) TrRosetta. For each method, the 5 predicted models are depicted.

There are two cases, where AlphaFold2 and TrRosetta fail to produce the experimental data. The first peptide is pep10 which is described experimentally by an ensemble of distinct transient beta-hairpins at pH 4.3(Alba et al., 1997). It is described as an unstructured turn-like conformation by TrRosetta (Figure 8D), and an ensemble of extended and beta2-like conformations by AlphaFold2 (Figure 8C). In contrast, PF-noDH and PF-DH predict well a beta-hairpin conformation (Figures 8A,B).

FIGURE 8

FIGURE 8. Conformational ensemble of pep10. (A) PF-noDH, (B) PF-DH at pH 4.3, (C) AlphaFold2, (D) TrRosetta. For each method, the 5 predicted models are depicted.

The second peptide is the tau fragment encompassing residues 295–306 containing the aggregation-prone PHF6 motif (306–311). Using cross-linking mass-spectrometry, ab initio Rosetta (Ovchinnikov et al., 2018), and CS-Rosetta which leveraged available chemical shifts (Lange et al., 2012) for the tau repeat spanning residues 243–365, the tau fragment 295–306 was predicted as a beta-hairpin at pH 7 (Chen et al., 2019). PF-noDH and PF-DH predict the same conformation (Figures 9A, B). In contrast, AlphaFold2 predicts extended conformations (Figure 9C), and surprisingly TrRosetta finds a random coil conformation (Figure 9D).

FIGURE 9

FIGURE 9. Conformation ensemble of tau-fragment at pH 7. (A) PF-noDH, (B) PF-DH, (C) AlphaFold2 and (D) TrRosetta. For each method, the 5 predicted models are depicted.

Overall, this small set of peptides provides evidence of some limitations of AlphaFold2 and TrRosetta when the target does not have an homologous sequence in the PDB.

4 Conclusion

Integrating pH variation effects to a coarse-grained model, where the side chains are represented by one single bead, is an important step toward accurate polypeptide structure prediction in aqueous solution, as coarse-graining with various granularities (de Vries and Zacharias, 2013; Sieradzan et al., 2022), enhance sampling. This task has been performed by combining a Debye-Hückel formalism for charged - charged side chain interactions and the sOPEP2 potential. By using a total of 25 peptides of amino acid lengths varying between 7 and 38 amino acids, this study provides evidence that PF-noDH, PF-DH, AlphaFold2 and TrRosetta perform similarly on peptides deposited in the Protein data Bank, but PF-DH outperforms the two recent machine-learning methods for poly-charged peptides, and peptides for which homologous sequences are not deposited in the PDB. Of note, our new formulation takes into account the impact of salt concentration variations, but we could not identify from the literature any case reporting a conformation change upon ionic strength variation.

Overall this work is one step towards peptide structure prediction in mimicking in vivo conditions. We are currently working on IDP’s in aqueous solution and de novo structure prediction of peptides at the surface of two-dimensional cell membranes.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

PT, and PD contributed to conception and design of the study. PT performed the PEP-FOLD implementation and ran the PEP-FOLD and Colabfold (AlphaFold2) simulations. PD ran the TrRosetta simulations. PT and PD performed the analyses. PT, and PD wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding

This work was supported by the “Initiative d’Excellence” program from the French State, Grant “DYNAMO,” ANR-11-LABX-0011, and by INSERM U1133 recurrent funding.

Acknowledgments

The authors thank Andrew Doig for helpful discussions about pKa.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aho, N., Buslaev, P., Jansen, A., Bauer, P., Groenhof, G., and Hess, B. (2022). Scalable constant pH molecular dynamics in GROMACS. J. Chem. Theory Comput. 18, 6148–6160. doi:10.1021/acs.jctc.2c00516

PubMed Abstract | CrossRef Full Text | Google Scholar

Alba, E. D., Rico, M., and Jiménez, M. A. (1997). Cross-strand side-chain interactions versus turn conformation in β-hairpins. Protein Sci. 6, 2548–2560. doi:10.1002/pro.5560061207

PubMed Abstract | CrossRef Full Text | Google Scholar

Anandakrishnan, R., Aguilar, B., and Onufriev, A. V. (2012). H++ 3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Res. 40, W537–W541. doi:10.1093/nar/gks375

PubMed Abstract | CrossRef Full Text | Google Scholar

Apostolopoulos, V., Bojarska, J., Chai, T.-T., Elnagdy, S., Kaczmarek, K., Matsoukas, J., et al. (2021). A global review on short peptides: Frontiers and perspectives. Molecules 26, 430. doi:10.3390/molecules26020430

PubMed Abstract | CrossRef Full Text | Google Scholar

Barducci, A., Bonomi, M., and Derreumaux, P. (2011). Assessing the quality of the OPEP coarse-grained force field. J. Chem. Theory Comput. 7, 1928–1934. doi:10.1021/ct100646f

PubMed Abstract | CrossRef Full Text | Google Scholar

Barroso da Silva, F. L., Pasquali, S., Derreumaux, P., and Dias, L. G. (2016). Electrostatics analysis of the mutational and pH effects of the n-terminal domain self-association of the major ampullate spidroin. Soft Matter 12, 5600–5612. doi:10.1039/C6SM00860G

PubMed Abstract | CrossRef Full Text | Google Scholar

Batys, P., Morga, M., Bonarek, P., and Sammalkorpi, M. (2020). pH-induced changes in polypeptide conformation: Force-field comparison with experimental validation. J. Phys. Chem. B 124, 2961–2972. doi:10.1021/acs.jpcb.0c01475

PubMed Abstract | CrossRef Full Text | Google Scholar

Ben-Shushan, S., and Miller, Y. (2021). Neuropeptides: Roles and activities as metal chelators in neurodegenerative diseases. J. Phys. Chem. B 125, 2796–2811. doi:10.1021/acs.jpcb.0c11151

PubMed Abstract | CrossRef Full Text | Google Scholar

Binette, V., Mousseau, N., and Tuffery, P. (2022). A generalized attraction–repulsion potential and revisited fragment library improves PEP-FOLD peptide structure prediction. J. Chem. Theory Comput. 18, 2720–2736. doi:10.1021/acs.jctc.1c01293

PubMed Abstract | CrossRef Full Text | Google Scholar

Bonneau, R., Strauss, C. E., and Baker, D. (2001). Improving the performance of Rosetta using multiple sequence alignment information and global measures of hydrophobic core formation. Proteins Struct. Funct. Bioinforma. 43, 1–11. doi:10.1002/1097-0134(20010401)43:1<1::aid-prot1012>3.0.co;2-a

PubMed Abstract | CrossRef Full Text | Google Scholar

Bradley, E. K., Thomason, J. F., Cohen, F. E., Kosen, P. A., and Kuntz, I. D. (1990). Studies of synthetic helical peptides using circular dichroism and nuclear magnetic resonance. J. Mol. Biol. 215, 607–622. doi:10.1016/S0022-2836(05)80172-X

PubMed Abstract | CrossRef Full Text | Google Scholar

Cabri, W., Cantelmi, P., Corbisiero, D., Fantoni, T., Ferrazzano, L., Martelli, G., et al. (2021). Therapeutic peptides targeting PPI in clinical development: Overview, mechanism of action and perspectives. Front. Mol. Biosci. 8, 697586. doi:10.3389/fmolb.2021.697586

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, D., Drombosky, K. W., Hou, Z., Sari, L., Kashmer, O. M., Ryder, B. D., et al. (2019). Tau local structure shields an amyloid-forming motif and controls aggregation propensity. Nat. Commun. 10, 2493. doi:10.1038/s41467-019-10355-1

PubMed Abstract | CrossRef Full Text | Google Scholar

de Vries, S., and Zacharias, M. (2013). Flexible docking and refinement with a coarse-grained protein model using ATTRACT: Flexible Protein-Protein Docking and Refinement. Proteins Struct. Funct. Bioinforma. 81, 2167–2174. doi:10.1002/prot.24400

CrossRef Full Text | Google Scholar

Debye, P., and Hückel, E. (1923). Zur Theorie der Elektrolyte. I. Gefrierpunktserniedrigung und verwandte Erscheinungen. Phys. Z. 24, 305.

Google Scholar

Du, Z., Su, H., Wang, W., Ye, L., Wei, H., Peng, Z., et al. (2021). The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634–5651. doi:10.1038/s41596-021-00628-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Fetissov, S. O., Legrand, R., and Lucas, N. (2019). Bacterial protein mimetic of peptide hormone as a new class of protein-based drugs. Curr. Med. Chem. 26, 546–553. doi:10.2174/0929867324666171005110620

PubMed Abstract | CrossRef Full Text | Google Scholar

Fezoui, Y., Connolly, P. J., and Osterhout, J. J. (1997). Solution structure of αtα, a helical hairpin peptide of de novo design. Protein Sci. 6, 1869–1877. doi:10.1002/pro.5560060907

PubMed Abstract | CrossRef Full Text | Google Scholar

Frishman, D., and Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Genet. 23, 566–579. doi:10.1002/prot.340230412

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, Y., Chen, W., Wallace, J. A., and Shen, J. (2016). All-atom continuous constant pH molecular dynamics with particle mesh ewald and titratable water. J. Chem. Theory Comput. 12, 5411–5421. doi:10.1021/acs.jctc.6b00552

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, J., Rauscher, S., Nawrocki, G., Ran, T., Feig, M., De Groot, B. L., et al. (2017). CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. methods 14, 71–73. doi:10.1038/nmeth.4067

PubMed Abstract | CrossRef Full Text | Google Scholar

Iwaoka, M., Yoshida, K., and Shimosato, T. (2020). Application of a distance-dependent sigmoidal dielectric constant to the REMC/SAAP3D simulations of chignolin, trp-cage, and the G10q mutant. Protein J. 39, 402–410. doi:10.1007/s10930-020-09936-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Jesus, C. N. (2020). On the self-assembly of pH-sensitive histidine-based copolypeptides. London: University College London.

Google Scholar

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. doi:10.1038/s41586-021-03819-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Lamiable, A., Thévenet, P., Rey, J., Vavrusa, M., Derreumaux, P., and Tufféry, P. (2016). PEP-FOLD3: Faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res. 44, W449–W454. doi:10.1093/nar/gkw329

PubMed Abstract | CrossRef Full Text | Google Scholar

Lange, O. F., Rossi, P., Sgourakis, N. G., Song, Y., Lee, H.-W., Aramini, J. M., et al. (2012). Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples. Proc. Natl. Acad. Sci. 109, 10873–10878. doi:10.1073/pnas.1203013109

PubMed Abstract | CrossRef Full Text | Google Scholar

Levin, A., Hakala, T. A., Schnaider, L., Bernardes, G. J. L., Gazit, E., and Knowles, T. P. J. (2020). Biomimetic peptide self-assembly for functional materials. Nat. Rev. Chem. 4, 615–634. doi:10.1038/s41570-020-0215-y

CrossRef Full Text | Google Scholar

Maupetit, J., Tuffery, P., and Derreumaux, P. (2007). A coarse-grained protein force field for folding and structure prediction. Proteins Struct. Funct. Bioinforma. 69, 394–408. doi:10.1002/prot.21505

PubMed Abstract | CrossRef Full Text | Google Scholar

McDonald, E. F., Jones, T., Plate, L., Meiler, J., and Gulsevin, A. (2023). Benchmarking alphafold2 on peptide structure prediction. Structure 31, 111–119.e2. doi:10.1016/j.str.2022.11.012

PubMed Abstract | CrossRef Full Text | Google Scholar

Mie, G. (1903). Zur kinetischen Theorie der einatomigen Körper. Ann. Phys. 316, 657–697. doi:10.1002/andp.19033160802

CrossRef Full Text | Google Scholar

Mookherjee, N., Anderson, M. A., Haagsman, H. P., and Davidson, D. J. (2020). Antimicrobial host defence peptides: Functions and clinical potential. Nat. Rev. Drug Discov. 19, 311–332. doi:10.1038/s41573-019-0058-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Morga, M., Batys, P., Kosior, D., Bonarek, P., and Adamczyk, Z. (2022). Poly-L-arginine molecule properties in simple electrolytes: Molecular dynamic modeling and experiments. Int. J. Environ. Res. Public Health 19, 3588. doi:10.3390/ijerph19063588

PubMed Abstract | CrossRef Full Text | Google Scholar

Muttenthaler, M., King, G. F., Adams, D. J., and Alewood, P. F. (2021). Trends in peptide drug discovery. Nat. Rev. Drug Discov. 20, 309–325. doi:10.1038/s41573-020-00135-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Nguyen, P. H., and Derreumaux, P. (2020). Structures of the intrinsically disordered Aβ, tau and α-synuclein proteins in aqueous solution from computer simulations. Biophys. Chem. 264, 106421. doi:10.1016/j.bpc.2020.106421

PubMed Abstract | CrossRef Full Text | Google Scholar

Olechnovič, K., Kulberkytė, E., and Venclovas, Č. (2013). CAD-score: A new contact area difference-based function for evaluation of protein structural models. Proteins Struct. Funct. Bioinforma. 81, 149–162. doi:10.1002/prot.24172

CrossRef Full Text | Google Scholar

Ovchinnikov, S., Park, H., Kim, D. E., DiMaio, F., and Baker, D. (2018). Protein structure prediction using Rosetta in CASP12. Proteins Struct. Funct. Bioinforma. 86, 113–121. doi:10.1002/prot.25390

PubMed Abstract | CrossRef Full Text | Google Scholar

Pavlicevic, M., Marmiroli, N., and Maestri, E. (2022). Immunomodulatory peptides—a promising source for novel functional food production and drug discovery. Peptides 148, 170696. doi:10.1016/j.peptides.2021.170696

PubMed Abstract | CrossRef Full Text | Google Scholar

Robustelli, P., Piana, S., and Shaw, D. E. (2018). Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl. Acad. Sci. 115, E4758–E4766. doi:10.1073/pnas.1800690115

PubMed Abstract | CrossRef Full Text | Google Scholar

Rose, P. W., Bi, C., Bluhm, W. F., Christie, C. H., Dimitropoulos, D., Dutta, S., et al. (2012). The RCSB protein Data Bank: New resources for research and education. Nucleic acids Res. 41, D475–D482. doi:10.1093/nar/gks1200

PubMed Abstract | CrossRef Full Text | Google Scholar

Sabri Dashti, D., Meng, Y., and Roitberg, A. E. (2012). pH-replica exchange molecular dynamics in proteins using a discrete protonation method. J. Phys. Chem. B 116, 8805–8811. doi:10.1021/jp303385x

PubMed Abstract | CrossRef Full Text | Google Scholar

Shen, Y., Maupetit, J., Derreumaux, P., and Tufféry, P. (2014). Improved PEP-FOLD approach for peptide and miniprotein structure prediction. J. Chem. theory Comput. 10, 4745–4758. doi:10.1021/ct500592m

PubMed Abstract | CrossRef Full Text | Google Scholar

Sieradzan, A. K., Czaplewski, C., Krupa, P., Mozolewska, M. A., Karczyńska, A. S., Lipska, A. G., et al. (2022). “Modeling the structure, dynamics, and transformations of proteins with the UNRES force field,” in Protein folding. Methods in molecular biology. Editor V. Muñoz (New York, NY: Springer US), 399–416. doi:10.1007/978-1-0716-1716-8_23

CrossRef Full Text | Google Scholar

Singh, S., Singh, H., Tuknait, A., Chaudhary, K., Singh, B., Kumaran, S., et al. (2015). PEPstrMOD: structure prediction of peptides containing natural, non-natural and modified residues. Biol. Direct 10, 1–19. doi:10.1186/s13062-015-0103-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Smith, J., McMullen, P., Yuan, Z., Pfaendtner, J., and Jiang, S. (2020). Elucidating molecular design principles for charge-alternating peptides. Biomacromolecules 21, 435–443. doi:10.1021/acs.biomac.9b01191

PubMed Abstract | CrossRef Full Text | Google Scholar

Sterpone, F., Melchionna, S., Tuffery, P., Pasquali, S., Mousseau, N., Cragnolini, T., et al. (2014). The OPEP protein model: From single molecules, amyloid formation, crowding and hydrodynamics to DNA/RNA systems. Chem. Soc. Rev. 43, 4871–4893. doi:10.1039/C4CS00048J

PubMed Abstract | CrossRef Full Text | Google Scholar

Timmons, P. B., and Hewage, C. M. (2021). APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Briefings Bioinforma. 22, bbab308. doi:10.1093/bib/bbab308

CrossRef Full Text | Google Scholar

Zhang, Y. (2008). I-TASSER server for protein 3D structure prediction. BMC Bioinform. 9, 40. doi:10.1186/1471-2105-9-40

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: peptide, structure, pH dependence, coarse grained models, prediction

Citation: Tufféry P and Derreumaux P (2023) A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution. Front. Bioinform. 3:1113928. doi: 10.3389/fbinf.2023.1113928

Received: 01 December 2022; Accepted: 06 January 2023;
Published: 16 January 2023.

Edited by:

Igor N. Berezovsky, Bioinformatics Institute (A∗STAR), Singapore

Reviewed by:

Marianne Rooman, Université libre de Bruxelles, Belgium
Michał Koliński, Mossakowski Medical Research Institute, Polish Academy of Sciences, Poland

Copyright © 2023 Tufféry and Derreumaux. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Pierre Tufféry, cGllcnJlLnR1ZmZlcnlAdS1wYXJpcy5mcg==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

A refined pH-dependent coarse-grained model for peptide structure prediction in aqueous solution

1 Introduction

2 Methods

2.1 The sOPEP2 force field

2.2 Debye-Hückel charge interactions

2.3 PEP-FOLD, TrRosetta and AlphaFold2 protocols and analysis

3 Results and discussion

3.1 Predicted models of poly-charged peptides

3.2 Predicted models of polypeptides with NMR structures

3.3 Predicted models of polypeptides without any NMR structures

4 Conclusion

Data availability statement

Author contributions

Funding

Acknowledgments

Conflict of interest

Publisher’s note

References

94% of researchers rate our articles as excellent or good

94% of researchers rate our articles as excellent or good