- 1NaturalAntibody, Kraków, Poland
- 2Medical University of Warsaw, Warsaw, Poland
- 3Medical University of Lodz, Lodz, Poland
- 4Alchemab Therapeutics Ltd., London, United Kingdom
- 5UCB, Slough, United Kingdom
AlphaFold2 has hallmarked a generational improvement in protein structure prediction. In particular, advances in antibody structure prediction have provided a highly translatable impact on drug discovery. Though AlphaFold2 laid the groundwork for all proteins, antibody-specific applications require adjustments tailored to these molecules, which has resulted in a handful of deep learning antibody structure predictors. Herein, we review the recent advances in antibody structure prediction and relate them to their role in advancing biologics discovery.
1 Introduction
Antibodies are the largest class of biotherapeutics, with more than 100 approved molecules (Kaplon et al., 2023). The antibody drug market is rapidly growing, and it is predicted to reach approximately $300 billion by 2025 (Lu et al., 2020). As a result, there is much interest in streamlining antibody discovery methods by tapping into recent computational advances in deep learning.
One of the most striking computational advances has taken place in structure prediction, with the development of tools such as AlphaFold2 (Jumper et al., 2021). For antibodies, the determination of the proper antibody structure is key to many downstream drug discovery tasks, such as developability annotation (Raybould et al., 2019) or antibody–antigen docking (Krawczyk et al., 2014; Schneider et al., 2021). Though AlphaFold2 works well for general proteins, it falls short on the specific case of antibodies (Ruffolo et al., 2022a; Abanades et al., 2022b; Cohen et al., 2022), prompting the development of antibody-specific modeling protocols.
In this review, we describe the methods which contribute to the improvement of computational structure modeling for antibodies and provide context to the role they play in designing antibody-based therapeutics.
2 Antibody structure in the context of 3D modeling
Antibody structure prediction is primarily focused on the variable domains of the heavy chain (Vh) and the light chain (Vl) (Figure 1A). Each domain is relatively small, comprising ∼110 residues each. There are two major hurdles within the overall antibody structure prediction problem: determining the relative orientation of the two domains (Figure 1B) and predicting the complementarity-determining region (CDR) loop structures. The two domains can be juxtaposed differently, which affects the overall shape of the antibody binding site. For this reason, orientating the multimer of the heavy and light chains is crucial (Dunbar et al., 2013; Bujotzek et al., 2015).
FIGURE 1. Specifics of the antibody structure in the context of modeling. (A) Variable region in the context of the entire antibody structure. The antibody binding site is located in the variable region composed of the variable heavy (Vh) and variable light (Vl) polypeptide chains associated with the constant portions (HC/LC). (B) Heavy/light chain orientation. The orientation of the Vh and Vl is not constant, and differing angles can affect the shape of the binding site. (C) Canonical structures of CDRs. Most of the binding residues (the paratope) are found in the complementarity-determining regions (CDRs). There are three CDRs on each of the heavy and light chains. All the CDRs except the CDR-H3 cluster into a set of “canonical shapes” depending on residues in key positions. (D) Heterogeneity of CDR-H3. CDR-H3 is not only the most variable of the regions but also usually the most important for antigen binding.
The CDR prediction problem can be further subdivided into classifying the “canonical” CDRs (CDR-L1, CDR-L2, CDR-L3, CDRH1, and CDR-H2) or modeling the CDR-H3. The canonical CDRs have reasonably conserved folds (Nowak et al., 2016; Kelow et al., 2022) (Figure 1C). The latter problem is arguably the most difficult and critical, as the CDR-H3 is the most variable (Figure 1D), and also plays the major role in binding (Marks and Deane, 2017; Regep et al., 2017; Ruffolo et al., 2020; Abanades et al., 2022a).
There is a diversity of methods to approach any of these sub-problems individually, or predicting the entire multimeric gamut of variable domains. However, attention is often focused around CDR-H3 prediction accuracy given its central role in binding and function. Compilation of the available antibody structure prediction methods that leverage recent advances in machine learning are listed in Table 1.
TABLE 1. Compilation of the available antibody structure prediction methods that leverage recent advances in machine learning. For each method, we describe the general goal (e.g., CDR prediction or whole variable region prediction), the accuracy of the most difficult region, the CDR-H3, its code/server availability, and the source paper. Please note that the CDR-H3 root mean square deviations (RMSDs) are not directly comparable as they could have been obtained from a different test set and are sometimes calculated in a different fashion, e.g., based on Cα or main chain heavy atom positions. As a baseline and reference point, we also include the AlphaFold2 predictions since many methods report values with respect to that method.
3 Current machine learning methods tackling antibody structure prediction
3.1 What data fuel the models?
Antibody-based deep learning methods require antibody structures for training and validation which are typically downloaded from the Protein Data Bank (PDB). At the time of writing, there were approximately 7,000 redundant antibody structures. Although this sample of the antibody sequence space represents a small subset of all possible antibody molecules (>1015), it can still be used to model most naturally occurring antibodies (Krawczyk et al., 2018).
Databases such as AbDb (Ferdous and Martin, 2018) and SAbDab (Dunbar et al., 2014) curate such antibody-specific information. Most of the antibody structure prediction tools use these two resources that facilitate the creation of training, validation, and test datasets.
3.2 How is the antibody model quality assessed?
In the original AlphaFold2 work and CASP competition in general, the structural accuracy is calculated using GDT_TS (Kryshtafovych et al., 2021). This score is a measure of structural alignment between the model and native structure that is capable of indicating fold similarities. All antibodies are already of the same fold and one needs to account for differences in single loops (e.g., CDR-H3), where the RMSD is more suitable.
Methods that attempt the modeling of the entire variable region report the entire chain RMSD, further dividing it into the individual CDR RMSDs. Nevertheless, here, the gains in structure prediction accuracy are typically small as such predictions are already of very good quality, excluding the CDR-H3.
Since the CDR-H3 is the most difficult to predict, it is the benchmark point of reference for accuracy across different models. Methods typically report the RMSD of the prediction versus the native structure. RMSD can be calculated using two different approaches. Typically, RMSD is calculated based on the backbone atoms (N, C, CA, and O) or C-alpha (Cα) carbons only, with the latter always being lower. RMSD can also be calculated after aligning the entire variable region or only after aligning the CDR-H3 atoms. The latter method leads to a slightly lower reported RMSD, as it causes bias in the structural alignment for a better fit.
3.3 What methods and techniques are used for modeling individual antibody loops and individual chains?
Due to the importance of the CDR-H3 loop, many methods focus exclusively on modeling this region. For instance, DeepH3 and ABlooper were designed for CDR-H3 loop prediction, rather than addressing the entire variable region. DeepH3 is based on a residual network architecture that receives one-hot encoding of the sequence to be predicted as input. In terms of residual network size, it is much smaller than RaptorX (Källberg et al., 2012) on which it is based (6 1D + 60 2D) with only 3 1D + 25 2D blocks. It operates by predicting discretized inter-residue distances (assigning distances into equally spaced intervals) and orientation angles which are employed for full structure reconstruction by RosettaAntibody. In contrast, ABlooper is based on equivariant graph neural networks [E(G)NNs], which are equivariant to translations and rotations in 3D space (Satorras et al., 2021). ABlooper allows for coordinate uncertainty estimation by calculating the agreement between five independently trained neural network models. The chief advantage of ABlooper is speed, as it can produce hundreds of structures within seconds as opposed to previously available homology modeling methods that required around a minute.
Beyond CDR-H3-focused predictions, one needs to contextualize this loop to the rest of the heavy chain. One of the early machine learning models that could perform whole chain predictions is NanoNet. Originally designed as a predictor of single-chain antibodies (Deszyński et al., 2022), it can also predict heavy chains of canonical antibodies. Similar to DeepH3, it is a residual neural network (ResNet) that relies on one-dimensional convolutions to map sequence elements to three-dimensional coordinates. Unlike DeepH3, which operates on invariant features (residue distances and orientation angles), NanoNet operates on a single frame of reference by aligning all PDB heavy chains to a single reference structure. Owing to this, the predictions of the NanoNet are 3D coordinates, not requiring further translation into the structure as is the case with invariant features. In the context of the entire heavy chain prediction, authors report 2.38 Å accuracy for CDR-H3 (solutions in the region of 1 Å can be considered near-native). Similar to ABlooper, NanoNet is rapid, allowing for predicting thousands of structures in a matter of seconds. However, the predicted structures are often of bad physical quality [e.g., atomic clashes, D-amino acids, etc. (Fernández-Quintero et al., 2023)], requiring refinement.
3.4 What architectures and techniques are currently used to predict the entire antibody variable region structure?
Prediction of the entire variable region requires modeling and multimeric assembly of both heavy and light chains. Herein, DeepAb is a network that predicts discretized residue distances and orientation angle bins that are then passed for structure realization using Rosetta. The chief innovation of DeepAb is the usage of a language model as an input to the network. Employing embedding (internal efficient representations of input antibody sequences) for prediction offers an opportunity for the network to perform prediction on more efficient features extracted by the language model. Furthermore, the network employs attention mechanisms that allow tracking of which residues contribute to each other’s predictive signal.
Residual neural networks provide limited ways to abstract invariant three-dimensional information. Representing the entire variable region structure as a graph (as was the case with ABlooper) offers a solution to this problem. For instance, one can encode amino acids as nodes (using features such as amino acid and position) and draw edges between nodes/residues in proximity (e.g., within 8 Å heavy atom distance). Graph neural networks (GNNs) have increasingly gained popularity; this is hallmarked by ABlooper, IgFold, and EquiFold. The authors of EquiFold employed a coarse-grained representation for nodes to demonstrate its power within the framework of a SE (3) (special Euclidean (3) group ensuring rotation and translation equivariance) equivariant network. Ensuring geometric equivariance helps the network in learning features that can be rotated and translated. A more abstract representation using quaternions and Euler angles to encode the amino acids as invariant representations and as an extension of RefineGNN residues has been shown to achieve CDR-H3 predictions in the region of 2.5 Å. IgFold is another GNN-based method that also employs embeddings from AntiBERTy, which is trained on 500-m antibody sequences to supplement its prediction of the entire variable region.
Three key components have contributed to the success of AlphaFold2: the Evoformer, invariant point attention (IPA), and recycling. First, AlphaFold2 infers spatial constraints between amino acids by extracting evolutionary information embedded within multiple sequence alignments (MSAs) using its Evoformer module. In parallel, this information is fed into a structural module that leverages IPA to predict coordinates. IPA is a novel attention mechanism designed to be invariant to rotations and translations by aligning the feature vectors based on the geometric relationship between the residues without changing their 3D positions. It has been shown that it improves the accuracy of protein structure prediction by enabling the network to better capture the complex spatial relationships between residues in a protein. Finally, the whole workflow is repeated or “recycled” three times to refine the prediction.
While AlphaFold2 was designed for predicting any arbitrary protein sequence, its main components have influenced the design of antibody-specific tools. There are variations in the implementation of each of the aforementioned three components. For example, IgFold uses separated weights for each IPA layer and gradient propagation through rotations. xTrimoFoldAb and tfold-Ab use language model embeddings to replace the Evoformer, before applying the learned constraints into the structural module. Other methods, such as ABodyBuilder2, demonstrated that one can use only the structural module without resorting to antibody-specific embeddings or modified Evoformers. The antibody-focused methods are more accurate than AlphaFold2, but these improvements are limited. One major advantage of antibody-specific methods is their efficient running time. For instance, ABodyBuilder2 achieves predictions in a matter of seconds, compared to tens of minutes for AlphaFold2. AlphaFold2’s running time is comparatively long because of the MSA search step, which is unnecessary for antibody-specific methods.
The loss function drives the training of a model as it penalizes wrong predictions and rewards better ones. It is extremely important as one of the chief innovations of AlphaFold2 was the introduction of the frame aligned point error (FAPE) loss. This component exposes the model to information related to physicochemical constraints, such as proper chemical bond distances and angles, as well as penalizing atom clashes and other structural violations, and is also used in some of the antibody-specific models. However, because of the skewed difficulty in structure prediction, applying the same loss to each antibody region is not an ideal approach. For instance, xTrimoABFold employs focal loss focused on CDRs that are more difficult to predict. On the other hand, ABodyBuilder2 treats framework and CDR regions differently, clamping framework regions at a FAPE loss of 30 Å and the CDRs’ FAPE loss at 10 Å.
3.5 How do networks approach fine-structural details beyond the backbone?
Despite the progress in predictions, a seemingly trivial problem faced by the networks is the physical plausibility of the produced models (Fernández-Quintero et al., 2022). It was observed in AlphaFold2 that the structure module can produce predictions violating physical constraints, such as atomic clashes. This is not only a problem of AlphaFold2-based methods, and methods such as NanoNet and EquiFold are also afflicted. Methods such as ABodyBuilder2 and IgFold employed OpenMM and Rosetta, respectively, to reduce the number of physical clashes in the model produced by a neural network. The number of non-physical distances can also be reduced by introducing various physical constraints at the training time (Eguchi et al., 2022; Kończak et al., 2022).
Although structure prediction is typically evaluated on its ability to recapitulate the backbone, the determination of the side chains is important for fine-grained modeling of molecular function, such as binding affinity. Methods such as ABodyBuilder2 and IgFold produce the backbone structures annotated with side chains. Other methods such as EquiFold use a novel coarse-graining scheme where atoms are mapped to coarse-grained “superatom” prior to structural modeling and then reverse-mapped to the individual atoms once the backbone is constructed (Akpinaroglu et al., 2022). Other methods such as NanoNet only produce the backbone. Side chains are typically added by algorithms such as SCWRL (Krivov et al., 2009) or PEARS (Leem et al., 2018), but recently an antibody-specific side chain prediction mechanism using convolutional neural networks has been introduced—DeepSCAb (Akpinaroglu et al., 2022). Altogether, fast and accurate prediction of all-atom models is key to using the antibody structures for practical drug discovery purposes.
4 Drug discovery perspective of antibody structure prediction
Antibodies are a well-established drug format, with the structure as a key component in aiding their discovery and development, paving ground for real-world applications of 3D modeling.
Antibody structures provide rich information that can be used to improve various prediction features, such as molecular recognition (Oh et al., 2021), liability detection (Irudayanathan et al., 2022), or developability screening (Jain et al., 2023). These models can complement wet-lab antibody discovery methods, such as immunization or phage display, to ultimately improve the selection of binders. For instance, the identification of antigen-specific antibodies was typically tackled using clonotype/sequence clustering methods. Machine learning has shown alternative ways to group these molecules such as by embeddings from variational autoencoders (VAEs) (Friedensohn et al., 2020), predicting paratope residues (Richardson et al., 2021), or clustering structures (Robinson et al., 2021). In particular, structural clustering can provide a highly translational interpretation of antibody binding. The methods described in this Review are highly scalable, making it possible to group thousands of structures.
The optimization of biologics is the process of improving an existing molecule, which already displays a variety of desirable properties, with regard to a set of physicochemical properties. Structural features can be employed to guide the optimization process. A trivial example would be to focus existing liability removal (e.g., deamidation) protocols on only surface-exposed residues, which can be identified based on a reasonably accurate three-dimensional model (Irudayanathan et al., 2022). Structural features can be indicative of successful therapeutics (Raybould et al., 2019; Ahmed et al., 2021), with some differences in the calculated results based on the underlying modeling method (Jain et al., 2023). In some cases, such as antibody–antigen docking, good quality models are needed to reach the results achieved by docking crystal structures (Schneider et al., 2021).
The most ambitious use of antibody structure prediction is for the de novo antibody design, where the goal is to computationally define an antibody sequence that can bind to a given target epitope. One approach to the de novo design that relies on structural predictions is “virtual screening,” a methodology that has been practiced in small molecule drug discovery for decades but has only been recently applied to antibodies. This can involve the modeling of and selection from millions of antibody molecules, which are then funneled into a molecular docking approach (Schneider et al., 2021; Jin et al., 2022) or alternative binding site design methods (Rangel et al., 2022). The quality of the models is a key consideration as subtle changes in Vh/Vl arrangement, backbone, or side chain orientation can affect the quality of the predictions (Fernández-Quintero et al., 2022). In addition, any such efforts hinge on linking the antibody structural predictions to paratope–epitope interaction prediction. In this context, “zero-shot” predictions require the models to propose sequences binding a specific epitope without observing it, or any close variants of it, in the training/test sets.
Another approach to the de novo design is using generative methods. Herein, the latent space of the input (e.g., antibody sequences) is learned, providing a way to sample novel elements. Producing novel sequences based on transformer models has already been shown in general proteins (Rives et al., 2021) as well as in the antibody world (Melnyk et al., 2021; Saka et al., 2021; Shin et al., 2021; Shuai et al., 2021). Autoregressive methods such as IgLM (Shuai et al., 2021) offer a way to generate new binder sequences based on millions of sequences from natural repertoires. Such generation can also be biased toward sequences with certain biophysical properties by GANs (Amimeur et al., 2020). Most such methods, however, are currently sequence-driven but not structure-driven.
Structure holds the potential to provide more information than sequence alone (Kovaltsuk et al., 2017). Encoding the structural space, in the form of torsional angles using VAEs, has shown potential in generating novel 3D shapes (Eguchi et al., 2022). Leveraging structural information for generating paratopes to specific antigens should produce better results than using sequence alone (Jin et al., 2022). Higher quality structural models have the potential to inform better structure-generation methods, leading to more accurate emulation of molecular space than sequence alone. Embeddings generated by the inverse-folding of general proteins have already shown potential to be useful for B-cell epitope prediction (Hsu et al., 2022; Høie et al., 2023).
In the context of structure-conditioned generative methods, RefineGNN, AbDockGen, AbBERT-HMPN, and DiffAb go a step further than the modeling methods described in this review. They also provide a “compatibility” score for the structure and designed sequence. RefineGNN, AbDockGen, and AbBERT-HMPN are based on the iterative refinement of latent representations from graph neural networks, whereas DiffAb samples via a denoising diffusion model. The integration of structure prediction and sequence design is the next intuitive step superseding structure prediction, which holds the promise to enhance antibody-based drug discovery.
5 Conclusion
Advances in protein structure prediction have practical application in the discovery of new antibody drugs.
In general, accuracy increasing with respect to the pioneer in ML-based accurate structure prediction, AlphaFold2, is noticeable, but stay within an order of magnitude. Predictions of the CDR-H3 structure in particular appear to be “stuck” in the 2–3 Å heavy atom backbone RMSD interval. Difficulty in the prediction of CDR-H3 conformation could stem from the loops’ flexibility (Wong et al., 2011; Fernández-Quintero et al., 2018; Jeliazkov et al., 2018) as well as the possible influence of the Vh/Vl orientation (Marze et al., 2016; Boucher et al., 2023). With only several thousand antibody structures at hand (Dunbar et al., 2014), it is challenging to study any flexibility or allosteric effects, but perhaps with a larger number of better quality cryo-EM structures we will increase the volume of structural information available. Efforts in improving antibody structure prediction might take the flexibility into account by scoring the CDR-H3 conformational ensemble rather than single “best structure” produced.
The main advantage of the antibody-specific methods with respect to AlphaFold2 is the speed. The antibody sequence space in a single individual [∼109–1011 (Briney et al., 2019)] easily surpasses the human proteome (∼20 k). The speed of antibody modeling methods is of utmost importance, as it directly translates to the mapping of the available antibody sequence space (Kovaltsuk et al., 2018; Olsen et al., 2022), antibody virtual screening (Schneider et al., 2021; Rangel et al., 2022), and the development of novel generative models (Eguchi et al., 2022).
Given the number of currently available antibody-specific structure predictions, it might be suitable to take stock of the state of the field and devote efforts into benchmarking the different methods as was the case with the two rounds of the Antibody Modeling Assessment competition (Almagro et al., 2011; Almagro et al., 2014). In the field of antibody discovery specifically, we could use the tools not only to test by a single measure of RMSD but also to assess how useful the structural predictions are for therapeutically minded tasks, such as lead optimization, docking, epitope, or paratope prediction.
Altogether the accuracy, speed, and accessibility of the current antibody modeling methods make it possible to apply structural information to various aspects of biologics discovery pipelines today. An incremental improvement to existing discovery approaches using structure-guided computational methods appears entirely feasible, while the field continues to move ever forward toward the “holy grail” of the true de novo antibody design.
Author contributions
IJ, WB, and KK contributed to the conception and design of the study. IJ and WB wrote the first draft of the manuscript. IJ, WB, PD, TS, JK, WW, BJ, SW, DC, JG, JL, SK, and KK wrote sections of the manuscript. TG prepared the figures. All authors contributed to the article and approved the submitted version.
Funding
This work was co-financed by the European Regional Development Fund within the Smart Growth Operational Programme 2014–2020 POIR.01.01.01-00-0962/21 (to KK).
Conflict of interest
IJ, WB, TG, PD, TS, JK, WW, BJ, SW, DC, and KK are employees of NaturalAntibody that develops data, software and machine learning solutions for the therapeutic antibody industry. JG and JL are employees of Alchemab. SK is an employee of UCB Pharma.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abanades, B., Georges, G., Alexander, B., and Deane, C. M. (2022a). ABlooper: Fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38 (7), 1877–1880. doi:10.1093/bioinformatics/btac016
Abanades, B., Wong, W. K., Boyles, F., Georges, G., Alexander, B., and Charlotte, M. D. (2022b). ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Available at: https://www.biorxiv.org/content/10.1101/2022.11.04.514231v1 (Accessed November 4, 2022).
Ahmed, L., Gupta, P., Martin, K. P., Scheer, J. M., Nixon, A. E., and Kumar, S. (2021). Intrinsic physicochemical profile of marketed antibody-based biotherapeutics. Proc. Natl. Acad. Sci. 118 (37), 577118. doi:10.1073/pnas.2020577118
Akpinaroglu, D., Ruffolo, J. A., Mahajan, S. P., and Gray, J. J. (2022). Simultaneous prediction of antibody backbone and side-chain conformations with deep learning. PloS One 17 (6), 0258173. doi:10.1371/journal.pone.0258173
Almagro, J. C., Beavers, M. P., Hernandez-Guzman, F., Maier, J., Shaulsky, J., Butenhof, K., et al. (2011). Antibody modeling assessment. Proteins 79 (11), 3050–3066. doi:10.1002/prot.23130
Almagro, J. C., Teplyakov, A., Luo, J., Sweet, R. W., Kodangattil, S., Hernandez-Guzman, F., et al. (2014). Second antibody modeling assessment (AMA-II). Proteins 82 (8), 1553–1562. doi:10.1002/prot.24567
Amimeur, T., Shaver, J. M., Ketchem, R. R., Taylor, J. A., Clark, R. H., Smith, J., et al. (2020). Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks. Available at: https://www.biorxiv.org/content/10.1101/2020.04.12.024844v1 (Accessed April 13, 2020).
Boucher, L. E., Prinslow, E. G., Feldkamp, M., Yi, F., Nanjunda, R., Wu, S.-J., et al. (2023). ‘Stapling’ scFv for multispecific biotherapeutics of superior properties. mAbs 15 (1), 2195517. doi:10.1080/19420862.2023.2195517
Briney, B., Inderbitzin, A., Joyce, C., and Burton, D. R. (2019). Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature 566 (7744), 393–397. doi:10.1038/s41586-019-0879-y
Bujotzek, A., Dunbar, J., Lipsmeier, F., Schäfer, W., Antes, I., Deane, C. M., et al. (2015). Prediction of VH-vl domain orientation for antibody variable domain modeling. Proteins 83 (4), 681–695. doi:10.1002/prot.24756
Cohen, T., Halfon, M., and Schneidman-Duhovny, D. (2022). NanoNet: Rapid and accurate end-to-end nanobody modeling by deep learning. Front. Immunol. 13, 958584. doi:10.3389/fimmu.2022.958584
Deszyński, P., Młokosiewicz, J., Adam, V., Jaszczyszyn, I., Castellana, N., Bonissone, S., et al. (2022). INDI—Integrated nanobody database for immunoinformatics. Nucleic Acids Res. 50 (1), D1273–D1281. doi:10.1093/nar/gkab1021
Dunbar, J., Fuchs, A., Shi, J., and CharlotteDeane, M. (2013). ABangle: Characterising the VH-VL orientation in antibodies. Protein Eng. Des. Sel. PEDS 26 (10), 611–620. doi:10.1093/protein/gzt020
Dunbar, J., Krawczyk, K., Leem, J., Baker, T., Fuchs, A., Georges, G., et al. (2014). SAbDab: The structural antibody database. Nucleic Acids Res. 42, D1140–D1146. doi:10.1093/nar/gkt1043
Eguchi, R. R., Choe, C. A., and Huang, P.-S. (2022). Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18 (6), 1010271. doi:10.1371/journal.pcbi.1010271
Ferdous, S., and Martin, A. C. R. (2018). AbDb: Antibody structure database-a database of PDB-derived antibody structures. Database J. Biol. Databases Curation 2018, bay040. doi:10.1093/database/bay040
Fernández-Quintero, M. L., Kokot, J., Waibl, F., Fischer, A. M., Quoika, P. K., Deane, C. M., et al. (2023). Challenges in antibody structure prediction. mAbs 15 (1), 2175319. doi:10.1080/19420862.2023.2175319
Fernández-Quintero, M. L., Kokot, J., Franz, W., Fischer, A.-L. M., Quoika, P. K., Deane, C. M., et al. (2022). Challenges in antibody structure prediction. Available at: https://www.biorxiv.org/content/10.1101/2022.11.09.515600v1 (Accessed November 9, 2022).
Fernández-Quintero, M. L., Loeffler, J. R., Kraml, J., Kahler, U., Kamenik, A. S., and Liedl, K. R. (2018). Characterizing the diversity of the CDR-H3 loop conformational ensembles in relationship to antibody binding properties. Front. Immunol. 9, 3065. doi:10.3389/fimmu.2018.03065
Friedensohn, S., Neumeier, D., Khan, T. A., Csepregi, L., Parola, C., Arthur, R. G. D. V., et al. (2020). Convergent selection in antibody repertoires is revealed by deep learning. Available at:https://www.biorxiv.org/content/10.1101/2020.02.25.965673v1 (Accessed February 26, 2020).
Gao, K., Wu, L., Zhu, J., Peng, T., Xia, Y., Liang, H., et al. (2022). Incorporating pre-training paradigm for antibody sequence-structure Co-design. Available at: https://arxiv.org/abs/2211.08406 (Accessed October 26, 2022).
Høie, M. H., Gade, F. S., Johansen, J. M., Würtzen, C., Winther, O., Nielsen, M., et al. (2023). DiscoTope-3.0 - improved B-cell epitope prediction using AlphaFold2 modeling and inverse folding latent representations. Available at: https://www.biorxiv.org/content/10.1101/2023.02.05.527174v1 (Accessed February 5, 2023).
Hsu, C., Verkuil, R., Liu, J., Lin, Z., Hie, B., Tom, S., et al. (2022). Learning inverse folding from millions of predicted structures. Available at: https://www.biorxiv.org/content/10.1101/2022.04.10.487779v1 (Accessed April 10, 2022).
Irudayanathan, F. J., Zarzar, J., Lin, J., and Izadi, S. (2022). Deciphering deamidation and isomerization in therapeutic proteins: Effect of neighboring residue. mAbs 14 (1), 2143006. doi:10.1080/19420862.2022.2143006
Jain, T., Todd, B., and Vásquez, M. (2023). Identifying developability risks for clinical progression of antibodies using high-throughput in vitro and in silico approaches. mAbs 15 (1), 2200540. doi:10.1080/19420862.2023.2200540
Jeliazkov, J. R., Sljoka, A., Kuroda, D., Tsuchimura, N., Katoh, N., Tsumoto, K., et al. (2018). Repertoire analysis of antibody CDR-H3 loops suggests affinity maturation does not typically result in rigidification. Front. Immunol. 9, 413. doi:10.3389/fimmu.2018.00413
Jin, W., Barzilay, D. R., and Jaakkola, T. (2022). “Antibody-antigen docking and design via hierarchical structure refinement,” in Proceedings of the 39th International Conference on Machine Learning, Baltimore, MA, July 17-23, 2022 (Proceedings of Machine Learning Research. PMLR).
Jin, W., Wohlwend, J., Barzilay, R., and Jaakkola, T. (2021). Iterative refinement graph neural network for antibody sequence-structure Co-design. Available at: http://arxiv.org/abs/2110.04624 (Accessed October 9, 2021).
Jumper, J., Evans, R., Alexander, P., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596 (7873), 583–589. doi:10.1038/s41586-021-03819-2
Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., et al. (2012). Template-based protein structure modeling using the RaptorX web server. Nat. Protoc. 7 (8), 1511–1522. doi:10.1038/nprot.2012.085
Kaplon, H., Crescioli, S., Visweswaraiah, J., and Reichert, J. M. (2023). Antibodies to watch in 2023. mAbs 15 (1), 2153410. doi:10.1080/19420862.2022.2153410
Kelow, S., Faezov, B., Xu, Q., Parker, M., Adolf-Bryfogle, J., and Roland, L. D. (2022). A penultimate classification of canonical antibody CDR conformations. Available at: https://www.biorxiv.org/content/10.1101/2022.10.12.511988v1 (Accessed October 16, 2022).
Kończak, J., Janusz, B., Młokosiewicz, J., Satława, T., Wróbel, S., Dudzic, P., et al. (2022). Structural pre-training improves physical accuracy of antibody structure prediction using deep learning. Available at: https://www.biorxiv.org/content/10.1101/2022.12.06.519288v1 (Accessed December 9, 2022).
Kovaltsuk, A., Krawczyk, K., Galson, J. D., Kelly, D. F., Deane, C. M., and Trück, J. (2017). How B-cell receptor repertoire sequencing can Be enriched with structural antibody Data. Front. Immunol. 8, 1753. doi:10.3389/fimmu.2017.01753
Kovaltsuk, A., Leem, J., Kelm, S., James, S., Deane, C. M., and Krawczyk, K. (2018). Observed antibody space: A resource for Data mining next-generation sequencing of antibody repertoires. J. Immunol. 201 (8), 2502–2509. doi:10.4049/jimmunol.1800708
Krawczyk, K., Kelm, S., Kovaltsuk, A., Galson, J. D., Kelly, D., Trück, J., et al. (2018). Structurally mapping antibody repertoires. Front. Immunol. 9, 1698. doi:10.3389/fimmu.2018.01698
Krawczyk, K., Liu, X., Baker, T., Shi, J., and CharlotteDeane, M. (2014). Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics 30 (16), 2288–2294. doi:10.1093/bioinformatics/btu190
Krivov, G. G., Shapovalov, M. V., and Dunbrack, R. L. (2009). Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77 (4), 778–795. doi:10.1002/prot.22488
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., and Moult, J. (2021). Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 89 (12), 1607–1617. doi:10.1002/prot.26237
Lee, J. H., Yadollahpour, P., Watkins, A., Frey, N. C., Leaver-Fay, A., Stephen, R., et al. (2022). EquiFold: Protein structure prediction with a novel coarse-grained structure representation. Available at: https://www.biorxiv.org/content/10.1101/2022.10.07.511322v1 (Accessed October 8, 2022).
Leem, J., Georges, G., Shi, J., and CharlotteDeane, M. (2018). Antibody side chain conformations are position-dependent. Proteins 86 (4), 383–392. doi:10.1002/prot.25453
Lu, R.-M., Hwang, Y.-C., Liu, I -J., Lee, C.-C., Tsai, H. Z., Li, H. J., et al. (2020). Development of therapeutic antibodies for the treatment of diseases. J. Biomed. Sci. 27 (1), 1. doi:10.1186/s12929-019-0592-z
Luo, S., Su, Y., Peng, X., Wang, S., Peng, J., and Ma, J. (2022). Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Available at: https://www.biorxiv.org/content/10.1101/2022.07.10.499510v1 (Accessed July 11, 2022).
Marks, C., and Deane, C. M. (2017). Antibody H3 structure prediction. Comput. Struct. Biotechnol. J. 15, 222–231. doi:10.1016/j.csbj.2017.01.010
Marze, N. A., Lyskov, S., and Gray, J. J. (2016). Improved prediction of antibody VL-VH orientation. Protein Eng. Des. Sel. PEDS 29 (10), 409–418. doi:10.1093/protein/gzw013
Melnyk, I., Das, P., Chenthamarakshan, V., and Lozano, A. (2021). Benchmarking deep generative models for diverse antibody sequence design. Available at:http://arxiv.org/abs/2111.06801 (Accessed November 12, 2021).
Nowak, J., Baker, T., Georges, G., Kelm, S., Klostermann, S., Shi, J., et al. (2016). Length-independent structural similarities enrich the antibody CDR canonical class model. mAbs 8, 751–760. doi:10.1080/19420862.2016.1158370
Oh, L., Dai, B., and Bailey-Kellogg, C. (2021). “A multi-resolution graph convolution network for contiguous epitope prediction,” in Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics BCB ’21 38, Chicago, IL, August 7-10, 2022 (Association for Computing Machinery).
Olsen, T. H., Boyles, F., and Deane, C. M. (2022). Observed antibody space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. A Publ. Protein Soc. 31 (1), 141–146. doi:10.1002/pro.4205
Peng, C., Wang, Z., Zhao, P., Ge, W., and Huang, C. (2023). AbFold - an AlphaFold based transfer learning model for accurate antibody structure prediction. Available at: https://www.biorxiv.org/content/10.1101/2023.04.20.537598v1 (Accessed April 21, 2023).
Rangel Aguilar, M., Bedwell, A., Costanzi, E., Taylor, R. J., Russo, Rosaria, Bernardes, G. J. L., et al. (2022). Fragment-based computational design of antibodies targeting structured epitopes. Sci. Adv. 8 (45), eabp9540. doi:10.1126/sciadv.abp9540
Raybould, M. I. J., Marks, C., Krawczyk, K., Taddese, B., Nowak, J., Claire, M., et al. (2019). Five computational developability guidelines for therapeutic antibody profiling. Proc. Natl. Acad. Sci. U. S. A. 116 (10), 4025–4030. doi:10.1073/pnas.1810576116
Regep, C., Georges, G., Shi, J., Popovic, B., and Charlotte, M. D. (2017). The H3 loop of antibodies shows unique structural characteristics. Proteins 85 (7), 1311–1318. doi:10.1002/prot.25291
Richardson, E., Galson, J. D., Paul, K., Kelly, D. F., Anne Palser, S. E. S., Watson, S., et al. (2021). A computational method for immune repertoire mining that identifies novel binders from different clonotypes, demonstrated by identifying anti-pertussis toxoid antibodies. mAbs 13 (1), 1869406. doi:10.1080/19420862.2020.1869406
Rives, A., Meier, J., Tom, S., Goyal, S., Lin, Z., Liu, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U. S. A. 118 (15), 239118. doi:10.1073/pnas.2016239118
Robinson, S. A., Raybould, M. I. J., Schneider, C., Wong, W. K., Marks, C., and Deane, C. M. (2021). Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput. Biol. 17 (12), 1009675. doi:10.1371/journal.pcbi.1009675
Ruffolo, J. A., Chu, L.-S., Mahajan, S. P., and Gray, J. J. (2022a). Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Available at: https://www.biorxiv.org/content/10.1101/2022.04.20.488972v1 (Accessed April 21, 2022).
Ruffolo, J. A., Guerra, C., Mahajan, S. P., Sulam, J., and Gray, J. J. (2020). Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics 36 (1), i268–i275. doi:10.1093/bioinformatics/btaa457
Ruffolo, J. A., Sulam, J., and Gray, J. J. (2022b). Antibody structure prediction using interpretable deep learning. Patterns (New York, N.Y.) 3 (2), 100406. doi:10.1016/j.patter.2021.100406
Saka, K., Kakuzaki, T., Metsugi, S., Kashiwagi, D., Yoshida, K., Wada, M., et al. (2021). Antibody design using LSTM based deep generative model from phage display library for affinity maturation. Sci. Rep. 11 (1), 5852. doi:10.1038/s41598-021-85274-7
Satorras, V. G., Hoogeboom, E., and Welling, M. (2021). “E(n) equivariant graph neural networks,” in Proceedings of the 38th International Conference on Machine Learning, July 18-24, 2021 (Proceedings of Machine Learning Research. PMLR).
Schneider, C., Buchanan, A., Taddese, B., and CharlotteDeane, M. (2021). DLAB-deep learning methods for structure-based virtual screening of antibodies. Bioinformatics 38, 377–383. doi:10.1093/bioinformatics/btab660
Shin, J.-E., Riesselman, A. J., Kollasch, A. W., McMahon, C., Simon, E., Sander, C., et al. (2021). Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12 (1), 2403–2411. doi:10.1038/s41467-021-22732-w
Shuai, R. W., Ruffolo, J. A., and Gray, J. J. (2021). Generative Language modeling for antibody design. Available at: https://www.biorxiv.org/content/10.1101/2021.12.13.472419v2 (Accessed December 20, 2022).
Son, Y.-H., Shin, D.-H., Han, J.-W., Won, S.-H., and Kam, T.-E. (2022). “GNN-based antibody structure prediction using quaternion and euler angle combined representation,” in 2022 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Yeosu, South Korea, October 26-28, 2022, 1–4.
Wang, Y., Gong, X., Li, S., Yang, B., Sun, Y., Shi, C., et al. (2022). xTrimoABFold: De novo antibody structure prediction without MSA. Available at: http://arxiv.org/abs/2212.00735 (Accessed November 30, 2022).
Wong, S. E., Sellers, B. D., and Jacobson, M. P. (2011). Effects of somatic mutations on CDR loop flexibility during affinity maturation. Proteins 79 (3), 821–829. doi:10.1002/prot.22920
Wu, J., Wu, F., Jiang, B., Liu, W., and Zhao, P. (2022). tFold-Ab: Fast and accurate antibody structure prediction without sequence homologs. Available at: https://www.biorxiv.org/content/10.1101/2022.11.10.515918v1 (Accessed November 13, 2022).
Keywords: deep learning, structural modeling, drug discovery, antibody therapeutics, antibody structure prediction
Citation: Jaszczyszyn I, Bielska W, Gawlowski T, Dudzic P, Satława T, Kończak J, Wilman W, Janusz B, Wróbel S, Chomicz D, Galson JD, Leem J, Kelm S and Krawczyk K (2023) Structural modeling of antibody variable regions using deep learning—progress and perspectives on drug discovery. Front. Mol. Biosci. 10:1214424. doi: 10.3389/fmolb.2023.1214424
Received: 29 April 2023; Accepted: 12 June 2023;
Published: 07 July 2023.
Edited by:
Traian Sulea, National Research Council Canada (NRC), CanadaReviewed by:
Monica Lisa Fernández-Quintero, University of Innsbruck, AustriaAndreas Evers, Merck, Germany
Copyright © 2023 Jaszczyszyn, Bielska, Gawlowski, Dudzic, Satława, Kończak, Wilman, Janusz, Wróbel, Chomicz, Galson, Leem, Kelm and Krawczyk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Konrad Krawczyk, a29ucmFkQG5hdHVyYWxhbnRpYm9keS5jb20=
†These authors have contributed equally to this work and share first authorship