- ¹JSR Life Sciences, NGS-AI Division, Epalinges, Switzerland
- ²Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
Antibodies represent the largest class of biotherapeutics thanks to their high target specificity, binding affinity and versatility. Recent breakthroughs in Artificial Intelligence (AI) have enabled information-rich in silico representations of antibodies, accurate prediction of antibody structure from sequence, and the generation of novel antibodies tailored to desired characteristics, including optimized developability properties. Here we summarize state-of-the-art methods for antibody analysis. This resource will serve as a reference for applying AI methods to the analysis of antibody sequencing datasets.
1 Introduction
Antibodies are the largest class of biotherapeutics, with a projected market size of US$300 billion by 2025 (Lu et al., 2020). They are used for treating cancer, autoimmune and infectious diseases (Lu et al., 2020; Weiner et al., 2010; Chan and Carter, 2010), as they can be designed to recognize any antigen with high specificity and binding affinity. Antibody discovery is traditionally performed with directed evolution using experimental assays such as hybridoma or phage display (Lu et al., 2020). Although well-established, these methods remain costly, time-consuming and prone to failure due to experimental challenges.
The introduction of Next-Generation Sequencing (NGS) for antibody screening, in place of random colony picking, has made it possible to cover a much larger sequence diversity and a wider binding-affinity range, and to isolate sequences that target distinct epitopes (Spoendlin et al., 2023). Short-read sequencing is limited to a single chain, either the heavy (VH) or the light (VL) chain, while long reads can capture paired information for both chains, improving our understanding of inter-chain residue dependencies (Burbach and Briney, 2024).
Recently, Artificial Intelligence (AI) has experienced accelerated progress, particularly in the fields of Deep Learning (DL) and Natural Language Processing (NLP), and biology has greatly benefited from it (Khakzad et al., 2023; Graves et al., 2020; Nam Kim et al., 2024; Bender and Cortés-Ciriano, 2021; Bender and Cortes-Ciriano, 2021; Kim et al., 2023). A notable example is AlphaFold2 for structural biology (Jumper et al., 2021), which brought sequence-based protein structure prediction close to experimental accuracy.
The success of the Transformer architecture (Vaswani et al., 2017) in NLP has led to the creation of Large Language Models (LLMs), statistical models trained on large collections of text to capture semantic similarity among words in the form of vector representations, called embeddings, without relying on expensive and hard-to-obtain labels. Embeddings are very versatile, with applications that include text classification and generation. In biology, LLMs trained on curated databases of millions of protein sequences [UniProt (UniProt Consortium et al., 2023), UniRef (Suzek et al., 2007) and BFD (Jumper et al., 2021; Steinegger and Söding, 2018)] have been shown to learn secondary and tertiary structural information from sequence (Ahmed et al., 2022; Lin et al., 2023) and can be used to predict protein function. More recently, LLMs have been trained on databases of antibody sequences, such as the Observed Antibody Space (OAS), leading to the creation of antibody-specific language models (ALMs) (Leem et al., 2022; Ruffolo et al., 2023; Prihoda et al., 2022; Olsen et al., 2022).
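As an illustration of how such embeddings are obtained in practice, the following minimal sketch extracts per-residue and per-sequence representations from a small public ESM-2 checkpoint. It assumes the HuggingFace transformers library and the "facebook/esm2_t6_8M_UR50D" model; the sequence shown is an illustrative VH fragment, not data from this review.

```python
# Minimal sketch: per-residue embeddings from a protein language model.
# Assumes HuggingFace `transformers` and the public ESM-2 checkpoint
# "facebook/esm2_t6_8M_UR50D"; any PLM with a compatible interface works similarly.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t6_8M_UR50D")
model.eval()

# Illustrative heavy-chain variable domain fragment.
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token embeddings: shape (1, sequence length + special tokens, hidden size).
embeddings = outputs.last_hidden_state
# A single fixed-length vector for the whole sequence via mean pooling,
# excluding the special tokens added by the tokenizer.
sequence_embedding = embeddings[0, 1:-1].mean(dim=0)
print(sequence_embedding.shape)  # torch.Size([320]) for this checkpoint
```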
Despite the availability of these models, bringing an antibody from discovery to the patient remains challenging. Once a candidate antibody has been found, it must be optimized to match the properties of therapeutic antibodies, grouped under the term developability. The literature lacks consensus on exactly which properties constitute developability (Habib et al., 2023; Raybould et al., 2024; Fernández-Quintero et al., 2023; Khetan et al., 2022; Zhang et al., 2023; Evers et al., 2023). Examples include humanness, solubility and aggregation propensity, for which several ML methods have been proposed (Prihoda et al., 2022; Parkinson and Wang, 2024; Pujols et al., 2022).
The type of license associated with an ML model (code, weights and training data) plays a key role in the choice of integration into industrial applications. A commercially permissive license favors rapid prototyping in industrial research and development and rapid, cost-free integration into a product. In this review, we indicate the license type associated with each of the methods presented, in the hope that this resource will serve as a reference to accelerate the adoption of these models in industry.
Several reviews have discussed ML applications to antibody discovery and development (Graves et al., 2020; Nam Kim et al., 2024; Kim et al., 2023), with a focus on giving an academic perspective on the field. Our review stands out not only for its breadth, providing a comprehensive, up-to-date overview of state-of-the-art AI methods and resources for antibody sequence, structure and developability, but also for its particular focus on practical considerations regarding product integration, such as licenses.
Providing a comprehensive benchmark for these methods is outside the scope of this review, and would require testing against specific benchmark datasets such as ProteinGym (Notin et al., 2023) for general protein language models and FLAb (Chungyoun et al., 2024) for antibody language models.
This review is divided into three parts (Figure 1). The first part covers recent applications of LLMs to protein (PLM) and antibody (ALM) sequences. The second part focuses on folding models, which predict protein structure from sequence, and inverse folding models, which identify sequences that fold into a given structure. The third part covers ML methods that can be used to optimize developability properties. Finally, we conclude with some remarks and perspectives.
Figure 1. Antibody sequence and structure information in relation to developability properties. Masked (contextual, BERT-like) and Causal (autoregressive, GPT-like) Language Modeling prediction strategies are highlighted. The residues underlined in red are the residues that are used to train the model to predict the masked residues (Masked) and next residue (Causal) indicated with a grey question mark. The representative antibody structure is the structure of immunoglobulin (PDB:1IGY). Arrows indicate the information flow from sequence to structure (folding models), from structure to sequence (inverse folding models) and that developability properties are determined by both antibody sequence and structure.
2 Antibody language models
The field of NLP was revolutionized by the introduction of the Transformer (Vaswani et al., 2017), a DL architecture that achieved unprecedented accuracy in the understanding and generation of written and spoken languages, programming languages, images and videos (Islam et al., 2023). At the core of the Transformer is the attention layer, a neural network layer inspired by cognitive attention, the human ability to focus on important signals and exclude irrelevant information. Through attention, the model learns the relative importance that each part of the input sequence (each token) has with respect to every other part. This is used to generate a vector representation of each token in the sequence (embedding) that can be leveraged for specific tasks. Training is performed with a Masked Language Modelling (MLM) objective, the prediction of a randomly chosen subset of masked tokens; a Causal Language Modelling (CLM) objective, the prediction of the next token based on the preceding tokens; or both (Figure 1).
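To make the attention mechanism concrete, below is a minimal sketch of scaled dot-product attention in PyTorch. This is our own illustration rather than code from any of the cited models; multi-head projections, masking and positional encodings are omitted.

```python
# Minimal sketch of the scaled dot-product attention at the core of the
# Transformer (Vaswani et al., 2017); single head, no masking.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the value vectors V,
    with weights given by the similarity between queries Q and keys K."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k**0.5   # pairwise token-token importance
    weights = F.softmax(scores, dim=-1)           # normalize to attention weights
    return weights @ V                            # contextualized token embeddings

# Ten tokens, embedding dimension 64; in a trained model Q, K and V are
# learned linear projections of the input token embeddings.
x = torch.randn(10, 64)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([10, 64])
```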
When trained on large collections of protein sequences [UniProt (UniProt Consortium et al., 2023), UniRef (Suzek et al., 2007) and BFD (Steinegger and Söding, 2018)], these Protein Language Models (PLMs) (Table 1) capture information on evolutionary constraints and secondary and tertiary structure (Ahmed et al., 2022; Lin et al., 2023; Rives et al., 2021). For a comprehensive overview of NLP applied to the protein sequence domain we refer to the available reviews (Ofer et al., 2021; Valentini et al., 2023; Dounas et al., 2024).
Table 1. Specification for general protein language models (top) and antibody-specific language models (bottom). Base, base model architecture; Params, number of trainable parameters; Code, model and code availability; Training data, dataset used for training; License, release license; Refs, references; Year, year of first release. ProtTrans is a collection of models with base architecture Transformer-XL (Dai et al., 2019), XLNet (Yang et al., 2020), BERT (Jacob et al., 2019), Albert (Lan et al., 2019), Electra (Clark et al., 2020), T5 (Raffel et al., 2023). For disambiguation we refer to Baseline Antibody Language Model (BALM) as blBALM and to Bio-inspired Antibody Language Model (BALM) as bioBALM. GH: GitHub. HF: HuggingFace.
The concept of PLMs was later applied to antibody sequences, resulting in Antibody Language Models (ALMs) (Table 1). Most of these models have been trained on unpaired data, either with a single model covering both chain types [AntiBERTa (Leem et al., 2022), AntiBERTy (Ruffolo et al., 2023), IgLM (Shuai et al., 2021), BALM-unpaired (Burbach and Briney, 2024), Bio-inspired Antibody Language Model (Jing et al., 2023)] or with chain-specific models [Sapiens (Prihoda et al., 2022), AbLang (Olsen et al., 2022)]. Other models, such as BALM-paired (Burbach and Briney, 2024), ESM2-paired (Burbach and Briney, 2024), SC-AIR-BERT (Zhao et al., 2023), and AbLang2 (Olsen et al., 2024), make use of paired sequence information to capture inter-chain residue dependencies. Applications of these models include paratope prediction [AntiBERTa (Leem et al., 2022)], humanization [Sapiens (Prihoda et al., 2022)], sequence completion [AbLang (Olsen et al., 2022)] and generation conditioned on species and chain type [AntiBERTy (Ruffolo et al., 2023), IgLM (Shuai et al., 2021)]. pAbT5 (Simon et al., 2023) stands out in that it is trained to predict one chain type from the other.
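As a practical example of an ALM application, the sketch below restores masked residues with AbLang. It assumes the pip-installable ablang package and follows its documented interface; the heavy-chain fragment is illustrative, with '*' marking residues to be completed.

```python
# Hedged sketch of ALM-based sequence completion, assuming the documented
# interface of the `ablang` package (pip install ablang); consult the AbLang
# repository for the current API.
import ablang

heavy_ablang = ablang.pretrained("heavy")  # chain-specific model ("heavy" or "light")
heavy_ablang.freeze()

# '*' marks missing residues to be restored by the model (illustrative fragment).
seqs = ["EV*LVESGGGLVQPGGSLRLSCAASGFTF*DYAMS"]
restored = heavy_ablang(seqs, mode="restore")
print(restored)
```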
3 Antibody folding and inverse folding
AlphaFold2 (Jumper et al., 2021) led to impressive improvements in the accuracy of protein sequence-to-structure prediction. One of the major bottlenecks for the runtime of AlphaFold2 is the need to construct a Multiple Sequence Alignment (MSA) from the input sequence. Recently, structural information learned by PLMs has been leveraged to replace the MSA dependency, leading to the release of sequence-only models [ESMFold (Lin et al., 2023), BALMFold (Jing et al., 2023), OmegaFold (Wu Ruidong et al., 2022), HelixFold-Single (Fang et al., 2023) and EMBER3D (Weissenow et al., 2022)]. Barrett and coauthors (Barrett et al., 2022) compared an AlphaFold architecture using a single input mode, either MSA only or sequence only (MonoFold), with one using both inputs together (PolyFold), and showed that the two input modes are complementary, although the MSA input still yields higher performance. The reliance of AlphaFold2 on the MSA is also reflected in its lower accuracy when predicting the structure of orphan and de novo proteins. Both RGN2 (Chowdhury et al., 2021) and trRosettaX-Single (Wang et al., 2022) have been proposed to address this limitation. More recently, several models have been published that address structure prediction at the atomic level [Protpardelle (Chu et al., 2023), EquiFold (Lee et al., 2022), RoseTTAFold All-Atom (Krishna et al., 2024)] instead of the amino acid level, opening up new possibilities for modelling protein complexes with DNA, RNA, and small molecules. This is also the focus of AlphaFold3 (Abramson et al., 2024).
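As an example of MSA-free structure prediction, the sketch below folds a single sequence with the HuggingFace port of ESMFold. It assumes the public "facebook/esmfold_v1" checkpoint and its documented convenience method; the sequence is an illustrative VH fragment, and a GPU with substantial memory is recommended since the weights are very large.

```python
# Hedged sketch of single-sequence (MSA-free) structure prediction with
# ESMFold, assuming the HuggingFace `transformers` port of the model.
import torch
from transformers import EsmForProteinFolding

model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")
model.eval()

# Illustrative heavy-chain variable domain fragment.
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSDYAMSWVRQAPGKGLEWVS"

with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)  # returns an all-atom model in PDB format

with open("vh_model.pdb", "w") as handle:
    handle.write(pdb_string)
```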
The prediction of antibody structure carries additional challenges with respect to other proteins. The Complementarity-Determining Regions (CDRs) responsible for binding the antigen are the most variable and therefore the most difficult to predict, especially the CDR3 region of the heavy chain (HCDR3). Several models have been published to address these challenges [ABlooper (Brennan et al., 2022), IgFold (Ruffolo et al., 2023), EquiFold (Lee et al., 2022), DeepAb (Ruffolo et al., 2022), ABodyBuilder2 (Brennan et al., 2023)]. ABodyBuilder2, part of the ImmuneBuilder (Brennan et al., 2023) suite, outperforms ABlooper, IgFold, EquiFold and AlphaFold-Multimer (Evans et al., 2021), specifically for the prediction of HCDR3 loops, achieving an RMSD of 2.81 Å. This improvement was achieved by using an ensemble of four models built on the structure module of AlphaFold-Multimer, followed by refinement with OpenMM and pdbfixer. tFold-Ab (Wu Jiaxiang et al., 2022) first computes single-chain structure predictions using the PLM ProtXLNet (Ahmed et al., 2022) and then predicts the multimer conformation of the heavy and light chains using a simplified version of the Evoformer module of AlphaFold that takes a single sequence as input. However, this method is available only as a web server, which hinders the assessment of its performance against available benchmarks.
DL has recently been applied to the inverse folding problem, that is, the problem of determining which sequences can fold into a predefined structure. This is especially useful in the context of protein and antibody design. For instance, the structure for a particular antibody sequence can first be derived with folding models, further optimized in structure space for developability properties, and then converted back into sequence space for experimental validation. Inverse folding models for general proteins include ESM-IF1 (Hsu et al., 2022), KW-Design (Gao et al., 2024), ProRefiner (Zhou et al., 2023), GraDe_IF (Yi et al., 2023), ProteinMPNN (Dauparas et al., 2023) and SeqPredNN (Adriaan Lategan et al., 2023). Inverse folding methods specifically designed for antibodies are AntiFold (Haraldson Høie et al., 2024a), AbMPNN (Dreyer et al., 2023), IgDesign (Shanehsazzadeh et al., 2023) and DiscoTope-3.0 (Haraldson Høie et al., 2024b). AntiFold is a version of ESM-IF1 fine-tuned on experimental and predicted antibody structures.
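The sketch below samples a candidate sequence for a fixed backbone with ESM-IF1, assuming the fair-esm package and its documented inverse-folding utilities (which require extra dependencies such as torch-geometric and biotite). The PDB file and chain identifier are hypothetical placeholders.

```python
# Hedged sketch of inverse folding with ESM-IF1, following the documented
# utilities of the fair-esm package (pip install fair-esm).
import esm
import esm.inverse_folding

model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model.eval()

# Backbone coordinates of the target structure (hypothetical file and chain).
coords, native_seq = esm.inverse_folding.util.load_coords("antibody_vh.pdb", "H")

# Sample a candidate sequence predicted to fold into the given backbone;
# lower temperatures concentrate sampling on higher-confidence residues.
sampled_seq = model.sample(coords, temperature=0.2)
print(sampled_seq)
```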
Table 2 summarizes the information of the models mentioned in this section with the respective licenses.
Table 2. Specification for general protein folding models (top), antibody-specific folding models (middle) and inverse folding models (bottom). DeepAb and ABodyBuilder2 are ensembles of models. The table follows the same structure as Table 1, apart from the Base column, which does not apply here. Models for which we could not determine the number of parameters are indicated as NA in the Params column. GH: GitHub. HF: HuggingFace.
4 Developability
Screening for a high-affinity antibody is only the first step in the antibody development process. The selected antibody must be further optimized to match the characteristics of therapeutic antibodies, grouped under the term developability (Fernández-Quintero et al., 2023; Khetan et al., 2022). Raybould and coauthors (Raybould et al., 2024) developed the Therapeutic Antibody Profile (TAP), a web server that evaluates antibody developability, encompassing immunogenicity, solubility, specificity, stability, manufacturability, and storability. They focused on five metrics calculated from the CDRs, based on total length, surface hydrophobicity, positive and negative charge of surface patches, and the net charge of the VH and VL chains.
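As a simplified illustration of one such sequence-level metric, the sketch below computes an approximate net charge for a variable-domain sequence. The residue charge assignments and the example fragment are our own simplification for illustration, not the published TAP implementation.

```python
# Illustrative sketch of a TAP-style sequence metric: approximate net
# side-chain charge at roughly neutral pH, counting K/R as +1, D/E as -1
# and H as partially protonated. A simplification of the published metrics.
POSITIVE = {"K": 1.0, "R": 1.0, "H": 0.1}
NEGATIVE = {"D": -1.0, "E": -1.0}

def net_charge(sequence: str) -> float:
    """Approximate net side-chain charge of a protein sequence at pH ~7.4."""
    charges = {**POSITIVE, **NEGATIVE}
    return sum(charges.get(residue, 0.0) for residue in sequence.upper())

vh = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSDYAMS"  # illustrative VH fragment
print(f"VH net charge: {net_charge(vh):+.1f}")
```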
Habib and coauthors (Habib et al., 2023) compiled a list of 40 sequence-based and 46 structure-based developability parameters (DPs). In single-DP ablation experiments using a Multiple Linear Regression (MLR) layer, they showed that sequence DPs are better predictors than structure DPs, especially when using sequence-based embeddings generated with the PLM ESM-1v as features. This reflects the fact that ESM-1v has direct access to the sequence information it is trained on and learns structural information only indirectly from sequence.
Within the scope of this review, we focus on ML methods for humanization and for the prediction of solubility and viscosity; methods for aggregation prediction are not covered. Humanization lowers the risk of immunogenicity by increasing the human-like content of the antibody sequence while maintaining binding affinity (Carter and Rajpal, 2022). Solubility, aggregation, and viscosity determine how well an antibody performs in solution. These properties are particularly important for subcutaneous delivery, an attractive alternative to intravenous delivery because of its ease and speed of administration (Viola et al., 2018), but one that requires maximizing the dose (Jiskoot et al., 2022). If an antibody has been selected in a screening procedure based on the naive immune repertoire of a non-human species, it can elicit an immunogenic response when administered to a patient, whereby the host immune system raises Anti-Drug Antibodies (ADAs) against the engineered antibody. In humanization, non-human residues lying outside of the epitope-binding regions are iteratively swapped with human-like residues to increase the humanness of the antibody while retaining its binding affinity. Several ML methods address this task. Hu-mAb (Marks et al., 2021) uses a set of V gene type-specific Random Forest (RF) models to iteratively select the top-scoring single-site mutation in the framework region, based on a humanness score, until a target score is reached.
BioPhi (Prihoda et al., 2022) is a platform for humanness evaluation and humanization. Humanness is evaluated using OASis, a database of 9-mers (k-mers of nine residues) constructed from over 188 million sequences from 231 human subjects across 26 studies. Humanization is performed with Sapiens, which comprises two chain-specific ALMs, one for the heavy and one for the light chain, trained with an MLM objective on 20 million heavy-chain human sequences from 38 OAS studies (2011 to 2017) and 19 million light-chain human sequences from 14 OAS studies (2011 to 2017). The Sapiens network returns per-position posterior probabilities for all 20 amino acids, conditioned on the input sequence, that are used to introduce humanizing mutations. BioPhi includes an interface (Designer) that lets the user select which of the suggested mutations to introduce. The software is available as a web server or through a command line interface for processing sets of sequences.
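Both Hu-mAb and Sapiens follow the same greedy pattern: score the sequence, apply the single framework mutation that most improves humanness, and repeat. The sketch below captures this loop generically; the humanness_score callable and the framework positions are hypothetical placeholders that a concrete model (an RF classifier, the OASis score, or an ALM's per-residue probabilities) would supply.

```python
# Hedged sketch of the greedy humanization loop shared by methods such as
# Hu-mAb and Sapiens; the scorer and framework positions are placeholders.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def greedy_humanize(seq, framework_positions, humanness_score,
                    target=0.95, max_steps=20):
    """Greedily apply the single framework mutation that most improves humanness."""
    current, score = seq, humanness_score(seq)
    for _ in range(max_steps):
        if score >= target:
            break
        best_score, best_seq = score, current
        for pos in framework_positions:        # CDR positions excluded to retain binding
            for aa in AMINO_ACIDS:
                if aa == current[pos]:
                    continue
                candidate = current[:pos] + aa + current[pos + 1:]
                cand_score = humanness_score(candidate)
                if cand_score > best_score:
                    best_score, best_seq = cand_score, candidate
        if best_seq == current:                # no single mutation improves the score
            break
        current, score = best_seq, best_score
    return current, score

# Toy usage with a hypothetical scorer; a real scorer would wrap a trained model.
toy_score = lambda s: s.count("G") / len(s)
print(greedy_humanize("EVQLVQSGAEVK", framework_positions=[0, 2, 4],
                      humanness_score=toy_score))
```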
An alternative method not based on DL is AntPack (Parkinson and Wang, 2024). First, the authors developed a new antibody numbering method that is much faster than existing methods. They then fitted a Gaussian mixture model to the numbered antibody sequences, using 60 million heavy and 70 million light sequences from the cAbRep database (Guo et al., 2019). The authors originally trained the model on human sequences from the OAS dataset; by inspecting the training-set sequences responsible for assigning unusually high probabilities to mouse sequences, they identified more than 7,000 sequences that had been incorrectly labelled as both mouse and human.
Therapeutic antibodies are often produced and used at high concentration, so they require high solubility and low aggregation. Several methods have been proposed to predict solubility from sequence-based and structure-based features. SOLart (Hou et al., 2020) is a Random Forest model trained on a combination of 52 sequence-based and structure-based features. In a comparison with nine state-of-the-art methods on a dataset of experimentally determined and modelled S. cerevisiae structures, it achieved a Pearson correlation of 0.65. Language models have recently been employed for the binary prediction of soluble versus non-soluble proteins, either through fine-tuning or by training only the last classification layer of the neural network. NetSolP (Thumuluri et al., 2022), an ensemble of fine-tuned ESM1b models, achieves performance comparable to the version of ESM with MSA input on datasets of proteins expressed in E. coli that were assessed for solubility.
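The "train only the last classification layer" strategy amounts to fitting a simple classifier on frozen PLM embeddings. The sketch below illustrates this with scikit-learn on placeholder data; real inputs would be embeddings such as those produced in the earlier ESM-2 sketch, paired with experimental solubility labels.

```python
# Hedged sketch of solubility prediction by training only a classification
# head on frozen PLM embeddings; data below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: one frozen, mean-pooled PLM embedding per protein; y: 1 = soluble, 0 = not.
X = np.random.randn(200, 320)          # placeholder embeddings (ESM-2 t6 size)
y = np.random.randint(0, 2, size=200)  # placeholder solubility labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```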
Another favorable property for antibodies to perform well in solution is low viscosity. Rai et al. (2023) proposed PfAbNet-viscosity, a 3D convolutional neural network that predicts viscosity from antibody structures and is trained in a low-data regime. The authors used data augmentation to mitigate the limitation of having few antibodies available for training. PfAbNet-viscosity outperformed two state-of-the-art models, Sharma (Sharma et al., 2014) and the Spatial Charge Map (SCM) (Agrawal et al., 2015).
5 Discussion
The field of antibody discovery and development is experiencing an acceleration thanks to the successes of DL in protein structure prediction and representation learning from protein sequences. Here we focused on the applicability of these methods and resources in an industrial setting, especially with respect to the possibility of integrating them into commercial products. In Table 3 we highlight a selection of protein and antibody-specific models for sequence, folding, inverse folding and developability (humanization), based on our assessment of usability and license considerations.
Table 3. A selection of models with highlighted strengths and limitations. General protein models (top) and antibody-specific models (bottom).
To evaluate the performance of these models, a benchmark has recently been proposed, the Fitness Landscape for Antibodies (FLAb), covering six properties of therapeutic antibodies: expression, thermostability, immunogenicity, aggregation, polyreactivity and binding affinity (Chungyoun et al., 2024). The models considered in the study include decoder-only generative models trained with next-token prediction [ProGen2 (Nijkamp et al., 2022), IgLM (Shuai et al., 2021) and ProtGPT2 (Ferruz et al., 2022)], encoder-only models for representation learning [AntiBERTy (Ruffolo et al., 2023)], inverse folding models [ProteinMPNN (Dauparas et al., 2023), ESM-IF (Hsu et al., 2022)] and a physics-based model [Rosetta (Koehler Leman et al., 2020)]. The authors showed that no single model outperformed all others across all tasks, underscoring the challenges in developing and applying these models to specific tasks.
Additional insights into how to improve these models will come from extending these benchmarks to other models, including PLMs such as ESM-2 fine-tuned on antibody sequence data. With the increased availability of predicted antibody structures, the current direction in the field is to integrate structural information into ALMs, with AntiBERTa2 (Barton et al., 2024) being an example. Antibody datasets such as OAS suffer from a germline-content bias that can prevent a model from suggesting mutations further away from the germline sequence space. AbLang2 (Olsen et al., 2024) is a recent model that addresses this bias: the authors trained the model to focus more on non-germline residues, using focal loss instead of cross-entropy loss to handle the class imbalance between germline and non-germline residues in the training data.
The availability of models specifically developed for antibody structure prediction and of inverse folding models makes it possible to address the developability optimization problem in both sequence and structure space. This is particularly important as optimization in sequence space appears to be more constrained than in structure space, as noted in the developability cartography study (Habib et al., 2023).
Humanization is the developability task best addressed by current methods, whereas methods aimed at predicting solubility and viscosity suffer from the limited availability of experimental training data and have explored the application of language models to a lesser extent.
These are exciting times for antibody discovery and development, with AI being leveraged as a catalyst to accelerate and de-risk drug development on many fronts. We are starting to see how the process of bringing a drug from discovery to pre-clinical and clinical trials can be shortened and its costs reduced. The next few years will see continued fast-paced development and integration of these methods and resources into industrial applications, with the goal of ensuring that newly discovered treatments reach patients faster.
Author contributions
LS: Writing–original draft, Writing–review and editing, Investigation. MB: Investigation, Writing–original draft. IX: Conceptualization, Writing–review and editing. BA: Conceptualization, Supervision, Writing–review and editing.
Funding
The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.
Conflict of interest
Authors LS, MB, IX, and BA were employed by JSR Life Sciences.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500. doi:10.1038/s41586-024-07487-w
Adriaan Lategan, F., Schreiber, C., and Patterton, H. G. (2023). SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures. BMC Bioinforma. 24 (1), 373. doi:10.1186/s12859-023-05498-4
Agrawal, N. J., Helk, B., Kumar, S., Mody, N., Sathish, H. A., Samra, H. S., et al. (2015). Computational tool for the early screening of monoclonal antibodies for their viscosities. mAbs 8 (1), 43–48. doi:10.1080/19420862.2015.1099773
Ahdritz, G., Bouatta, N., Floristean, C., Kadyan, S., Xia, Q., Gerecke, W., et al. (2022). OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. BioRxiv, Preprint. (Accessed November 22, 2022). doi:10.1038/s41592-024-02272-z
Ahmed, E., Essam, H., Salah-Eldin, W., Moustafa, W., Mohamed, E., Rochereau, C., et al. (2023). Ankh: optimized protein language model unlocks general-purpose modelling. BioRxiv. (Accessed January 16, 2023). doi:10.1101/2023.01.16.524265
Ahmed, E., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Yu, Jones, L., et al. (2022). ProtTrans: toward understanding the language of Life through self-supervised learning. IEEE Trans. Pattern Analysis Mach. Intell. 44 (10), 7112–7127. doi:10.1109/TPAMI.2021.3095381
Alamdari, S., Thakkar, N., Van Den Berg, R., Lu, A. X., Fusi, N., Amini, A. P., et al. (2023). Protein generation with evolutionary diffusion: sequence is all you need. BioRxiv. (Accessed September 12, 2023). doi:10.1101/2023.09.11.556673
Barrett, T. D., Villegas-Morcillo, A., Robinson, L., Gaujac, B., Adméte, D., Saquand, E., et al. (2022). ManyFold: an efficient and flexible library for training and validating protein folding models. Bioinformatics 39 (1), btac773. doi:10.1093/bioinformatics/btac773
Barton, J., Jacob, D. G., and Leem, J. (2024). Enhancing Antibody Language Models with structural information. BioRxiv. (Accessed January 04, 2024). doi:10.1101/2023.12.12.569610
Bender, A., and Cortes-Ciriano, I. (2021). Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov. Today 26 (4), 1040–1052. doi:10.1016/j.drudis.2020.11.037
Bender, A., and Cortés-Ciriano, I. (2021). Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov. Today 26 (2), 511–524. doi:10.1016/j.drudis.2020.12.009
Brandes, N., Ofer, D., Peleg, Y., Rappoport, N., and Linial, M. (2022). ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 38 (8), 2102–2110. doi:10.1093/bioinformatics/btac020
Brennan, A., Georges, G., Bujotzek, A., and Deane, C. M. (2022). ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38 (7), 1877–1880. doi:10.1093/bioinformatics/btac016
Brennan, A., Wong, W. K., Boyles, F., Georges, G., Bujotzek, A., and Deane, C. M. (2023). ImmuneBuilder: deep-learning models for predicting the structures of immune proteins. Commun. Biol. 6 (1), 575. doi:10.1038/s42003-023-04927-7
Burbach, S. M., and Briney, B. (2024). Improving antibody language models with native pairing. Patterns (N Y). 5 (5), 100967. doi:10.1016/j.patter.2024.100967
Carter, P. J., and Rajpal, A. (2022). Designing antibodies as therapeutics. Cell 185 (15), 2789–2805. doi:10.1016/j.cell.2022.05.029
Chan, A. C., and Carter, P. J. (2010). Therapeutic antibodies for autoimmunity and inflammation. Nat. Rev. Immunol. 10 (5), 301–316. doi:10.1038/nri2761
Chowdhury, R., Bouatta, N., Biswas, S., Rochereau, C., Church, G. M., Sorger, P. K., et al. (2021). Single-sequence protein structure prediction using language models from deep learning. preprint. BioRxiv. (Accessed August 04, 2021). doi:10.1038/s41587-022-01432-w
Chu, A. E., Cheng, L., El Nesr, G., Xu, M., and Huang, P.-S. (2023). An all-atom protein generative model. BioRxiv. (Accessed May 25, 2023). doi:10.1101/2023.05.24.542194
Chungyoun, M., Ruffolo, J., and Gray, J. (2024). FLAb: benchmarking deep learning methods for antibody fitness prediction. preprint. BioRxiv. (Accessed January 15, 2024). doi:10.1101/2024.01.13.575504
Clark, K., Luong, M.-T., Le, Q. V., and Manning, C. D. (2020). ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv:2003.10555. doi:10.48550/arXiv.2003.10555
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., and Salakhutdinov, R. (2019). Transformer-XL: attentive Language Models beyond a fixed-length context. arXiv:1901.02860, 1. doi:10.48550/arXiv.1901.02860
Dauparas, J., Anishchenko, I., Bennett, N., Bai, H., Ragotte, R. J., Milles, L. F., et al. (2023). Robust deep learning based protein sequence design using ProteinMPNN. Science 378 (6615), 49–56. doi:10.1126/science.add2187
Dounas, A., Cotet, T.-S., and Yermanos, A. (2024). Learning immune receptor representations with protein language models. arXiv. doi:10.48550/arXiv.2402.03823
Dreyer, F. A., Cutting, D., Schneider, C., Kenlay, H., and Deane, C. M. (2023). Inverse folding for antibody sequence design using deep learning. arXiv. doi:10.48550/arXiv.2310.19513
Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., et al. (2021). Protein complex prediction with AlphaFold-Multimer. BioRxiv. (Accessed March 10, 2022). doi:10.1101/2021.10.04.463034
Evers, A., Malhotra, S., and Sood, V. D. (2023). In silico approaches to deliver better antibodies by design – the past in The present and the future. arXiv. doi:10.48550/arXiv.2305.07488
Fang, X., Wang, F., Liu, L., He, J., Lin, D., Xiang, Y., et al. (2023). A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nat. Mach. Intell. 5 (10), 1087–1096. doi:10.1038/s42256-023-00721-6
Fernández-Quintero, M. L., Ljungars, A., Waibl, F., Greiff, V., Andersen, J. T., Gjølberg, T. T., et al. (2023). Assessing developability early in the discovery process for novel biologics. mAbs 15 (1), 2171248. doi:10.1080/19420862.2023.2171248
Ferruz, N., Schmidt, S., and Höcker, B. (2022). ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13 (1), 4348. doi:10.1038/s41467-022-32007-7
Gao, Z., Tan, C., Chen, X., Zhang, Y., Xia, J., Li, S., et al. (2024). KW-DESIGN: pushing the limit of protein design via knowledge refinement. arXiv. doi:10.48550/arXiv.2305.15151
Graves, J., Jacob, B., Priego, E., Makkapati, N., Parish, S., Medellin, B., et al. (2020). A review of deep learning methods for antibodies. Antibodies 9 (2), 12. doi:10.3390/antib9020012
Guo, Y., Chen, K., Kwong, P. D., Shapiro, L., and Sheng, Z. (2019). cAb-rep: a database of curated antibody repertoires for exploring antibody diversity and predicting antibody prevalence. Front. Immunol. 10, 2365. doi:10.3389/fimmu.2019.02365
Habib, B., Smorodina, E., Pariset, M., Zhong, J., Akbar, R., Chernigovskaya, M., et al. (2023). Biophysical cartography of the native and human-engineered antibody landscapes quantifies the plasticity of antibody developability. BioRxiv. (Accessed December 13, 2023). doi:10.1101/2023.10.26.563958
Haraldson Høie, M., Gade, F. S., Johansen, J. M., Würtzen, C., Winther, O., Nielsen, M., et al. (2024b). DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front. Immunol. 15, 1322712. doi:10.3389/fimmu.2024.1322712
Haraldson Høie, M., Hummer, A. M., Olsen, T. H., Nielsen, M., and Deane, C. M. (2024a). AntiFold: improved antibody structure design using inverse folding. arXiv. doi:10.48550/arXiv.2405.03370
Hou, Q., Marc Kwasigroch, J., Rooman, M., and Pucci, F. (2020). SOLart: a structure-based method to predict protein solubility and aggregation. Bioinformatics 36 (5), 1445–1452. doi:10.1093/bioinformatics/btz773
Hsu, C., Verkuil, R., Liu, J., Lin, Z., Hie, B., Sercu, T., et al. (2022). Learning inverse folding from millions of predicted structures. BioRxiv. (Accessed September 06, 2022). doi:10.1101/2022.04.10.487779
Islam, S., Elmekki, H., Ahmed, E., Bentahar, J., Drawel, N., Rjoub, G., et al. (2023). A comprehensive survey on applications of transformers for deep learning tasks. arXiv. doi:10.48550/arXiv.2306.07303
Jacob, D., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. doi:10.48550/arXiv.1810.04805
Jaffe, D. B., Shahi, P., Adams, B. A., Chrisman, A. M., Finnegan, P. M., Raman, N., et al. (2022). Functional antibodies exhibit light chain coherence. Nature 611 (7935), 352–357. doi:10.1038/s41586-022-05371-z
Jing, H., Gao, Z., Xu, S., Shen, T., Peng, Z., He, S., et al. (2023). Accurate prediction of antibody function and structure using bio-inspired antibody language model. Brief. Bioinform. 25 (4), bbae245. doi:10.1093/bib/bbae245
Jiskoot, W., Hawe, A., Menzen, T., Volkin, D. B., and Crommelin, D. J. A. (2022). Ongoing challenges to develop high concentration monoclonal antibody-based formulations for subcutaneous administration: quo vadis? J. Pharm. Sci. 111 (4), 861–867. doi:10.1016/j.xphs.2021.11.008
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596 (7873), 583–589. doi:10.1038/s41586-021-03819-2
Kenlay, H., Dreyer, F. A., Kovaltsuk, A., Miketa, D., Douglas, P., and Deane, C. M. (2024). Large scale paired antibody language models. arXiv. doi:10.48550/arXiv.2403.17889
Khakzad, H., Igashov, I., Schneuing, A., Goverde, C., Bronstein, M., and Correia, B. (2023). A new age in protein design empowered by deep learning. Cell Syst. 14 (11), 925–939. doi:10.1016/j.cels.2023.10.006
Khetan, R., Curtis, R., Deane, C. M., Hadsund, J. T., Kar, U., Krawczyk, K., et al. (2022). Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. mAbs 14 (1), 2020082. doi:10.1080/19420862.2021.2020082
Kim, J., McFee, M., Fang, Q., Abdin, O., and Kim, P. M. (2023). Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol. Sci. 44 (3), 175–189. doi:10.1016/j.tips.2022.12.005
Koehler Leman, J., Weitzner, B. D., Lewis, S. M., Adolf-Bryfogle, J., Alam, N., Alford, R. F., et al. (2020). Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17 (7), 665–680. doi:10.1038/s41592-020-0848-2
Krishna, R., Wang, J., Ahern, W., Sturmfels, P., Kalvet, I., Lee, G. R., et al. (2024). Generalized biomolecular modeling and design with RoseTTAFold all-atom. Science 384 (6693), eadl2528. doi:10.1126/science.adl2528
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: a lite BERT for self-supervised learning of language representations. arXiv. doi:10.48550/arXiv.1909.11942
Lee, J. H., Yadollahpour, P., Watkins, A., Frey, N. C., Leaver-Fay, A., Ra, S., et al. (2022). Protein structure prediction with a novel coarse-grained structure representation. Preprint. BioRxiv. (Accessed October 08, 2022). doi:10.1101/2022.10.07.511322
Leem, J., Mitchell, L. S., Farmery, J. H. R., Barton, J., and Galson, J. D. (2022). Deciphering the language of antibodies using self-supervised learning. Patterns 3 (7), 100513. doi:10.1016/j.patter.2022.100513
Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379 (6637), 1123–1130. doi:10.1126/science.ade2574
Lu, R.-M., Hwang, Y.-C., Liu, I.-Ju, Lee, C.-C., Tsai, H.-Z., Li, H.-J., et al. (2020). Development of therapeutic antibodies for the treatment of diseases. J. Biomed. Sci. 27 (1), 1. doi:10.1186/s12929-019-0592-z
Marks, C., Hummer, A. M., Chin, M., and Deane, C. M. (2021). Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics 37 (22), 4041–4047. doi:10.1093/bioinformatics/btab434
Nam Kim, D., McNaughton, A. D., and Kumar, N. (2024). Leveraging artificial intelligence to expedite antibody design and enhance antibody–antigen interactions. Bioengineering 11 (2), 185. doi:10.3390/bioengineering11020185
Nijkamp, E., Jeffrey, R., Weinstein, E. N., Naik, N., and Ali, M. (2022). ProGen2: exploring the boundaries of Protein Language Models. arXiv. doi:10.48550/arXiv.2206.13517
Notin, P., Kollasch, A. W., Ritter, D., van Niekerk, L., Paul, S., Spinner, H., et al. (2023). ProteinGym: large-scale benchmarks for protein fitness prediction and design. BioRxiv. doi:10.1101/2023.12.07.570727
Ofer, D., Brandes, N., and Linial, M. (2021). The language of proteins: NLP, machine learning and protein sequences. Comput. Struct. Biotechnol. J. 19, 1750–1758. doi:10.1016/j.csbj.2021.03.022
Olsen, T. H., Moal, I. H., and Deane, C. M. (2024). Addressing the antibody germline bias and its effect on language models for improved antibody design. BioRxiv. (Accessed February 07, 2024). doi:10.1101/2024.02.02.578678
Olsen, T. H., Moal, I. H., and Deane, C. M. (2022). AbLang: an antibody language model for completing antibody sequences. Bioinforma. Adv. 2 (1), vbac046. doi:10.1093/bioadv/vbac046
Parkinson, J., and Wang, W. (2024). For antibody sequence generative modeling, mixture models may be all you need. Bioinformatics 40, btae278. doi:10.1093/bioinformatics/btae278
Prihoda, D., Maamary, J., Waight, A., Juan, V., Fayadat-Dilman, L., Svozil, D., et al. (2022). BioPhi: a platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs 14 (1), 2020203. doi:10.1080/19420862.2021.2020203
Pujols, J., Iglesias, V., Santos, J., Kuriata, A., and Ventura, S. (2022). A3D 2.0 update for the prediction and optimization of protein solubility. Methods Mol Biol. 2406, 65–84. doi:10.1007/978-1-0716-1859-2_3
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., et al. (2023). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv. doi:10.48550/arXiv.1910.10683
Rai, B. K., Apgar, J. R., and Bennett, E. M. (2023). Low-data interpretable deep learning prediction of antibody viscosity using a biophysically meaningful representation. Sci. Rep. 13 (1), 2917. doi:10.1038/s41598-023-28841-4
Raybould, M. I. J., Turnbull, O. M., Suter, A., Guloglu, B., and Deane, C. M. (2024). Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling. Commun. Biol. 7 (1), 62. doi:10.1038/s42003-023-05744-8
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. 118 (15), e2016239118. doi:10.1073/pnas.2016239118
Rocklin, G. J., Chidyausiku, T. M., Goreshnik, I., Ford, A., Houliston, S., Lemak, A., et al. (2017). Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357 (6347), 168–175. doi:10.1126/science.aan0693
Ruffolo, J. A., Chu, L.-S., Mahajan, S. P., and Gray, J. J. (2023). Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat. Commun. 14 (1), 2389. doi:10.1038/s41467-023-38063-x
Ruffolo, J. A., Gray, J. J., and Sulam, J. (2021). Deciphering antibody affinity maturation with language models and weakly supervised learning. arXiv. doi:10.48550/arXiv.2112.07782
Ruffolo, J. A., Sulam, J., and Gray, J. J. (2022). Antibody structure prediction using interpretable deep learning. Patterns 3 (2), 100406. doi:10.1016/j.patter.2021.100406
Shanehsazzadeh, A., Alverio, J., Kasun, G., Levine, S., Khan, J. A., Chung, C., et al. (2023). In vitro validated antibody design against multiple therapeutic antigens using generative inverse folding. BioRxiv. doi:10.1101/2023.12.08.570889
Sharma, V. K., Patapoff, T. W., Kabakoff, B., Pai, S., Hilario, E., Zhang, B., et al. (2014). In silico selection of therapeutic antibodies for development: viscosity, clearance, and chemical stability. Proc. Natl. Acad. Sci. U. S. A. 111 (52), 18601–18606. doi:10.1073/pnas.1421779112
Shuai, R. W., Ruffolo, J. A., and Gray, J. J. (2021). Generative language modeling for antibody design. BioRxiv. (Accessed December 20, 2022). doi:10.1101/2021.12.13.472419
Simon, K. S. C., and Wei, K. Y. (2023). Generative antibody design for complementary chain pairing sequences through encoder-decoder Language Model. arXiv. doi:10.48550/arXiv.2301.02748
Spoendlin, F. C., Abanades, B., Raybould, M. I. J., Wong, W. K., Georges, G., and Deane, C. M. (2023). Improved computational epitope profiling using structural models identifies a broader diversity of antibodies that bind to the same epitope. Front. Mol. Biosci. 10, 1237621. doi:10.3389/fmolb.2023.1237621
Steinegger, M., and Söding, J. (2018). Clustering huge protein sequence sets in linear time. Nat. Commun. 9 (1), 2542. doi:10.1038/s41467-018-04964-5
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C. H. (2007). UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23 (10), 1282–1288. doi:10.1093/bioinformatics/btm098
Thumuluri, V., Martiny, H.-M., Armenteros, J. J. A., Salomon, J., Nielsen, H., and Johansen, A. R. (2022). NetSolP: predicting protein solubility in Escherichia coli using language models. Bioinformatics 38 (4), 941–946. doi:10.1093/bioinformatics/btab801
UniProt Consortium, T., Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., et al. (2023). UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 51 (D1), D523–D531. doi:10.1093/nar/gkac1052
Valentini, G., Malchiodi, D., Gliozzo, J., Mesiti, M., Soto-Gomez, M., Cabri, A., et al. (2023). The promises of large language models for protein design and modeling. Front. Bioinforma. 3, 1304099. doi:10.3389/fbinf.2023.1304099
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. arXiv:1706.03762. doi:10.48550/arXiv.1706.03762
Viola, M., Sequeira, J., Seiça, R., Veiga, F., Serra, J., Santos, A. C., et al. (2018). Subcutaneous delivery of monoclonal antibodies: how do we get there? J. Control. Release 286, 301–314. doi:10.1016/j.jconrel.2018.08.001
Wang, W., Peng, Z., and Yang, J. (2022). Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814. doi:10.1038/s43588-022-00373-3
Weiner, L. M., Surana, R., and Wang, S. (2010). Monoclonal antibodies: versatile platforms for cancer immunotherapy. Nat. Rev. Immunol. 10 (5), 317–327. doi:10.1038/nri2744
Weissenow, K., Heinzinger, M., Steinegger, M., and Rost, B. (2022). Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies. BioRxiv. (Accessed November 16, 2022). doi:10.1101/2022.11.14.516473
Wu, J., Wu, F., Jiang, B., Liu, W., and Zhao, P. (2022b). tFold-ab: fast and accurate antibody structure prediction without sequence homologs. BioRxiv. (Accessed November 13, 2022). doi:10.1101/2022.11.10.515918
Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., et al. (2022a). High-resolution de novo structure prediction from primary sequence. BioRxiv. (Accessed July 22, 2022). doi:10.1101/2022.07.21.500999
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., and Le, Q. V. (2020). XLNet: generalized autoregressive pretraining for language understanding. arXiv. doi:10.48550/arXiv.1906.08237
Yi, K., Zhou, B., Shen, Y., Liò, P., and Wang, Yu G. (2023). Graph denoising diffusion for inverse protein folding. arXiv. doi:10.48550/arXiv.2306.16819
Zhang, W., Wang, H., Feng, N., Li, Y., Gu, J., and Wang, Z. (2023). Developability assessment at early-stage discovery to enable development of antibody-derived therapeutics. Antib. Ther. 6 (1), 13–29. doi:10.1093/abt/tbac029
Zhao, Yu, Su, X., Zhang, W., Mai, S., Xu, Z., Qin, C., et al. (2023). SC-AIR-BERT: a pre-trained single-cell model for predicting the antigen-binding specificity of the adaptive immune receptor. Briefings Bioinforma. 24 (4), bbad191. doi:10.1093/bib/bbad191
Keywords: LLM, ALM (antibody language model), developability, inverse folding, deep learning, artificial intelligence
Citation: Santuari L, Bachmann Salvy M, Xenarios I and Arpat B (2024) AI-accelerated therapeutic antibody development: practical insights. Front. Drug Discov. 4:1447867. doi: 10.3389/fddsv.2024.1447867
Received: 12 June 2024; Accepted: 12 August 2024;
Published: 03 September 2024.
Edited by:
Ferdinand Molnár, Nazarbayev University, Kazakhstan
Reviewed by:
Markus Heinonen, Aalto University, Finland
Copyright © 2024 Santuari, Bachmann Salvy, Xenarios and Arpat. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bulak Arpat, barpat@jsrlifesciences.com