Abstract
Antibodies make up an important and growing class of compounds used for the diagnosis or treatment of disease. While traditional antibody discovery utilized immunization of animals to generate lead compounds, technological innovations have made it possible to search for antibodies targeting a given antigen within the repertoires of B cells in humans. Here we group these innovations into four broad categories: cell sorting allows the collection of cells enriched in specificity to one or more antigens; BCR sequencing can be performed on bulk mRNA, genomic DNA or on paired (heavy-light) mRNA; BCR repertoire analysis generally involves clustering BCRs into specificity groups or more in-depth modeling of antibody-antigen interactions, such as antibody-specific epitope predictions; validation of antibody-antigen interactions requires expression of antibodies, followed by antigen binding assays or epitope mapping. Together with innovations in deep learning, these technologies will contribute to the future discovery of diagnostic and therapeutic antibodies directly from humans.
1 Introduction
Antibodies, which are the extracellular portion of B cell receptors (BCRs), play a critical role in adaptive immune responses. An antibody consists of two chains, heavy and light, each of which is composed of a constant and a variable region (Figure 1). The six complementarity-determining regions (CDRs) of the variable region are responsible for binding a specific antigen with high affinity (Pons et al., 2002; Davila et al., 2022). Antibodies are widely used for both disease diagnosis and treatment.
FIGURE 1

BCR structure. (A) Schematic representation of BCR structure. A BCR is composed of an immunoglobulin (antibody) molecule and a heterodimer (Igα/Igβ) that contain transmembrane and signal transduction regions. (B) The immunoglobulin variable region is composed of heavy (blue) and light (orange) chains (PDB entry: 7jmpHL). The six CDRs are represented by darker shades.
Traditional therapeutic antibody discovery approaches utilized animals, usually mice, to generate polyclonal antibodies against a target antigen. In this approach, candidate monoclonal antibodies (mAbs) are selected and engineered to minimize immunogenicity in humans, while maintaining target specificity and desired pharmacokinetics. The first blockbuster therapeutic antibody (anti-CD3 OKT3), which was engineered in this manner, was approved by the FDA in 1986. Animal-based antibody discovery had a huge impact on the pharmaceutical industry through the 1990s and motivated the development of new antibody discovery platforms. By the mid-2000s, approximately one-half of therapeutic antibodies were fully human through the use of transgenic mice or phage display platforms utilizing human BCR genes (Nelson et al., 2010; Ju et al., 2020).
In the past decade, a number of technological breakthroughs have enabled the discovery of antigen-specific mAbs directly from human donors (Pedrioli and Oxenius, 2021). Up to the mid-2000s, mining human B cell receptor (BCR) repertoires for mAbs specific to an antigen of interest was primarily done in academic research labs (Truck et al., 2015; Wang et al., 2015; Goldstein et al., 2019). However, the COVID-19 pandemic brought with it an urgent need for creative ways of targeting the SARS-CoV-2 virus quickly. Remarkably, within months of the pandemic, multiple research groups reported the discovery of neutralizing antibodies from the BCR repertoires of COVID-19 patients (Cao et al., 2020; Hansen et al., 2020; Ju et al., 2020; Pinto et al., 2020; Robbiani et al., 2020; Seydoux et al., 2020; Wang et al., 2020; Zost et al., 2020; Baum et al., 2021). Due to the overwhelming need for a response to the pandemic, along with the rapid availability of resources for COVID-19 related research, many of the mAbs were quickly tested for safety and efficacy in the clinic. The Antibody Society currently lists 35 anti-SARS-CoV-2 mAbs or mAb cocktails undergoing clinical trials (https://www.antibodysociety.org/covid-19-biologics-tracker).
Although it is important not to over-generalize the development of anti-SARS-CoV-2 antibodies to other disease areas, the intensity of research on COVID-19 has refocused attention on the technological innovations that enabled the discovery of antigen-specific antibodies from human BCR repertoires so quickly. Here we review four main areas of innovation: B cell sorting, BCR sequencing, BCR repertoire analysis, and experimental validation of antigen binding. Although each of these areas is an active research topic on its own, the greatest impact on the pharmaceutical industry will come through synthesis into integrated experimental and computational pipelines. Given the recent breakthroughs in computational biology, including antibody-specific machine-learning methods (Akbar et al., 2022), we can expect rapid growth in this area as data generation merges with data analysis in the context of antibody discovery.
2 B cell sorting
A repertoire of BCRs refers to a snapshot of all the B cells produced in a given donor at a given time. When studying repertoires, cell sorting is commonly used to isolate natural B cells with a specific phenotype or antigen specificity. This is one of the first steps in discovering antibodies from human donors. Common methods used for cell sorting include fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), or combinations of both. In FACS, fluorescently labeled antigens are used as probes to isolate antigen-binding B cells, which are collected into tubes or plates for further processing, such as bulk or single-cell BCR gene amplification and sequencing (Gieselmann et al., 2021). Fluorescent labeling of an antigen is a critical step and can be done via covalent chemical conjugation, expression of a recombinant antigen-fluorescent fusion protein, or by biotinylating the antigen and adding fluorochrome-conjugated streptavidin to make an antigen tetramer, which increases avidity to the antibody. It must be kept in mind that such labeling may occlude some part of the epitope and potentially disturb the B cell antigen recognition process (Boonyaratanakornkit and Taylor, 2019). Despite this potential complication, utilization of fluorescent-labeled antigens is a promising approach to the collection of antigen-specific B cells.
A relatively new technology, MACS, utilizes direct (primary antibody-conjugated microbeads) or indirect magnetic labeling (primary antibody plus a secondary antibody-conjugated microbead) of cells prior to separation through a magnetic field. Although it lacks sensitivity and is not compatible with multiple-marker profiles, the cell throughput, viability, and time requirements for MACS are comparable to FACS (Sutermaster and Darling, 2019). Some researchers combine these two methods to enrich antigen-specific B cells (Galson et al., 2015a; Banach et al., 2022).
Isolating antigen-specific human B cells is nevertheless quite challenging. Memory B cells express large amounts of antigen receptors on their surfaces but are present in very low numbers in peripheral blood, the most accessible repertoire compartment of the human body (Waltari et al., 2019). The other most commonly-studied subset of antigen-specific B cells consists of antibody-secreting cells (ASCs). ASCs can be found in higher numbers in peripheral blood, especially after vaccination or infection; however, ASCs, especially those of the IgG isotype, are thought to have limited immunoglobulin surface expression. This might be a reason why many previous studies of ASCs did not utilize antigen-based sorting, but rather collected all ASCs and screened for antigen-specificity downstream after culturing the cells in vitro and stimulating antibody secretion before sequencing (Lavinder et al., 2014; Galson et al., 2015a; Acquaye-Seedah et al., 2018; Pedrioli and Oxenius, 2021). However, IgA and IgM isotype ASCs retain expression of surface immunoglobulin (Pinto et al., 2013; Blanc et al., 2016), making it relatively straightforward to sort these subsets in an antigen-specific manner.
Another challenge in antigen-based cell sorting is related to specificity, that is, whether or not the selected cells are truly positive binders or just appear through nonspecific binding to the fluorochrome, streptavidin, or any added linkers (Doucett et al., 2005; Boonyaratanakornkit and Taylor, 2019). During the sorting process, it is necessary to reduce these background signals as much as possible. Due to the limitation of sample quantity and a low number of antigen-specific cells in the sample, we often lack ideal positive control cells from which a positive threshold for fluorescence (gate) can be defined for antigen-binding cells. Thus, in general, one must rely on a negative control population (which can be cells or decoys that are stained by unlabeled antigen or fluorochrome without the antigen) to set the gate for antigen binders (Figure 2A). Additionally, to increase specificity, a double fluorescent staining strategy can be used to label the antigen probe. Here, an antigen is labeled with two different fluorescent labels and the double positive cells are deemed positive (Amanna and Slifka, 2006). However, this approach still leaves some opportunity for non-binders to be recruited, as seen in a previous report that only 80% of sorted cells were positively bound to the antigen after being recombinantly produced and tested by ELISA (Attaf et al., 2020). This observation suggests that experimental validation (discussed in Section 5) is a requirement for any antibody discovery workflow based on antigen-based B cell sorting. This can be a potential bottleneck, since the production of recombinant antibodies following the acquisition of antibody sequences by conventional cloning and expression in mammalian cells can be labor intensive (Pedrioli and Oxenius, 2021).
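The gating logic described above can be illustrated with a minimal sketch. It assumes that fluorescence intensities are available as plain numbers, that the gate is placed at a high quantile of the negative control (the 0.999 default is an illustrative assumption, not a recommended value), and that a cell is called antigen-binding only if it exceeds the gate in both fluorescent channels. The function names are hypothetical.

```python
import math

def gate_from_negative_control(neg_ctrl, quantile=0.999):
    """Return the nearest-rank quantile of negative-control intensities.

    The returned value is used as the fluorescence gate; the default
    quantile is an illustrative assumption.
    """
    values = sorted(neg_ctrl)
    if not values:
        raise ValueError("negative control population is empty")
    # round() guards against floating-point fuzz before taking the ceiling
    rank = math.ceil(round(quantile * len(values), 9))
    rank = min(len(values), max(1, rank))
    return values[rank - 1]

def double_positive(cells, gate_a, gate_b):
    """cells: iterable of (intensity_a, intensity_b) pairs.

    Returns a list of booleans, True where both channels exceed their gates.
    """
    return [a > gate_a and b > gate_b for a, b in cells]

# Toy example: a synthetic negative control and three stained cells
negative_control = list(range(1000))
gate = gate_from_negative_control(negative_control)
calls = double_positive([(500, 2000), (2000, 500), (2000, 2000)], gate, gate)
```

In this toy example only the third cell, which is bright in both channels, passes the double-positive gate. In practice, gates are usually set per channel and per experiment.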
FIGURE 2

Antigen-specific B cell sorting with or without LIBRA-seq. (A) Utilization of double negative control population to determine gating line for selecting antigen-binding cells. The population inside the red box is considered “antigen-binding.” (B) Workflow of antigen-specific cell sorting with and without the utilization of LIBRA-seq. Samples can be obtained from vaccinated donors or patients with a certain disease. LIBRA-seq uses barcoded antigen along with the fluorescent label, that can be read by the NGS machine.
To overcome these challenges, and to increase the throughput of antigen-based sorting, the Linking B-cell Receptor to Antigen Specificity through Sequencing (LIBRA-seq) method was introduced in recent years (Setliff et al., 2019). LIBRA-seq is a modification of antigen-based cell sorting that makes use of next-generation sequencing (NGS) technology. In LIBRA-seq, in addition to the fluorescent label, the antigen probe is coupled to a unique DNA barcode that is readable in the sequencing stage. The B cells are thus enriched for antigen-binding cells by FACS; then, the specific antigen is mapped to the B cell by the expression level of the barcode (Figure 2B). This allows simultaneous capture of several antigen probes, tagged by the same fluorescent color but different barcodes. Each cell will have scores for each antigen in the screening library. These scores are a function of the unique molecular identifiers (UMIs) for the respective antigen barcodes (Setliff et al., 2019). Several studies utilized this method to efficiently discover SARS-CoV-2 specific antibodies (He et al., 2021; Kramer et al., 2021; Shiakolas et al., 2021; Kramer et al., 2022; Shiakolas et al., 2022; Suryadevara et al., 2022). The latest version of LIBRA-seq allows epitope mapping by barcoding several variants of the antigen, each with known epitopes mutated (Walker et al., 2022).
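To make the idea of per-cell, per-antigen scores concrete, the following is a minimal sketch of one plausible scoring scheme. It assumes a matrix of antigen-barcode UMI counts (cells by antigens), applies a centered log-ratio transform within each cell, and then z-scores each antigen across cells. The transform, the pseudocount, and the function names are assumptions made for illustration and should not be read as the exact published procedure.

```python
import math
from statistics import mean, pstdev

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's antigen UMI counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    centre = mean(logs)
    return [x - centre for x in logs]

def libra_like_scores(umi_matrix, pseudocount=1.0):
    """umi_matrix[cell][antigen] -> scores[cell][antigen].

    Step 1: CLR within each cell (row).
    Step 2: z-score each antigen (column) across cells.
    """
    clr_rows = [clr(row, pseudocount) for row in umi_matrix]
    n_antigens = len(clr_rows[0])
    scores = [[0.0] * n_antigens for _ in clr_rows]
    for j in range(n_antigens):
        column = [row[j] for row in clr_rows]
        mu, sd = mean(column), pstdev(column)
        for i, value in enumerate(column):
            # A constant column carries no information; score it as 0
            scores[i][j] = (value - mu) / sd if sd > 0 else 0.0
    return scores

# Toy example: three cells, each binding a different barcoded antigen
umis = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
scores = libra_like_scores(umis)
best_antigen = [row.index(max(row)) for row in scores]
```

Here each cell receives its highest score for the antigen whose barcode dominates its UMI counts, which is the behavior that allows several antigens with the same fluorescent color to be resolved after sorting.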
3 B cell receptor sequencing
Due to the unique phenomenon of gene rearrangement in the generation of BCR coding sequences, BCR diversity at the amino acid sequence level is believed to be in the range of 10^16–10^18 (Briney et al., 2019). With the development of NGS, high-throughput sequencing (HTS) has been used to analyze both T cell receptor (TCR) and BCR repertoires (Yaari and Kleinstein, 2015). The first use of HTS technology for immune repertoire analysis was reported by Campbell et al. (2008), who used the Roche 454 platform to explore IGH hypermutation variants carried in patients with chronic B-lymphocytic leukemia at the DNA level. Since then, a number of new technologies have emerged. These can be divided roughly into two groups: bulk and single-cell sequencing. In bulk sequencing the pairing between heavy and light chains is lost; in single-cell sequencing, this pairing is maintained.
3.1 Bulk B cell receptor sequencing
Bulk sequencing provides in-depth information on the frequency of single chains, which gives a high-resolution view of diversity (a measure of the range and distribution of certain features within a given population (Xu et al., 2020)) and clonal expansion (the proliferation of lymphocytes activated by clonal selection in order to produce a clone of identical cells (Polonsky et al., 2016)), as entire cell populations can be sequenced in a single pipeline (Kovaltsuk et al., 2017). Two starting materials can be used as initial templates for repertoire sequencing: genomic DNA (gDNA) and messenger RNA (mRNA). gDNA has the advantage of stability and a constant initial gene copy number between cells (Chaudhary and Wesemann, 2018). mRNA as a template requires reverse transcription, during which UMIs can be added. This step helps in identifying duplicate and/or clonal sequences generated by PCR, which makes it possible to correct for PCR bias and sequencing errors (Turchaninova et al., 2016; Rosati et al., 2017). In addition, synthetic repertoires utilize long (∼500 bp) oligonucleotide synthesis and high-throughput sequencing to generate a template for every possible V/J combination for minimization of PCR amplification bias, and additional computational normalization to remove residual bias (Carlson et al., 2013). Multiplex-PCR (m-PCR) and 5′RACE approaches are the two main methods used for amplification. m-PCR has the advantage that only one-step PCR is required, regardless of whether adaptors are included in the primers or not; when the material selected is gDNA, the downstream primers are restricted to several J gene segments, due to the existence of introns (Bashford-Rogers et al., 2014). 5′RACE requires only one set of oligonucleotides, and designing primers from the C gene increases specificity and greatly reduces PCR bias (Yeku and Frohman, 2011), but it has a relatively complex workflow for library construction.
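The role of UMIs in removing PCR duplicates and sequencing errors can be sketched as follows. This minimal example assumes that each read has already been split into a UMI and an equal-length receptor sequence, groups reads by UMI, and takes a per-position majority vote as the consensus. Real pipelines also handle errors within the UMI itself and reads of unequal length, which this sketch does not; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Per-position majority vote over reads of equal length.

    zip() stops at the shortest read, so equal lengths are assumed.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def collapse_by_umi(reads):
    """reads: iterable of (umi, sequence) pairs.

    Returns {umi: (consensus_sequence, number_of_reads)}, so that each
    original mRNA molecule is counted once regardless of PCR amplification.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {umi: (consensus(seqs), len(seqs)) for umi, seqs in groups.items()}

# Toy example: three reads from one molecule (one with an error), one from another
reads = [
    ("AAC", "TGTGCG"),
    ("AAC", "TGTGCG"),
    ("AAC", "TGTACG"),  # single-base sequencing or PCR error
    ("GGT", "TGTGCC"),
]
molecules = collapse_by_umi(reads)
```

In the toy example four reads collapse to two molecules, and the erroneous base in the third read is outvoted by the other two reads sharing its UMI.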
In addition to BCR information, it is often desirable to obtain the phenotypes or specific subsets of the B cells. Information on immune receptor libraries can be extracted from RNA-seq data, as BCRs are part of bulk RNA-seq data. However, the sensitivity of such an approach is low because of the under-expression of genes at the transcriptional level and also because large-scale RNA-seq usually results in a mixture of cellular gene expression profiles in the sample. Therefore, RNA-seq usually requires pre-targeted protein labeling of cells with fluorescently labeled antibodies to purify the cell types in the sample (Picot et al., 2012).
3.2 Single-cell B cell receptor sequencing
To obtain paired heavy-light chain sequences, single B cell resolution is required, since the mRNAs of each chain are physically separate. When coupled with single-cell RNA-seq, BCR sequencing can also provide important phenotype information on the cells (Haque et al., 2017). Goldstein and co-workers showed that single B cell sequencing can recover a higher number of antibody lineages compared to hybridoma technology (Goldstein et al., 2019). Single-cell sequencing has also been used to identify SARS-CoV-2 specific Abs (Woodruff et al., 2020; He et al., 2021), tumor-specific Abs (Buus et al., 2021), and autoimmune disease-specific Abs (Sulen et al., 2020; Jin et al., 2021). Single-cell sequencing is now readily available from several companies, including 10x Genomics and Takara Bio.
Single-cell sequencing combines multiple levels of information and is not limited to intracellular gene expression and BCR pairing: by adding antibodies tagged with specific oligonucleotide barcodes, surface proteins can also be characterized, similar to flow cytometry. Single cells can be isolated in microtiter plates or droplets, and the variable regions of the heavy and light chains can then be physically linked by overlap extension RT-PCR. Although the potential to obtain BCR pairing information at high throughput has been demonstrated, this technique requires custom equipment and does not yield full-length variable region sequence information (Goldstein et al., 2019). Although full-length sequences can be inferred by assembly, there is uncertainty in this process (DeKosky et al., 2013; DeKosky et al., 2015; McDaniel et al., 2016). RAGE-seq (repertoire and gene expression by sequencing) combines four major platforms (Oxford Nanopore Technologies’ long reads, 10x Genomics, Illumina’s short reads, and CaptureSeq) to enrich RNA from single B cells, and then assembles full-length sequences computationally (Singh et al., 2019).
The main limitation to current single-cell sequencing is the tradeoff between sequencing depth and cost. Sequencing depth is the number of transcripts detected from each cell, which should be controlled together with the number of cells to get enough coverage (the average number of reads that align to a specific locus in a reference genome to “cover” reference bases) for confident sequence assignment. The minimum sequencing depth for single-cell VDJ analysis is around 5,000 paired reads per cell, while gene expression analysis requires a minimum of 20,000 reads per cell, which can be increased as needed to analyze a greater number of genes. In comparison with bulk sequencing, which usually obtains 1 million reads per sample, the total number of reads required for single-cell sequencing depends on the desired number of cells in one sample. For example, we recently obtained approximately 20 million reads from one thousand cells with 20,000 reads per cell (unpublished results). The cost of single-cell sequencing mainly comes from the library preparation step, which can be 10–20 times higher than for traditional bulk sequencing. Current developments are focused on how to reduce the cost and increase the sequencing depth of single-cell sequencing (Haque et al., 2017; Upadhyay et al., 2018; Wu et al., 2018).
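The relationship between cell number, per-cell depth, and total reads is simple arithmetic and can be written out as a short sketch. The per-cell minima are the figures quoted above; treating the gene expression and VDJ libraries as additive per cell is an assumption made for illustration, and the function names are hypothetical.

```python
def total_reads(n_cells, reads_per_cell):
    """Total reads needed to sequence n_cells at a given per-cell depth."""
    return n_cells * reads_per_cell

def max_cells(read_budget, gex_reads_per_cell=20_000, vdj_reads_per_cell=5_000):
    """Largest number of cells that fits a read budget.

    Assumes each cell needs both a gene expression and a VDJ library,
    with the two per-cell depths simply added together.
    """
    return read_budget // (gex_reads_per_cell + vdj_reads_per_cell)

# One thousand cells at 20,000 reads per cell
reads_needed = total_reads(1_000, 20_000)  # 20,000,000 reads

# Cells that a 20-million-read budget supports with both libraries
cells_supported = max_cells(20_000_000)  # 800 cells
```

Under these assumptions, adding paired VDJ sequencing at the minimum depth reduces the number of cells that a fixed 20-million-read budget can support from 1,000 to 800.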
3.3 Annotation of raw sequence data
Annotation includes defining the V, D, and J genes for a given BCR, inferring the accurate amino acid sequence, and assigning the CDR boundaries. These are nontrivial tasks. Several numbering schemes to define CDRs have been proposed including Kabat, Chothia, Martin, Gelfand, IMGT, and AHo (Dondelinger et al., 2018). Meanwhile, several tools have been developed to streamline the process of annotation using these numbering schemes. For the assignment of CDRs, ANARCI is a reliable and user-friendly tool (Dunbar and Deane, 2016). For gene and amino acid assignment, the strengths of the various tools have been systematically discussed in several previous publications (Heather et al., 2018; Lopez-Santibanez-Jacome et al., 2019; Smakaj et al., 2020). Here, we will describe several tools for the analysis of bulk and single-cell sequence data. IMGT is one of the most widely used annotation platforms today. High-quality germline sequence information for most species is assembled in IMGT, and therefore reference libraries for the vast majority of sequence annotation tools are derived from IMGT (Lefranc et al., 2005; Manso et al., 2022). In 2011, IMGT developed a platform for HTS T/B repertoire data, supporting raw sequence uploads in FASTA and FASTQ formats (Alamyar et al., 2012; Li et al., 2013). IgBLAST was originally developed as a tool for analyzing immunoglobulin sequences using BLAST, a local alignment method (Ye et al., 2013). Although data can be uploaded directly through the webpage, it does not show many advantages in the analysis of HTS data or presentation of results. MiXCR is another widely used stand-alone package for BCR repertoire annotation, as there is no restriction on the number of sequences. It uses an improved k-mer chaining algorithm for sequence alignment, and an error correction procedure can be performed based on the quality of the sequences (Bolotin et al., 2015).
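Once a variable domain has been numbered, assigning CDR boundaries reduces to selecting position ranges. The following minimal sketch uses the IMGT CDR definitions (CDR1: positions 27–38, CDR2: 56–65, CDR3: 105–117). The input format, a list of (position, residue) pairs with "-" marking gaps, is a hypothetical simplification and is not the output format of any particular tool.

```python
# IMGT CDR boundaries (inclusive positions in the IMGT unique numbering)
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}

def extract_cdrs(numbered):
    """numbered: iterable of (imgt_position, residue) pairs.

    Gap characters ("-") are skipped. Residues at insertion positions are
    assumed to carry the integer position they are inserted after, so they
    fall into the same range as their anchor position.
    """
    cdrs = {name: [] for name in IMGT_CDRS}
    for position, residue in numbered:
        if residue == "-":
            continue
        for name, (start, end) in IMGT_CDRS.items():
            if start <= position <= end:
                cdrs[name].append(residue)
    return {name: "".join(residues) for name, residues in cdrs.items()}

# Toy example: a sparse, already-numbered fragment of a heavy chain
numbered_fragment = [
    (26, "S"), (27, "G"), (28, "F"), (29, "T"), (30, "-"), (38, "Y"), (39, "M"),
    (56, "I"), (65, "T"), (66, "Y"),
    (104, "C"), (105, "A"), (106, "R"), (117, "Y"), (118, "W"),
]
cdrs = extract_cdrs(numbered_fragment)
```

Other schemes (Kabat, Chothia, and so on) would only require a different boundary table, provided the sequence has been numbered in the corresponding scheme.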
For scRNA-seq data generated through the 10x Genomics platform, pre-processing similar to bulk sequencing is required before downstream analysis. The company offers Cell Ranger (Zheng et al., 2017), which is recommended for its ability to process both gene expression and paired TCR/BCR data. The development of tools to reconstruct immune repertoire information from single-cell or bulk RNA-seq is an active area, including tools such as BASIC, BRACER, and BALDR (Haque et al., 2017; Upadhyay et al., 2018; Wu et al., 2018).
4 B cell receptor repertoire analysis
BCR repertoire sequence data is growing rapidly. In this section, we will first describe methods to analyze diversity, clonal composition, or the specificity of BCRs from different cohorts. Next, we introduce databases and platforms to store repertoire data. Lastly, we will briefly discuss antibody structural modeling and epitope/paratope prediction.
4.1 B cell receptor repertoire sequence analysis
Sequence analysis of BCR repertoire data is a rapidly evolving field that can be roughly divided into three main components: diversity, clonal composition (the relative abundance of specific clones (ImmunoMind, 2019)), and antigen/disease binding specificity. There are a large number of tools and packages that can be used to analyze BCR diversity, clonal frequency, and networks of BCRs. Some widely used tools, such as Immcantation (Vander Heiden et al., 2014; Gupta et al., 2015) and Immunarch (ImmunoMind, 2019), allow visualization of results after direct import of data from Cell Ranger or MiXCR. Some representative visualizations of results are shown in Figure 3. Many metrics in repertoire analysis are general methods used beyond BCRs or TCRs, such as Shannon and Simpson diversity (Leinster and Cobbold, 2012; Greiff et al., 2015a). Shannon diversity correlates with increasing sequence uniformity, whereas Simpson diversity assigns greater weight to dominant sequences. Clonal abundance is another measurement to quantify sequence distribution that can be used to determine the difference in the ratio of high-frequency to low-frequency sequences between healthy and diseased patients (Yaari and Kleinstein, 2015). High or low expression of V, D, and J BCR genes can indicate immune responses (Lee et al., 2021; Kotagiri et al., 2022). Meanwhile, the length and amino acid usage of CDR3 regions are often used to characterize repertoires in terms of a few dependent parameters. For example, one study reported that the average CDRH3 lengths of IGHG1, IGHA1, and IGHA2 were significantly greater in COVID-19 patients than in healthy cohorts (Galson et al., 2020). Moreover, the same authors identified CDRH3 amino acid sequence signatures within COVID-19 patients with different symptoms (Galson et al., 2020). Diversity caused by somatic hypermutation is another important feature of BCR sequences.
In general, mutation analysis shows the extent of differentiation compared to germline sequences and indicates antigen-driven affinity maturation. One group utilized the frequency of somatic hypermutation (SHM) of the heavy chain as a feature to identify HIV patients possessing broadly neutralizing antibodies (Roskin et al., 2020).
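The Shannon and Simpson measures mentioned above can be computed directly from clone frequencies. The sketch below assumes that each sequence has already been assigned a clone identifier; it uses the natural logarithm for Shannon entropy and the concentration form of the Simpson index (the sum of squared frequencies), in which a higher value means a repertoire dominated by fewer clones. The function names are hypothetical.

```python
import math
from collections import Counter

def clone_frequencies(clone_ids):
    """Relative abundance of each clone from a list of per-sequence clone IDs."""
    counts = Counter(clone_ids)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def shannon_entropy(freqs):
    """Shannon entropy (natural log); maximal when all clones are equally frequent."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def simpson_index(freqs):
    """Simpson concentration index; dominated by the most frequent clones."""
    return sum(p * p for p in freqs)

# Toy example: an even repertoire versus one with a strongly expanded clone
even = clone_frequencies(["a", "b", "c", "d"])
expanded = clone_frequencies(["a"] * 97 + ["b", "c", "d"])

even_shannon, even_simpson = shannon_entropy(even), simpson_index(even)
expanded_shannon, expanded_simpson = shannon_entropy(expanded), simpson_index(expanded)
```

Both repertoires contain four clones, but the expanded one has a much lower Shannon entropy and a Simpson index close to 1, reflecting clonal expansion rather than a change in the number of clones.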
FIGURE 3

BCR repertoire sequence analysis. (A) Sample collection using PBMCs from blood, followed by NGS. (B) The Shannon, Simpson, D50, Gini, and Chao1 indices are designed to assess the overall diversity of each cohort. (C) Clone proportion can show the change of high-frequency (clonal expansion) as well as low-frequency sequences in each sample. (D) The bias of V and J gene usage and their combination can reflect immune responses of different repertoires. (E) The length distribution of the CDR3 region can be used to characterize a repertoire. (F) A Venn diagram can be utilized for visualizing the degree of convergence among samples and exploring potentially disease-specific public clones. (G) Networks of BCRs.
Identifying patterns relating to specific antigens or disease cohorts is a major challenge in BCR repertoire analysis. Previous studies have observed BCR sharing among HIV patients, HBV vaccination donors, influenza vaccination donors, and COVID-19 patients (Jackson et al., 2014; Galson et al., 2015a; Setliff et al., 2018; Kim et al., 2021; Voss et al., 2021). In order to quantify the convergence of BCRs (the existence of similar or identical BCR sequences among donors in a common cohort), analysis of clonotypes (same V and J genes and identical amino acids in the CDR3 region) is widely used in both bulk and single-cell analyses (Soto et al., 2019; Raybould et al., 2021b). Furthermore, clustering such clonotypes (e.g. with 80% or greater sequence identity in their CDR3) is often used (Galson et al., 2020; Nielsen et al., 2020). Similar CDR3 sequences that dominate the immune response in different individuals following antigen stimulation are often referred to as “convergent” or “public” (Truck et al., 2015). Experimental validation of clustering will be described in detail in Section 5.2.
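The clonotype definition and the 80% identity clustering described above can be sketched as follows. This minimal example assumes that each record is a (V gene, J gene, CDR3 amino acid sequence) triple, bins records by V gene, J gene, and CDR3 length, and then applies single-linkage clustering with position-wise identity. Restricting comparisons to equal-length CDR3s and using single linkage are simplifying assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_clonotypes(records, threshold=0.8):
    """records: iterable of (v_gene, j_gene, cdr3_aa) triples.

    Returns a list of (v_gene, j_gene, [cdr3, ...]) clusters. Identical
    triples collapse into one clonotype; clonotypes sharing V, J, and CDR3
    length are merged by single linkage at the given identity threshold.
    """
    bins = defaultdict(set)
    for v, j, cdr3 in records:
        bins[(v, j, len(cdr3))].add(cdr3)

    clusters = []
    for (v, j, _length), cdr3s in bins.items():
        groups = []
        for cdr3 in sorted(cdr3s):
            # Every existing group with at least one sufficiently similar member
            linked = [g for g in groups
                      if any(identity(cdr3, m) >= threshold for m in g)]
            merged = [cdr3] + [m for g in linked for m in g]
            groups = [g for g in groups if g not in linked] + [merged]
        clusters.extend((v, j, sorted(g)) for g in groups)
    return clusters

# Toy example: two donors share one clonotype; a third sequence differs by one residue
records = [
    ("IGHV3-53", "IGHJ6", "ARDYYGMDVW"),  # donor 1
    ("IGHV3-53", "IGHJ6", "ARDYYGMDVW"),  # donor 2, identical clonotype
    ("IGHV3-53", "IGHJ6", "ARDYYGLDVW"),  # 90% identical CDR3
    ("IGHV3-53", "IGHJ6", "ARGGGGGGGW"),  # unrelated CDR3
    ("IGHV1-69", "IGHJ4", "ARDYYGMDVW"),  # same CDR3, different V and J genes
]
clusters = cluster_clonotypes(records)
```

In the toy example the two similar IGHV3-53 CDR3s fall into one cluster, while the unrelated CDR3 and the sequence with different V and J genes each form their own cluster.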
4.2 Repertoire databases and data mining
Due to continuous advances in sequencing technology, BCR repertoire sequence data, especially bulk data, has grown rapidly in recent years. Most data associated with published repertoire research is stored in public databases such as the Sequence Read Archive (SRA) or European Nucleotide Archive (ENA), in the form of raw NGS reads. Since SRA and ENA do not allow sequence-level searches, such analysis must be performed on specialized repertoire web servers or by using command-line tools. In this section, we describe a number of databases for antibody sequences, structures or both, that can help in the mining of antigen-specific BCR sequences.
• Observed Antibody Space (OAS) is a comprehensive and frequently updated website and database (Kovaltsuk et al., 2018; Olsen et al., 2022). Although OAS also contains paired data, to our knowledge, it is the first organized collection of bulk BCR sequences. Metadata such as study, species, disease, vaccine, B cell source, and subset can be searched (Figure 4).
• The iReceptor (Corrie et al., 2018) platform allows sharing and comparing adaptive immune receptor repertoire (AIRR)-seq data. It has two key components: a data repository that focuses on AIRR data, and a web-based Scientific Gateway that allows researchers to discover, federate, explore, and analyze AIRR-seq data (Figure 4).
• SAbDab (Dunbar et al., 2014) is a frequently updated resource containing all publicly available antibody structures and, similar to OAS, is convenient to search using metadata, including species, experimental method, resolution, or amino acid at a given position using canonical numbering.
• IMGT/3Dstructure-DB (Ehrenmann et al., 2010) is a three-dimensional structure database of IMGT entries that stores the structures of immunoglobulins, TCRs, and major histocompatibility complex proteins of humans and other vertebrate species. A related database, IMGT/2Dstructure-DB, stores the amino acid sequences from INN/WHO and Kabat databases (Ehrenmann and Lefranc, 2011). IMGT/3Dstructure-DB contains 8,437 entries as of 3 June 2022.
• huARdb (Wu et al., 2022) is a versatile and user-friendly web interface consisting of data from 444,794 high confidence T or B cells with full-length TCR/BCR sequences and transcriptomes from 215 datasets, which have been subjected to a uniform workflow.
• PIRD (Zhang et al., 2020) is a multi-species BCR dataset that contains 5 main information modules, including project information, sample information, raw sequencing data, annotated TCR or BCR repertoires, and a database of TCRs and BCRs targeting known antigens (TBAbd). PIRD can also carry out analyses, including biased gene usage, the length distribution of CDR3, and the diversity index for each dataset directly.
• ImmPort (Bhattacharya et al., 2018) is one of the largest repositories of open immunology data. It hosts data from more than 300 clinical and mechanistic studies in humans and immunological studies on model organisms, categorized as Private Data, Shared Data, Data Analysis, and Resources, with a focus on allergy, autoimmune disease, infection response, transplantation, and vaccine response.
• cAb-Rep (Guo et al., 2019) contains 306 immunoglobulin repertoires from 121 healthy, vaccinated, or autoimmune disease donors. The database contains 267.9 million IGH and 72.9 million IGL full- or nearly full-length transcripts that have been annotated according to isotype, somatic hypermutation (SHM), and other biological characteristics.
• ImmuneDB (Rosenfeld et al., 2017; Monasterio et al., 2018) is a high-throughput immune receptor sequencing data system that integrates data storage and analysis. Output includes selection pressure, lineage mapping, novel allele detection, etc. The developers report that ImmuneDB can quickly identify more potential sequences than IMGT/HighV-QUEST and that it performs similarly to MiXCR in annotating the same raw input data.
• VDJServer (Christley et al., 2018) integrates large data storage and analysis. The advantage of VDJServer is that sequence annotations can be performed after quality control screening of raw data.
• Our lab has recently launched InterClone (Wilamowski et al., 2022), a resource that contains both BCR and TCR repertoire data, along with tools to store, search or cluster the data. Distinguishing features of InterClone include: the ability of users to control the visibility of their data; efficient encoding of CDR regions to allow flexible searches or clustering using user-specified similarity thresholds for CDRs; a large amount of BCR data, particularly for COVID-19, influenza, HIV, and healthy donors.
• There are also databases created for specific diseases. CoV-AbDab (Raybould et al., 2021a) currently contains 10,005 antibodies and nanobodies from published papers/patents that bind to at least one betacoronavirus (last updated: 26 July 2022). This database is the first known integration of antibodies that bind SARS-CoV-2 and other betacoronaviruses, including SARS-CoV-1 and MERS-CoV. It contains evidence of cross-neutralization, the origin of each antibody or nanobody, full-length variable domain sequences, germline assignments, epitope regions, PDB codes (if relevant), homology models, and literature references.
• CATNAP (Yoon et al., 2015) is a web server for HIV data, including antibody sequences from the authors’ own and published studies. As input, users can select specific antibodies or viruses, a panel from published studies, or search using local data. The output overlays neutralization panel data, viral epidemiology data, and viral protein sequence comparison on a single page with further information and analysis. Users can highlight alignment positions, or select antibody contact residues and view position-specific information from the HIV database.
FIGURE 4

Metadata available in OAS and iReceptor. The bar chart shows the cumulative growth of BCR sequence data. OAS provides about 1,650 million and iReceptor provides about 955 million BCR sequences (data updated on 1 September 2022). The donut graph shows the ratio of chain types in each database. Most BCR sequences are heavy chains in both OAS (93%) and iReceptor (90%). Only 0.00011% of sequences in OAS are paired BCRs, which are not visible in the donut graph.
4.3 Antibody structure prediction
Protein structure prediction is one of the areas of computational biology that has progressed most rapidly in recent years, owing to breakthroughs in Deep learning (Baek et al., 2021; Jumper et al., 2021). Before these breakthroughs, a plethora of antibody modeling tools existed that performed similarly well for all regions except CDRH3 loops (Almagro et al., 2014). Going forward, however, it is likely that all state-of-the-art methods for antibody modeling will utilize some aspects of current Deep learning-based protein structural modeling methods. An important first step in this direction is DeepAb, which convincingly outperformed traditional template-based methods, including our own Repertoire Builder, in terms of antibody structural accuracy (Ruffolo et al., 2022). In a recent assessment on a large and diverse set of 620 antibodies, we found that the average CDRH3 root-mean-square deviation (RMSD) was 3.44 Å for AlphaFold, compared with 4.38 Å for our own, previously state-of-the-art, Repertoire Builder (Xu et al., 2022). Therefore, for antibodies without bound antigens, the current Deep learning approach appears to be a significant improvement.
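The RMSD comparison above can be illustrated with a minimal Python sketch. It assumes that the model and reference CDRH3 coordinates have already been superposed and matched atom by atom; the superposition protocol and atom selection used in the cited assessment are not described here, and the function names `rmsd` and `mean_rmsd` are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstroms) between two equal-length
    lists of (x, y, z) tuples. Assumes the structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def mean_rmsd(pairs):
    """Average RMSD over a benchmark given as (model_coords, reference_coords) pairs."""
    values = [rmsd(model, reference) for model, reference in pairs]
    return sum(values) / len(values)
```

Averaging `rmsd` over every antibody in a benchmark, once per modeling method, gives the kind of per-method summary value quoted above.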
Unfortunately, it is becoming accepted that the multi-chain extension of AlphaFold, AlphaFold-Multimer, does not work well for antibody-antigen complexes (Evans et al., 2022). This problem probably arises in part from the fact that AlphaFold uses overall sequence similarity to construct multiple sequence alignments, which are, in turn, used as feature vectors. Many antibodies that target different antigens will be aligned in this process, resulting in a noisy signal. Indeed, when we assessed the complex modeling performance of AlphaFold-Multimer using a small benchmark of 25 antibody-antigen complexes, we found that the vast majority were docked to the wrong epitope (Standley et al., 2022). Therefore, it will be interesting to see if a more careful selection of sequences and structural templates within the Deep learning workflow will lead to more coherent antibody-antigen complex modeling. CDR-based clustering, one of the functions that repertoire databases can provide (see Section 4.2), may be one route to such a selection. Another interesting direction is to couple antibody-antigen complex modeling with epitope prediction.
4.4 Epitope prediction
As of 19 July 2022, there were only 9,811 recorded antibody-antigen structures available in the Protein Data Bank (Berman et al., 2008; Raybould et al., 2020). Because experimental methods for investigating antibody-antigen interactions are time-consuming and labor-intensive (see Section 5), there is a need for computational approaches that can quickly predict the epitope and paratope from sequence or structure information. Compared to the difficulty of epitope prediction (where almost any surface patch of an antigen could be an epitope for some antibody), the paratope prediction problem is relatively easy. Most paratopes are located within the six CDRs in the variable fragment of the heavy and light chains. Many published tools, such as Parapred (Liberis et al., 2018) and proABC-2 (Ambrosetti et al., 2020), can achieve satisfactory performance in paratope prediction. Thus, in this section, we will focus on epitope prediction.
In recent decades, many tools have been developed to predict continuous/linear B-cell epitopes using antigen sequence information or discontinuous/conformational B-cell epitopes using antigen structure information. These methods generally adopt machine learning approaches (support vector machines, random forests, linear regression, and neural networks) to learn epitope features from known complex structures (Table 1). One problem with many epitope prediction tools is that they only use features of the antigen, whereas we are generally interested in antibody-specific epitopes (Sela-Culang et al., 2015). One way to address this problem is to introduce antibody features into the process of epitope prediction. Some tools, including PECAN (Pittala and Bailey-Kellogg, 2020) and Pinet (Dai and Bailey-Kellogg, 2021), use Deep learning to extract antibody and antigen features for use in epitope prediction. Other tools, like EpiPred (Krawczyk et al., 2014), MAbTope (Bourquard et al., 2018), and AbAdapt (Davila et al., 2022), incorporate antibody-antigen docking-based features; these studies have demonstrated that the inclusion of antibody features improves epitope prediction. As expected, antibody-antigen docking is sensitive to antibody model quality (Davila et al., 2022). We recently incorporated the more accurate antibody models produced by AlphaFold (Jumper et al., 2021) into the AbAdapt pipeline (Xu et al., 2022). We observed significant improvement in docking, paratope prediction, and antibody-specific epitope prediction compared with the default AbAdapt pipeline. In a realistic case, using an anti-SARS-CoV-2 RBD antibody complex benchmark, the use of AlphaFold resulted in higher epitope prediction accuracy than all other tested tools.
TABLE 1
| Category | Name | Availability | Method | References |
|---|---|---|---|---|
| Linear B-cell epitope | ABCPred | https://webs.iiitd.edu.in/raghava/abcpred/index.html | Recurrent neural network | Saha and Raghava (2006) |
| Linear B-cell epitope | AAPPred | https://www.bioinf.ru/aappred/ | Support vector machine | Davydov and Tonevitsky (2009) |
| Linear B-cell epitope | FBCPred/BCPREDS | https://ailab.cs.iastate.edu/bcpreds/ | Two machine learning approaches | El-Manzalawy et al. (2008) |
| Linear B-cell epitope | COBEpro | https://scratch.proteomics.ics.uci.edu | Support vector machine | Sweredoski and Baldi (2009) |
| Linear B-cell epitope | BepiPred-2.0 | https://services.healthtech.dtu.dk/service.php?BepiPred-2.0 | Random forest | Jespersen et al. (2017) |
| Linear B-cell epitope | LBtope | http://crdd.osdd.net/raghava/lbtope/ | Support vector machine and k-nearest neighbor | Singh et al. (2013) |
| Linear B-cell epitope | DRREP | https://github.com/CorticalComputer/DRREP | Deep neural network | Sher et al. (2017) |
| Linear B-cell epitope | SVMTriP | http://sysbio.unl.edu/SVMTriP | Support vector machine | Yao et al. (2012) |
| Linear B-cell epitope | LBEEP | https://github.com/brsaran/LBEEP | Support vector machine and AdaBoost-random forest | Saravanan and Gautham (2015) |
| Linear B-cell epitope | EPMLR | http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ | Multiple linear regression | Lian et al. (2014) |
| Linear B-cell epitope | iBCE-EL | http://thegleelab.org/iBCE-EL | Randomized tree and gradient boosting classifiers | Manavalan et al. (2018) |
| Linear B-cell epitope | iLBE | http://kurata14.bio.kyutech.ac.jp/iLBE/ | Random forest | Hasan et al. (2020) |
| Linear B-cell epitope | EpiDope | http://github.com/mcollatz/EpiDope | Deep neural network | Collatz et al. (2021) |
| Conformational B-cell epitope | ElliPro | http://tools.iedb.org/ellipro/ | Clustering of neighboring residues based on protrusion index | Ponomarenko et al. (2008) |
| Conformational B-cell epitope | PEPITO | http://pepito.proteomics.ics.uci.edu/ | Linear combination | Sweredoski and Baldi (2008) |
| Conformational B-cell epitope | CBTOPE | http://www.imtech.res.in/raghava/cbtope/ | Support vector machine | Ansari and Raghava (2010) |
| Conformational B-cell epitope | DiscoTope 2.0 | https://services.healthtech.dtu.dk/service.php?DiscoTope-2.0 | Epitope propensity scores | Kringelum et al. (2012) |
| Conformational B-cell epitope | SEPPA 3.0 | http://www.badd-cao.net/seppa3/index.html | Logistic regression and clustering coefficient | Zhou et al. (2019) |
| Conformational B-cell epitope | CluSMOTE | https://github.com/BSolihah/conformational-epitope-predictor | Support vector machine and decision tree | Solihah et al. (2020) |
| Combining antibody features | EpiPred | http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/epipred/ | Combining the conformational matching of structures and a specific score | Krawczyk et al. (2014) |
| Combining antibody features | MAbTope | Contact the corresponding author | Integration of docking-based prediction method and experimental steps | Bourquard et al. (2018) |
| Combining antibody features | PECAN | https://github.com/vamships/PECAN | Paratope and epitope prediction with graph convolution attention network | Pittala and Bailey-Kellogg (2020) |
| Combining antibody features | Pinet | https://github.com/FTD007/Pinet | Geometric deep neural network | Dai and Bailey-Kellogg (2021) |
| Combining antibody features | AbAdapt | https://sysimm.org/abadapt/ | Combining docking-based features to predict antibody-specific epitope | Davila et al. (2022) |
Summary of epitope prediction tools.
It is also worth noting that the combination of different deep or machine learning models is becoming a general trend. A Deep learning framework was developed to extract local features around target residues and global features of the full antigen sequence using Graph Convolutional Networks (GCNs) and Attention-Based Bidirectional Long Short-Term Memory (Att-BLSTM) networks, respectively (Lu et al., 2022). The local and global features from the two networks were combined to predict the epitope, demonstrating that global features play a critical role in structure-based epitope prediction (Lu et al., 2022). Moreover, BepiPred-3.0 does not rely solely on reported antibody-antigen complexes to capture binding patterns, but also uses a general, deep transformer-based protein language model, ESM-1b (Rives et al., 2021), to achieve more accurate epitope prediction using only antigen sequence information (Clifford et al., 2022). Recently, Robert and co-workers used simulated antibody-antigen data in order to circumvent the lack of experimentally determined antibody-antigen complex structures and focused on the challenging problem of learning antigen and epitope specificity features from antibody sequences (Robert et al., 2022). The potential advantage of these latter methods is that they circumvent the time-consuming docking step.
5 Experimental validation
5.1 Epitope discovery
A prerequisite to therapeutic antibody discovery is to identify the epitope for a given antibody-antigen pair. There are several well-established experimental approaches to elucidate epitope information, including X-ray crystallography, nuclear magnetic resonance (NMR), peptide-based microarrays, mutagenesis, and cryo-electron microscopy (Cryo-EM). X-ray crystallography is the gold standard for determining the precise binding between antibody and antigen (Holcomb et al., 2017). However, X-ray crystallography has disadvantages in terms of throughput and cost; moreover, flexible or membrane-bound antigens are notoriously difficult to crystallize (Abbott et al., 2014). NMR can also be utilized to obtain detailed epitope mapping information (Blech et al., 2013), but this method has relatively low sensitivity and requires proteins of high purity, high solubility, and small size (Wüthrich, 1990; Pan et al., 2016). Although peptide-based microarrays are high-throughput and can sometimes identify epitopes with high sensitivity, their performance is limited by various factors: affinity of the peptides, immobilization methods, and conformational constraints induced by the immobilization. Furthermore, their restriction to linear epitopes is a major concern (Qi et al., 2019). Mutagenesis allows the investigation of epitopes without the need for structure determination. For example, using the alanine shotgun approach, epitopes for difficult proteins such as membrane proteins can be quickly identified. One disadvantage of this approach is that it is hard to determine whether a given mutation has disrupted binding directly or has disrupted the folding of the antigen (Peng et al., 2011).
Because epitopes need to be identified both quickly and precisely, many studies combine traditional epitope discovery strategies with cutting-edge technologies. Antibody binding epitope Mapping (AbMap), a combination of phage-displayed peptide libraries with next-generation sequencing, was developed to determine 200 antibody-specific epitopes in a single run (Qi et al., 2021). Additionally, microarrays consisting of 648 overlapping peptides that cover the four major structural proteins of the SARS-CoV-2 virus (Spike, Nucleocapsid, Membrane, and Envelope) have been constructed (Hotop et al., 2022). The microarray fingerprint of positive serum samples was learned by a machine learning model, and the epitopes were used to distinguish COVID-19 positive from negative donors (Hotop et al., 2022).
Among the wide range of experimental epitope mapping methods, we will focus on two promising approaches: Hydrogen-Deuterium Exchange Mass Spectrometry (HDX-MS) and Deep Mutational Scanning (DMS), which have the potential to perform medium-throughput epitope discovery.
HDX-MS measures changes in the mass of a protein caused by isotope exchange between amide hydrogens of the protein backbone and the surrounding solvent. The folded state of the protein and its dynamics affect the rate of this exchange (Masson et al., 2019). In recent years, HDX-MS has been increasingly used for epitope and paratope mapping of antibody-antigen complexes due to its speed, small sample size requirements, and insensitivity to protein size. A semi-automated HDX-MS workflow was used to perform epitope mapping of Fab-CR6261 with diverse influenza Hemagglutinin subtypes (Puchades et al., 2019). Similarly, an in-house HDX-MS system was constructed to explore the binding of birch Bet v1 protein, a native pollen allergen, in the presence of four antibodies that target non-redundant epitopes (Zhang et al., 2018). Two non-contiguous epitope loops of TL1A bound by anti-TL1A monoclonal antibody 1 were also identified by HDX-MS (Huang et al., 2018).
The routine workflow of HDX-MS epitope mapping compares the antigen alone, as a reference, with the antigen in the presence of antibody. The antigen and antibody are labeled in D2O buffer under equilibrium conditions at several time points. Compared with antigen alone, contact with the antibody lowers the solvent exposure of antigen residues in the epitope region and leads to a reduction in deuterium incorporation. After protease digestion, proteolytic peptides are desalted and separated on a mass spectrometry (MS) system. The fingerprints of the antigen alone and of the antigen-antibody complex are captured and analyzed by downstream bioinformatics analysis (Figure 5A) (Tran et al., 2022). However, HDX-MS also has some limitations. The major limitation is that HDX-MS can only capture peptide-level information, and the contribution of individual residues within each peptide remains uncertain. Another limitation is that the peptides may not provide sufficient coverage of the whole protein sequence (Masson et al., 2019). HDX-MS also cannot capture information on prolines, which do not have an amide hydrogen for deuterium exchange (Huang et al., 2018). Recent studies have combined peptide-level information from HDX-MS with antibody-antigen docking to overcome the drawbacks of either method alone (Bennett et al., 2019; Fields et al., 2021).
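The peptide-level comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis used in any of the cited studies: deuteration is estimated from the intensity-weighted centroid m/z of each peptide, and a peptide is flagged as protected when its uptake falls by at least a hypothetical cutoff (`min_delta`, in Da) in the presence of antibody. All function names are hypothetical.

```python
def centroid_mz(peaks):
    """Intensity-weighted centroid of a peptide isotope envelope.
    peaks: list of (m/z, intensity) tuples."""
    total_intensity = sum(intensity for _, intensity in peaks)
    if total_intensity <= 0:
        raise ValueError("peaks must have positive total intensity")
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity

def deuterium_uptake(labeled_peaks, unlabeled_peaks, charge):
    """Mass increase (Da) of a peptide: centroid m/z shift times charge state."""
    return (centroid_mz(labeled_peaks) - centroid_mz(unlabeled_peaks)) * charge

def protected_peptides(uptake_free, uptake_bound, min_delta=0.5):
    """Peptides whose uptake drops by >= min_delta Da when antibody is bound.
    Both inputs map peptide identifiers to uptake (Da) at one time point.
    min_delta is an illustrative cutoff, not a recommended value."""
    hits = {}
    for peptide, free in uptake_free.items():
        if peptide not in uptake_bound:
            continue  # peptide not recovered in the bound state
        delta = free - uptake_bound[peptide]
        if delta >= min_delta:
            hits[peptide] = delta
    return hits
```

Because the output is keyed by peptide, the sketch also reflects the limitation noted above: a protected peptide localizes the epitope only to the span of that peptide, not to individual residues.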
FIGURE 5

Workflow of Hydrogen-Deuterium Exchange Mass Spectrometry (HDX-MS) and Deep mutational scanning (DMS) for epitope mapping. (A) The HDX-MS workflow consists of high-quality protein sample preparation; antigen HDX experiment in the presence or absence of antibody; processing of antigen peptides by MS; quantification of deuteration levels by intensity-weighted centroid m/z value; epitope mapping. (B) The DMS workflow consists of library construction of antigen mutants; expression; coincubation with antibody; cell sorting; sequencing; visualization of results as a heatmap.
DMS makes use of a massive number (typically 1 million) of mutant versions of a protein in a single experiment to reveal their intrinsic properties by analyzing large-scale phenotype readouts (Fowler and Fields, 2014). By incorporating NGS, the DMS method can observe the effect of individual mutants in a large population. A typical DMS workflow for epitope mapping includes mutation design and library construction for the antigen; library expression and incubation with antibody; sorting cells of interest by FACS or measuring the binding affinity; and sequencing the selected mutants and constructing a deep mutational heatmap through bioinformatics analysis (Figure 5B).
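The final analysis step of this workflow can be sketched as follows. The scoring scheme is an assumption chosen for illustration (a log2 enrichment ratio of variant frequencies in the sorted population relative to the unsorted library, with a pseudocount); published DMS pipelines differ in their exact scores. Variant names of the form `K417N` and the function names are hypothetical.

```python
import math

def log2_enrichment(ref_counts, sel_counts, pseudocount=0.5):
    """Per-variant log2 ratio of frequency after selection to frequency before.
    ref_counts / sel_counts: dicts mapping variant name to sequencing read count.
    If the sorted cells are those that lost antibody binding, a positive score
    suggests that the mutation helps the antigen escape that antibody."""
    ref_total = sum(ref_counts.values())
    sel_total = sum(sel_counts.values())
    if ref_total <= 0 or sel_total <= 0:
        raise ValueError("both libraries need at least one read")
    scores = {}
    for variant, ref in ref_counts.items():
        sel = sel_counts.get(variant, 0)
        ref_freq = (ref + pseudocount) / ref_total
        sel_freq = (sel + pseudocount) / sel_total
        scores[variant] = math.log2(sel_freq / ref_freq)
    return scores

def heatmap_rows(scores):
    """Arrange scores as {position: {mutant_residue: score}} for plotting.
    Assumes names like 'K417N': wild type, position, mutant residue."""
    rows = {}
    for variant, score in scores.items():
        position = int(variant[1:-1])
        rows.setdefault(position, {})[variant[-1]] = score
    return rows
```

Positions at which many substitutions receive high scores are candidate epitope residues for the antibody used in the selection.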
DMS has been used to investigate various disease-related antigens. In one study, DMS was utilized to precisely map the epitopes of a panel of cross-neutralizing nanobodies against H1N1 and H5N1 (Gaiotto and Hufton, 2016). In another study, comprehensive mutations of the Zika virus envelope (E) protein were constructed to map its functional constraints, and the effects of these mutations on viral growth as well as on viral neutralization by two monoclonal antibodies were measured (Sourisseau et al., 2019). Additionally, a platform that combines immunoprecipitation of phage peptide libraries and DMS (Phage-DMS) was constructed. Through Phage-DMS, the authors designed all possible amino acid variants of the HIV Envelope and performed fine mapping of epitopes using four well-characterized HIV antibodies (Garrett et al., 2020). In a recent report, all mutations to the SARS-CoV-2 RBD were assessed by DMS, and their effects on expression and affinity for ACE2 were evaluated (Starr et al., 2020). DMS was also used to systematically mutate Wuhan-Hu-1, Alpha, Beta, Delta, and Eta variant RBDs, identifying some substitutions that cause epistatic shifts during viral evolution (Starr et al., 2022). In addition, many studies have reported the use of DMS to investigate hotspots of the SARS-CoV-2 RBD that enable escape from neutralizing antibodies (Greaney et al., 2021a; Greaney et al., 2021b; Starr et al., 2021; Tsai et al., 2021). These applications convincingly demonstrate that DMS can facilitate the understanding of antigen function and systematically evaluate antibody escape.
In addition to epitope analysis, DMS can be applied to antibodies themselves to identify paratopes or to optimize other phenotypes. In one case, DMS was used to identify many affinity-enhancing mutations at the variable light-heavy chain interface of an anti-lysozyme antibody; a variant with tenfold higher affinity as well as substantially improved stability was identified (Warszawski et al., 2019). Furthermore, a fully automated design protocol, AbLIFT, was established for improving molecular interactions across the variable light-heavy interface and applied to anti-VEGF/QSOX1 antibodies to improve affinity, stability, and expression (Warszawski et al., 2019). In another application, DMS was combined with Deep learning to optimize the affinity, viscosity, clearance, solubility, and immunogenicity of trastuzumab (Mason et al., 2021). Recently, by leveraging DMS technology, researchers engineered a nanobody initially specific for SARS-CoV-1 RBD in order to bind SARS-CoV-2 RBD (Laroche et al., 2022). Our group contributed to the DMS-based engineering of an ACE2 decoy that could neutralize the SARS-CoV-2 Omicron variant and showed that the decoy prevented escape for each single-residue mutation in the RBD of SARS-CoV-2 (Ikemura et al., 2022). We also constructed a database, SpikeDB, that provides DMS-derived changes in infectivity, antigenic escape, ACE2 affinity, and protein expression caused by point mutations in the spike protein of SARS-CoV-2 (Ikemura et al., 2022).
5.2 Validation of repertoire data mining
Repertoire sequencing is generating large amounts of human BCR data. However, most published BCR sequences lack information about the targeted antigen or epitope. The development of antigen-specific B cell sorting technologies such as LIBRA-seq could solve the problem of assigning antibodies to their antigens (see Section 2) (Setliff et al., 2019). Although antigen-specific B cell sorting is quite a powerful approach, it requires recombinant antigens and specialized cell sorting techniques that are difficult to scale to large numbers of antigens. Computational approaches to identifying antigen-specific antibodies are one way of simplifying very large repertoire sequence data sets.
Various methods are being developed to cluster antibodies that target the same antigen and epitope. Some approaches use information from antibody sequence only, while others use structure information, if available (Galson et al., 2015b; Xu et al., 2019; Ripoll et al., 2021; Wong et al., 2021). Here, we will focus on sequence-based approaches to search for antibodies with similar target antigens and epitopes.
Clustering of clonotypes was the first method used to group antibodies that possibly target the same antigen (Reddy et al., 2010; Zhu et al., 2013; Galson et al., 2015b; Greiff et al., 2015b). This approach assumes that antibodies with the same V and J genes as well as a given CDR3 amino acid sequence identity (e.g. 80%–100%) in the heavy chain are more likely than other BCRs to target the same antigen and epitope (Galson et al., 2015b; Truck et al., 2015; Soto et al., 2019). Clustering of clonotypes can be applied to single-chain (usually heavy chain) or heavy and light chain paired data (Raybould et al., 2021b). A recent study assembled approximately 8,000 published COVID-19 antibodies from more than 200 donors and demonstrated that antibodies binding to SARS-CoV-2 spike RBD, NTD or S2 possessed distinct convergent clonotype features (Wang et al., 2022). Such clonotype clusters are generally restricted to BCRs with the same CDR3 length (Satpathy et al., 2015). Since many antibodies whose CDR3 length differs by 1-2 amino acids have been found to target the same epitope (D'Angelo et al., 2018; Wong et al., 2019), there is benefit in adding flexibility to BCR clustering. Moreover, antibodies with different V and J genes or with CDR3 sequence identities below 80% have been found to target the same SARS-CoV-2 RBD epitopes (Barnes et al., 2020; Dejnirattisai et al., 2021) or NTD epitopes (Liu et al., 2021). In the case of human antibody repertoires, the overlap between donors as defined by clonotyping antibody sequences is about 0.3% among three healthy adult donors and 0.1% among three cord blood samples (Soto et al., 2019). In order to increase sensitivity, our group constructed a clustering tool on the InterClone web server that provides more flexible CDR similarity thresholds (Wilamowski et al., 2022). This method assumes that antibodies whose CDRs fall within a given sequence identity threshold are more likely to target the same epitope. A detailed explanation of this method and the process of a realistic application are described below.
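For comparison with the more flexible approach, the conventional clonotype definition discussed above can be sketched in Python. This is a minimal illustration, not the implementation of any cited tool: it assumes heavy-chain records given as dictionaries with hypothetical keys `v`, `j`, and `cdr3`, requires identical V and J genes and equal CDR3 length, and links two sequences when their CDR3 amino acid identity reaches `min_identity` (single linkage).

```python
def identity(a, b):
    """Fraction of identical positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_clonotypes(bcrs, min_identity=0.8):
    """Single-linkage clonotype clusters, returned as sorted lists of indices.
    bcrs: list of dicts with keys 'v', 'j', 'cdr3' (hypothetical field names)."""
    parent = list(range(len(bcrs)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bcrs)):
        for j in range(i + 1, len(bcrs)):
            a, b = bcrs[i], bcrs[j]
            same_genes = (a["v"], a["j"]) == (b["v"], b["j"])
            if same_genes and identity(a["cdr3"], b["cdr3"]) >= min_identity:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(bcrs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The all-against-all comparison is quadratic in the number of sequences, so a sketch like this is only suitable for small inputs; the equal-length requirement in `identity` is exactly the restriction that motivates more flexible CDR-based clustering.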
Recently, two groups discovered a set of 11 SARS-CoV-2 infection-enhancing antibodies (Li et al., 2021; Liu et al., 2021). We sought to identify such infection-enhancing antibodies in large-scale antibody repertoire sequence data from COVID-19 patients and healthy donors (Ismanto et al., 2022). Because enhancing antibodies bind their antigen primarily via their heavy chain, as captured in the Cryo-EM structure data (PDB IDs 7LAB, 7DZX, 7DZY), we focused on heavy chain CDRs. Moreover, since we did not know in advance the safest sequence identity threshold to use for each CDR, we used 80% for CDRH1 and CDRH2 and 60% for CDRH3. Although we could find antigen binders within these thresholds, the false positive rate was quite high (more than 80%) (Ismanto et al., 2022). A safer threshold seems to be 90% for CDRH1 and CDRH2 and 70% for CDRH3. Donor antigen exposure was a critical factor in the false positive rate: we found that the true enhancing antibody rate was approximately 100 times higher in COVID-19 patients than in healthy donors. Unlike other web servers (e.g., Vidjil, AbYsis, OAS, etc.) (Duez et al., 2016; Swindells et al., 2017; Olsen et al., 2022), InterClone hosts a large database of BCR and TCR sequences and allows such flexible search or clustering operations on the stored data. InterClone also allows users to control data visibility.
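The effect of the two threshold sets described above can be illustrated with a short Python sketch. It is not InterClone's implementation, whose CDR encoding is not described here: similarity between CDRs is approximated, as an assumption, by one minus the edit distance divided by the longer length, which tolerates small length differences. The dictionary keys and the names `LOOSE`, `STRICT`, and `search` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Length-tolerant similarity in [0, 1]; a stand-in for sequence identity."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else (longest - edit_distance(a, b)) / longest

# Per-CDR thresholds taken from the text above
LOOSE = {"cdrh1": 0.8, "cdrh2": 0.8, "cdrh3": 0.6}
STRICT = {"cdrh1": 0.9, "cdrh2": 0.9, "cdrh3": 0.7}

def search(query, repertoire, thresholds):
    """Return repertoire entries whose CDRs all meet the per-CDR thresholds."""
    return [entry for entry in repertoire
            if all(similarity(query[cdr], entry[cdr]) >= cutoff
                   for cdr, cutoff in thresholds.items())]
```

Running the same query with `LOOSE` and then `STRICT` returns a superset followed by a subset of hits, which mirrors the trade-off between sensitivity and false positive rate discussed above.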
6 Future perspectives
Human BCR repertoires are shaped by antigen exposure. A wide range of diseases, from infection to cancer and autoimmunity, can shape our repertoires. Aging also has a profound effect on BCR (and TCR) diversity. For these reasons, BCR and TCR repertoires have attracted much attention as potential biomarkers for health, disease, vaccination, or other therapeutic activity. In recent months, two groups have reported the ability to clearly distinguish COVID-19 patients based on BCR repertoires (Ortega et al., 2021; Chen et al., 2022), and our own unpublished findings support this ability. Therefore, it is reasonable to anticipate a new generation of biomarkers based on BCR repertoires that will drive forward technology in the four main areas discussed here (sorting, sequencing, analysis, and validation).
One of the distinguishing features of BCR-based biomarkers from conventional biomarkers is that the antibodies encoded by the BCR sequences are directly involved in the prevention or (as in the case of autoimmunity) mediation of disease. This implies that the downstream application of BCR-based biomarkers is a new generation of therapeutics that closely resemble our body’s own defence mechanisms. Computational methods, in particular the ability to accurately predict antibody-antigen interactions from repertoire data, will make a critical difference in these efforts. Given the recent breakthroughs in Deep learning, there is thus much cause for optimism in the coming years for repertoire-based biomarkers and therapeutics.
One topic we have not covered is antibody delivery. Antibodies are traditionally administered directly as proteins. Meanwhile, mRNA vaccination has proved to be a robust and safe way to induce neutralizing antibodies against COVID-19 (Chaudhary et al., 2021). mRNA vaccines are now being designed against various antigen targets including Zika, influenza virus, CMV, respiratory syncytial virus (RSV), Ebola, and HIV. At present, most mRNA vaccines encode one or more antigens (Barbier et al., 2022). Two studies have explored the feasibility of expressing antibodies against RSV and HIV via mRNA delivery (Tiwari et al., 2018; Lindsay et al., 2020). In one study, researchers expressed whole palivizumab (a neutralizing antibody against RSV) in the lung via synthetic mRNA delivery by intratracheal aerosol (Tiwari et al., 2018). Cells co-transfected with mRNAs encoding the light and heavy chains at a 1:4 molar ratio could efficiently form whole IgG antibodies and prevent detectable infection (Tiwari et al., 2018). In another study, mRNA that encoded an HIV neutralizing antibody (PGT121) as well as a membrane anchor protein was used to efficiently localize the antibody to the cell surface and capture simian-HIV (Lindsay et al., 2020). Thus, it will be interesting to see whether antibody delivery can be routinely implemented as RNA.
Of the four technologies reviewed here (B cell sorting, BCR sequencing, BCR repertoire analysis, and experimental validation), repertoire analysis is the one area that is expected to be radically transformed by advances in computational science. While we can cluster antibodies into specificity groups using CDR or gene usage similarity, the sensitivity of such methods is severely limited. This limitation, in turn, is due to poor coverage of well-annotated BCR sequences, as discussed previously (Jespersen et al., 2019). Only a few antigens have been well studied, so machine learning models are currently unable to learn the patterns associated with specific epitopes. Therefore, the transformation from clustering based on similarity to the ability to predict epitopes will require steady progress in sorting, sequencing, and validation. Such progress will come through investment in repertoire-based analysis of various diseases, data sharing and basic infrastructure. The establishment of data standards is one important step. It remains uncertain whether data providers will merge together under government-sponsored institutions (e.g., NIH, EBI, AMED) or remain independently operated. In either scenario, it will be interesting to see how the pharmaceutical industry responds to the growing information contained in human BCR repertoires.
Statements
Author contributions
DS conceived the idea and outlook of this review. ZX, HI, HZ, DS, and DS reviewed the published literature and wrote the manuscript under the supervision of DS. ZX, HZ, DS, and FS made the figures and table. All authors listed participated in the revision and discussions. ZX formulated the draft, FS and DS revised the final submitted version.
Funding
This work was supported by the Japan Agency for Medical Research and Development (AMED) Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research) under Grant Number 22ama121025j0001.
Acknowledgments
We would like to thank all members of the Systems Immunology and Genome Informatics Lab for the very helpful discussion.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1
AbbottW. M.DamschroderM. M.LoweD. C. (2014). Current approaches to fine mapping of antigen-antibody interactions. Immunology142 (4), 526–535. 10.1111/imm.12284
2
Acquaye-SeedahE.ReczekE. E.RussellH. H.DiVenereA. M.SandmanS. O.CollinsJ. H.et al (2018). Characterization of individual human antibodies that bind pertussis toxin stimulated by acellular immunization. Infect. Immun.86 (6). 10.1128/IAI.00004-18
3
AkbarR.BashourH.RawatP.RobertP. A.SmorodinaE.CotetT. S.et al (2022). Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. MAbs14 (1), 2008790. 10.1080/19420862.2021.2008790
4
AlamyarE.GiudicelliV.LiS.DurouxP.LefrancM.-P. (2012). Imgt/Highv-Quest: The Imgt® web portal for immunoglobulin (ig) or antibody and T cell receptor (tr) analysis from ngs high throughput and deep sequencing. Immunome Res.08 (01). 10.4172/1745-7580.1000056
5
AlmagroJ. C.TeplyakovA.LuoJ.SweetR. W.KodangattilS.Hernandez-GuzmanF.et al (2014). Second antibody modeling assessment (AMA-II). Proteins.82 (8), 1553–1562. 10.1002/prot.24567
6
AmannaI. J.SlifkaM. K. (2006). Quantitation of rare memory B cell populations by two independent and complementary approaches. J. Immunol. Methods317 (1-2), 175–185. 10.1016/j.jim.2006.09.005
7
AmbrosettiF.OlsenT. H.OlimpieriP. P.Jimenez-GarciaB.MilanettiE.MarcatilliP.et al (2020). proABC-2: PRediction of AntiBody contacts v2 and its application to information-driven docking. Bioinformatics36 (20), 5107–5108. 10.1093/bioinformatics/btaa644
8
AnsariH. R.RaghavaG. P. (2010). Identification of conformational B-cell Epitopes in an antigen from its primary sequence. Immunome Res.6, 6. 10.1186/1745-7580-6-6
9
AttafN.Cervera-MarzalI.DongC.GilL.RenandA.SpinelliL.et al (2020). FB5P-seq: FACS-based 5-prime end single-cell RNA-seq for integrative analysis of transcriptome and antigen receptor repertoire in B and T cells. Front. Immunol.11, 216. 10.3389/fimmu.2020.00216
10
BaekM.DiMaioF.AnishchenkoI.DauparasJ.OvchinnikovS.LeeG. R.et al (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science373 (6557), 871–876. 10.1126/science.abj8754
11
BanachM.HarleyI. T. W.McCarthyM. K.ResterC.StassinopoulosA.KedlR. M.et al (2022). Magnetic enrichment of SARS-CoV-2 antigen-binding B cells for analysis of transcriptome and antibody repertoire. Magnetochemistry8 (2), 23. 10.3390/magnetochemistry8020023
12
BarbierA. J.JiangA. Y.ZhangP.WoosterR.AndersonD. G. (2022). The clinical progress of mRNA vaccines and immunotherapies. Nat. Biotechnol.40 (6), 840–854. 10.1038/s41587-022-01294-2
13
BarnesC. O.JetteC. A.AbernathyM. E.DamK. A.EssweinS. R.GristickH. B.et al (2020). SARS-CoV-2 neutralizing antibody structures inform therapeutic strategies. Nature588 (7839), 682–687. 10.1038/s41586-020-2852-1
14
Bashford-RogersR. J.PalserA. L.IdrisS. F.CarterL.EpsteinM.CallardR. E.et al (2014). Capturing needles in haystacks: A comparison of B-cell receptor sequencing methods. BMC Immunol.15, 29. 10.1186/s12865-014-0029-0
15
BaumA.FultonB. O.WlogaE.CopinR.PascalK. E.RussoV.et al (2021). Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science369 (6506), 1014–1018. 10.1126/science.abd0831
16
BennettM. R.DongJ.BombardiR. G.SotoC.ParringtonH. M.NargiR. S.et al (2019). Human VH1-69 gene-encoded human monoclonal antibodies against Staphylococcus aureus IsdB use at least three distinct modes of binding to inhibit bacterial growth and pathogenesis. mBio10 (5), e02473. 10.1128/mBio.02473-19
17
BermanH. M.WestbrookJ.FengZ.GillilandG.BhatT. N.WeissigH.et al (2008). The Protein Data Bank. Nucleic Acids Res.28 (1), 235–242. 10.1093/nar/28.1.235
18
BhattacharyaS.DunnP.ThomasC. G.SmithB.SchaeferH.ChenJ.et al (2018). ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci. Data5, 180015. 10.1038/sdata.2018.15
19
BlancP.Moro-SibilotL.BarthlyL.JagotF.ThisS.de BernardS.et al (2016). Mature IgM-expressing plasma cells sense antigen and develop competence for cytokine production upon antigenic challenge. Nat. Commun.7, 13600. 10.1038/ncomms13600
20
BlechM.PeterD.FischerP.BauerM. M.HafnerM.ZeebM.et al (2013). One target—two different binding modes: Structural insights into gevokizumab and canakinumab interactions to interleukin-1β. J. Mol. Biol.425 (1), 94–111. 10.1016/j.jmb.2012.09.021
21
BolotinD. A.PoslavskyS.MitrophanovI.ShugayM.MamedovI. Z.PutintsevaE. V.et al (2015). MiXCR: Software for comprehensive adaptive immunity profiling. Nat. Methods12 (5), 380–381. 10.1038/nmeth.3364
22
BoonyaratanakornkitJ.TaylorJ. J. (2019). Techniques to study antigen-specific B cell responses. Front. Immunol.10, 1694. 10.3389/fimmu.2019.01694
23
BourquardT.MusnierA.PuardV.TahirS.AyoubM. A.JullianY.et al (2018). MAbTope: A method for improved epitope mapping. J. Immunol.201 (10), 3096–3105. 10.4049/jimmunol.1701722
24
BrineyB.InderbitzinA.JoyceC.BurtonD. R. (2019). Commonality despite exceptional diversity in the baseline human antibody repertoire. Nature566 (7744), 393–397. 10.1038/s41586-019-0879-y
25
BuusT. B.HerreraA.IvanovaE.MimitouE.ChengA.HeratiR. S.et al (2021). Improving oligo-conjugated antibody signal in multimodal single-cell analysis. Elife10, e61973. 10.7554/eLife.61973
26
CampbellP. J.PleasanceE. D.StephensP. J.DicksE.RanceR.GoodheadI.et al (2008). Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc. Natl. Acad. Sci. U. S. A.105 (35), 13081–13086. 10.1073/pnas.0801523105
27
CaoY.SuB.GuoX.SunW.DengY.BaoL.et al (2020). Potent neutralizing antibodies against SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients' B cells. Cell182 (1), 73–84. 10.1016/j.cell.2020.05.025
28
CarlsonC. S.EmersonR. O.SherwoodA. M.DesmaraisC.ChungM. W.ParsonsJ. M.et al (2013). Using synthetic templates to design an unbiased multiplex PCR assay. Nat. Commun.4, 2680. 10.1038/ncomms3680
29
ChaudharyN.WeissmanD.WhiteheadK. A. (2021). mRNA vaccines for infectious diseases: Principles, delivery and clinical translation. Nat. Rev. Drug Discov.20 (11), 817–838. 10.1038/s41573-021-00283-5
30
ChaudharyN.WesemannD. R. (2018). Analyzing immunoglobulin repertoires. Front. Immunol.9, 462. 10.3389/fimmu.2018.00462
31
ChenY.YeZ.ZhangY.XieW.ChenQ.LanC.et al (2022). A deep learning model for accurate diagnosis of infection using antibody repertoires. J. Immunol.208 (12), 2675–2685. 10.4049/jimmunol.2200063
32
ChristleyS.ScarboroughW.SalinasE.RoundsW. H.TobyI. T.FonnerJ. M.et al (2018). VDJServer: A cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements. Front. Immunol.9, 976. 10.3389/fimmu.2018.00976
33
CliffordJ.HøieM. H.NielsenM.DeleuranS.PetersB.MarcatiliP. (2022). BepiPred-3.0: Improved B-cell epitope prediction using protein language models. bioRxiv. 10.1101/2022.07.11.499418
34
CollatzM.MockF.BarthE.HolzerM.SachseK.MarzM. (2021). EpiDope: A deep neural network for linear B-cell epitope prediction. Bioinformatics37 (4), 448–455. 10.1093/bioinformatics/btaa773
35
CorrieB. D.MarthandanN.ZimonjaB.JaglaleJ.ZhouY.BarrE.et al (2018). iReceptor: A platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol. Rev.284 (1), 24–41. 10.1111/imr.12666
36
D'AngeloS.FerraraF.NaranjoL.ErasmusM. F.HraberP.BradburyA. R. M. (2018). Many routes to an antibody heavy-chain CDR3: Necessary, yet insufficient, for specific binding. Front. Immunol.9, 395. 10.3389/fimmu.2018.00395
37
DaiB.Bailey-KelloggC. (2021). Protein interaction interface region prediction by geometric deep learning. Bioinformatics37, 2580–2588. 10.1093/bioinformatics/btab154
38
DavilaA.XuZ.LiS.RozewickiJ.WilamowskiJ.KotelnikovS.et al (2022). AbAdapt: An adaptive approach to predicting antibody–antigen complex structures from sequence. Bioinform. Adv.2 (1). 10.1093/bioadv/vbac015
39
DavydovY. I.TonevitskyA. G. (2009). Prediction of linear B-cell epitopes. Mol. Biol.43 (1), 150–158. 10.1134/s0026893309010208
40
DejnirattisaiW.ZhouD.GinnH. M.DuyvesteynH. M. E.SupasaP.CaseJ. B.et al (2021). The antigenic anatomy of SARS-CoV-2 receptor binding domain. Cell184 (8), 2183–2200.e22. 10.1016/j.cell.2021.02.032
41
DeKoskyB. J.IppolitoG. C.DeschnerR. P.LavinderJ. J.WineY.RawlingsB. M.et al (2013). High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol.31 (2), 166–169. 10.1038/nbt.2492
42
DeKoskyB. J.KojimaT.RodinA.CharabW.IppolitoG. C.EllingtonA. D.et al (2015). In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med.21 (1), 86–91. 10.1038/nm.3743
43
DondelingerM.FileeP.SauvageE.QuintingB.MuyldermansS.GalleniM.et al (2018). Understanding the significance and implications of antibody numbering and antigen-binding surface/residue definition. Front. Immunol.9, 2278. 10.3389/fimmu.2018.02278
44
DoucettV. P.GerhardW.OwlerK.CurryD.BrownL.BaumgarthN. (2005). Enumeration and characterization of virus-specific B cells by multicolor flow cytometry. J. Immunol. Methods303 (1-2), 40–52. 10.1016/j.jim.2005.05.014
45
DuezM.GiraudM.HerbertR.RocherT.SalsonM.ThonierF. (2016). Vidjil: A web platform for analysis of high-throughput repertoire sequencing. PLoS One11 (11), e0166126. 10.1371/journal.pone.0166126
46
DunbarJ.DeaneC. M. (2016). ANARCI: Antigen receptor numbering and receptor classification. Bioinformatics32 (2), 298–300. 10.1093/bioinformatics/btv552
47
DunbarJ.KrawczykK.LeemJ.BakerT.FuchsA.GeorgesG.et al (2014). SAbDab: The structural antibody database. Nucleic Acids Res.42, D1140–D1146. 10.1093/nar/gkt1043
48
EhrenmannF.KaasQ.LefrancM. P. (2010). IMGT/3Dstructure-DB and IMGT/DomainGapAlign: A database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF. Nucleic Acids Res.38, D301–D307. 10.1093/nar/gkp946
49
EhrenmannF.LefrancM. P. (2011). IMGT/3Dstructure-DB: Querying the IMGT database for 3D structures in immunology and immunoinformatics (IG or antibodies, TR, MH, RPI, and FPIA). Cold Spring Harb. Protoc.2011 (6), 750–761. 10.1101/pdb.prot5637
50
El-ManzalawyY.DobbsD.HonavarV. (2008). Predicting flexible length linear B-cell epitopes. Comput. Syst. Bioinforma. Conf.7, 121–132.
51
EvansR.O’NeillM.PritzelA.AntropovaN.SeniorA.GreenT.et al (2022). Protein complex prediction with AlphaFold-Multimer. bioRxiv. 10.1101/2021.10.04.463034
52
FieldsJ. K.KihnK.BirkedalG. S.KlontzE. H.SjostromK.GuntherS.et al (2021). Molecular basis of selective cytokine signaling inhibition by antibodies targeting a shared receptor. Front. Immunol.12, 779100. 10.3389/fimmu.2021.779100
53
FowlerD. M.FieldsS. (2014). Deep mutational scanning: A new style of protein science. Nat. Methods11 (8), 801–807. 10.1038/nmeth.3027
54
GaiottoT.HuftonS. E. (2016). Cross-neutralising nanobodies bind to a conserved pocket in the Hemagglutinin stem region identified using yeast display and deep mutational scanning. PLoS One11 (10), e0164296. 10.1371/journal.pone.0164296
55
GalsonJ. D.SchaetzleS.Bashford-RogersR. J. M.RaybouldM. I. J.KovaltsukA.KilpatrickG. J.et al (2020). Deep sequencing of B cell receptor repertoires from COVID-19 patients reveals strong convergent immune signatures. Front. Immunol.11, 605170. 10.3389/fimmu.2020.605170
56
GalsonJ. D.TruckJ.FowlerA.ClutterbuckE. A.MunzM.CerundoloV.et al (2015a). Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences. EBioMedicine2 (12), 2070–2079. 10.1016/j.ebiom.2015.11.034
57
GalsonJ. D.TruckJ.FowlerA.MunzM.CerundoloV.PollardA. J.et al (2015b). In-depth assessment of within-individual and inter-individual variation in the B cell receptor repertoire. Front. Immunol.6, 531. 10.3389/fimmu.2015.00531
58
GarrettM. E.ItellH. L.CrawfordK. H. D.BasomR.BloomJ. D.OverbaughJ. (2020). Phage-DMS: A comprehensive method for fine mapping of antibody epitopes. iScience23 (10), 101622. 10.1016/j.isci.2020.101622
59
GieselmannL.KreerC.ErcanogluM. S.LehnenN.ZehnerM.SchommersP.et al (2021). Effective high-throughput isolation of fully human antibodies targeting infectious pathogens. Nat. Protoc.16 (7), 3639–3671. 10.1038/s41596-021-00554-w
60
GoldsteinL. D.ChenY. J.WuJ.ChaudhuriS.HsiaoY. C.SchneiderK.et al (2019). Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun. Biol.2, 304. 10.1038/s42003-019-0551-y
61
GreaneyA. J.StarrT. N.BarnesC. O.WeisblumY.SchmidtF.CaskeyM.et al (2021a). Mapping mutations to the SARS-CoV-2 RBD that escape binding by different classes of antibodies. Nat. Commun.12 (1), 4196. 10.1038/s41467-021-24435-8
62
GreaneyA. J.StarrT. N.GilchukP.ZostS. J.BinshteinE.LoesA. N.et al (2021b). Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe29 (1), 44–57. 10.1016/j.chom.2020.11.007
63
GreiffV.BhatP.CookS. C.MenzelU.KangW.ReddyS. T. (2015a). A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Med.7 (1), 49. 10.1186/s13073-015-0169-8
64
GreiffV.MihoE.MenzelU.ReddyS. T. (2015b). Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol.36 (11), 738–749. 10.1016/j.it.2015.09.006
65
GuoY.ChenK.KwongP. D.ShapiroL.ShengZ. (2019). cAb-rep: A database of curated antibody repertoires for exploring antibody diversity and predicting antibody prevalence. Front. Immunol.10, 2365. 10.3389/fimmu.2019.02365
66
GuptaN. T.Vander HeidenJ. A.UdumanM.Gadala-MariaD.YaariG.KleinsteinS. H. (2015). Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics31 (20), 3356–3358. 10.1093/bioinformatics/btv359
67
HansenJ.BaumA.PascalK. E.RussoV.GiordanoS.WlogaE.et al (2020). Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail. Science369 (6506), 1010–1014. 10.1126/science.abd0827
68
HaqueA.EngelJ.TeichmannS. A.LonnbergT. (2017). A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med.9 (1), 75. 10.1186/s13073-017-0467-4
69
HasanM. M.KhatunM. S.KurataH. (2020). iLBE for computational identification of linear B-cell epitopes by integrating sequence and evolutionary features. Genomics Proteomics Bioinforma.18 (5), 593–600. 10.1016/j.gpb.2019.04.004
70
HeB.LiuS.WangY.XuM.CaiW.LiuJ.et al (2021). Rapid isolation and immune profiling of SARS-CoV-2 specific memory B cell in convalescent COVID-19 patients via LIBRA-seq. Signal Transduct. Target. Ther.6 (1), 195. 10.1038/s41392-021-00610-7
71
HeatherJ. M.IsmailM.OakesT.ChainB. (2018). High-throughput sequencing of the T-cell receptor repertoire: Pitfalls and opportunities. Brief. Bioinform.19 (4), 554–565. 10.1093/bib/bbw138
72
HolcombJ.SpellmonN.ZhangY.DoughanM.LiC.YangZ. (2017). Protein crystallization: Eluding the bottleneck of X-ray crystallography. AIMS Biophys.4 (4), 557–575. 10.3934/biophy.2017.4.557
73
HotopS. K.ReimeringS.ShekharA.AsgariE.BeutlingU.DahlkeC.et al (2022). Peptide microarrays coupled to machine learning reveal individual epitopes from human antibody responses with neutralizing capabilities against SARS-CoV-2. Emerg. Microbes Infect.11 (1), 1037–1048. 10.1080/22221751.2022.2057874
74
HuangR. Y.KrystekS. R.Jr.FelixN.GrazianoR. F.SrinivasanM.PashineA.et al (2018). Hydrogen/deuterium exchange mass spectrometry and computational modeling reveal a discontinuous epitope of an antibody/TL1A interaction. MAbs10 (1), 95–103. 10.1080/19420862.2017.1393595
75
IkemuraN.TaminishiS.InabaT.ArimoriT.MotookaD.KatohK.et al (2022). An engineered ACE2 decoy neutralizes the SARS-CoV-2 Omicron variant and confers protection against infection in vivo. Sci. Transl. Med.14 (650), eabn7737. 10.1126/scitranslmed.abn7737
76
ImmunoMind (2019). Immunarch: An R package for painless bioinformatics analysis of T-cell and B-cell immune repertoires. Zenodo. 10.5281/zenodo.3367200
77
IsmantoH. S.XuZ.SaputriD. S.WilamowskiJ.LiS.NugrahaD. K.et al (2022). Landscape of infection enhancing antibodies in COVID-19 and healthy donors. bioRxiv. 10.1101/2022.07.09.499414
78
JacksonK. J.LiuY.RoskinK. M.GlanvilleJ.HohR. A.SeoK.et al (2014). Human responses to influenza vaccination show seroconversion signatures and convergent antibody rearrangements. Cell Host Microbe16 (1), 105–114. 10.1016/j.chom.2014.05.013
79
JespersenM. C.MahajanS.PetersB.NielsenM.MarcatiliP. (2019). Antibody specific B-cell epitope predictions: Leveraging information from antibody-antigen protein complexes. Front. Immunol.10, 298. 10.3389/fimmu.2019.00298
80
JespersenM. C.PetersB.NielsenM.MarcatiliP. (2017). BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res.45 (W1), W24–W29. 10.1093/nar/gkx346
81
JinW.YangQ.PengY.YanC.LiY.LuoZ.et al (2021). Single-cell RNA-Seq reveals transcriptional heterogeneity and immune subtypes associated with disease activity in human myasthenia gravis. Cell Discov.7 (1), 85. 10.1038/s41421-021-00314-w
82
JuB.ZhangQ.GeJ.WangR.SunJ.GeX.et al (2020). Human neutralizing antibodies elicited by SARS-CoV-2 infection. Nature584 (7819), 115–119. 10.1038/s41586-020-2380-z
83
JumperJ.EvansR.PritzelA.GreenT.FigurnovM.RonnebergerO.et al (2021). Highly accurate protein structure prediction with AlphaFold. Nature596 (7873), 583–589. 10.1038/s41586-021-03819-2
84
KimS. I.NohJ.KimS.ChoiY.YooD. K.LeeY.et al (2021). Stereotypic neutralizing VH antibodies against SARS-CoV-2 spike protein receptor binding domain in patients with COVID-19 and healthy individuals. Sci. Transl. Med.13 (578), eabd6990. 10.1126/scitranslmed.abd6990
85
KotagiriP.MesciaF.RaeW. M.BergamaschiL.TuongZ. K.TurnerL.et al (2022). B cell receptor repertoire kinetics after SARS-CoV-2 infection and vaccination. Cell Rep.38 (7), 110393. 10.1016/j.celrep.2022.110393
86
KovaltsukA.KrawczykK.GalsonJ. D.KellyD. F.DeaneC. M.TruckJ. (2017). How B-cell receptor repertoire sequencing can be enriched with structural antibody data. Front. Immunol.8, 1753. 10.3389/fimmu.2017.01753
87
KovaltsukA.LeemJ.KelmS.SnowdenJ.DeaneC. M.KrawczykK. (2018). Observed Antibody Space: A resource for data mining next-generation sequencing of antibody repertoires. J. Immunol.201 (8), 2502–2509. 10.4049/jimmunol.1800708
88
KramerK. J.JohnsonN. V.ShiakolasA. R.SuryadevaraN.PeriasamyS.RajuN.et al (2021). Potent neutralization of SARS-CoV-2 variants of concern by an antibody with an uncommon genetic signature and structural mode of spike recognition. Cell Rep.37 (1), 109784. 10.1016/j.celrep.2021.109784
89
KramerK. J.WilfongE. M.VossK.BaroneS. M.ShiakolasA. R.RajuN.et al (2022). Single-cell profiling of the antigen-specific response to BNT162b2 SARS-CoV-2 RNA vaccine. Nat. Commun.13 (1), 3466. 10.1038/s41467-022-31142-5
90
KrawczykK.LiuX.BakerT.ShiJ.DeaneC. M. (2014). Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics30 (16), 2288–2294. 10.1093/bioinformatics/btu190
91
KringelumJ. V.LundegaardC.LundO.NielsenM. (2012). Reliable B cell epitope predictions: Impacts of method development and improved benchmarking. PLoS Comput. Biol.8 (12), e1002829. 10.1371/journal.pcbi.1002829
92
LarocheA.Orsini DelgadoM. L.ChalopinB.CuniasseP.DuboisS.SierockiR.et al (2022). Deep mutational engineering of broadly-neutralizing nanobodies accommodating SARS-CoV-1 and 2 antigenic drift. MAbs14 (1), 2076775. 10.1080/19420862.2022.2076775
93
LavinderJ. J.WineY.GieseckeC.IppolitoG. C.HortonA. P.LunguO. I.et al (2014). Identification and characterization of the constituent human serum antibodies elicited by vaccination. Proc. Natl. Acad. Sci. U. S. A.111 (6), 2259–2264. 10.1073/pnas.1317793111
94
LeeJ. H.ToyL.KosJ. T.SafonovaY.SchiefW. R.Havenar-DaughtonC.et al (2021). Vaccine genetics of IGHV1-2 VRC01-class broadly neutralizing antibody precursor naive human B cells. NPJ Vaccines6 (1), 113. 10.1038/s41541-021-00376-7
95
LefrancM. P.GiudicelliV.KaasQ.DupratE.Jabado-MichaloudJ.ScavinerD.et al (2005). IMGT, the international ImMunoGeneTics information system®. Nucleic Acids Res.33, D593–D597. 10.1093/nar/gki065
96
LeinsterT.CobboldC. A. (2012). Measuring diversity: The importance of species similarity. Ecology93 (3), 477–489. 10.1890/10-2402.1
97
LiD.EdwardsR. J.ManneK.MartinezD. R.SchaferA.AlamS. M.et al (2021). In vitro and in vivo functions of SARS-CoV-2 infection-enhancing and neutralizing antibodies. Cell184 (16), 4203–4219.e32. 10.1016/j.cell.2021.06.021
98
LiS.LefrancM. P.MilesJ. J.AlamyarE.GiudicelliV.DurouxP.et al (2013). IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype diversity and next generation repertoire immunoprofiling. Nat. Commun.4, 2333. 10.1038/ncomms3333
99
LianY.GeM.PanX. M. (2014). EPMLR: Sequence-based linear B-cell epitope prediction method using multiple linear regression. BMC Bioinforma.15, 414. 10.1186/s12859-014-0414-y
100
LiberisE.VelickovicP.SormanniP.VendruscoloM.LioP. (2018). Parapred: Antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics34 (17), 2944–2950. 10.1093/bioinformatics/bty305
101
LindsayK. E.VanoverD.ThoresenM.KingH.XiaoP.BadialP.et al (2020). Aerosol delivery of synthetic mRNA to vaginal mucosa leads to durable expression of broadly neutralizing antibodies against HIV. Mol. Ther.28 (3), 805–819. 10.1016/j.ymthe.2020.01.002
102
LiuY.SohW. T.KishikawaJ. I.HiroseM.NakayamaE. E.LiS.et al (2021). An infectivity-enhancing site on the SARS-CoV-2 spike protein targeted by antibodies. Cell184 (13), 3452–3466. 10.1016/j.cell.2021.05.032
103
Lopez-Santibanez-JacomeL.Avendano-VazquezS. E.Flores-JassoC. F. (2019). The pipeline repertoire for ig-seq analysis. Front. Immunol.10, 899. 10.3389/fimmu.2019.00899
104
LuS.LiY.MaQ.NanX.ZhangS. (2022). A structure-based B-cell epitope prediction model through combing local and global features. Front. Immunol.13, 890943. 10.3389/fimmu.2022.890943
105
ManavalanB.GovindarajR. G.ShinT. H.KimM. O.LeeG. (2018). iBCE-EL: A new ensemble learning framework for improved linear B-cell epitope prediction. Front. Immunol.9, 1695. 10.3389/fimmu.2018.01695
106
MansoT.FolchG.GiudicelliV.Jabado-MichaloudJ.KushwahaA.Nguefack NgouneV.et al (2022). IMGT® databases, related tools and web resources through three main axes of research and development. Nucleic Acids Res.50 (D1), D1262–D1272. 10.1093/nar/gkab1136
107
MasonD. M.FriedensohnS.WeberC. R.JordiC.WagnerB.MengS. M.et al (2021). Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat. Biomed. Eng.5 (6), 600–612. 10.1038/s41551-021-00699-9
108
MassonG. R.BurkeJ. E.AhnN. G.AnandG. S.BorchersC.BrierS.et al (2019). Recommendations for performing, interpreting and reporting hydrogen deuterium exchange mass spectrometry (HDX-MS) experiments. Nat. Methods16 (7), 595–602. 10.1038/s41592-019-0459-y
109
McDanielJ. R.DeKoskyB. J.TannoH.EllingtonA. D.GeorgiouG. (2016). Ultra-high-throughput sequencing of the immune receptor repertoire from millions of lymphocytes. Nat. Protoc.11 (3), 429–442. 10.1038/nprot.2016.024
110
MonasterioE.Mei-DanO.HackneyA. C.CloningerR. (2018). Comparison of the personality traits of male and female BASE jumpers. Front. Psychol.9, 1665. 10.3389/fpsyg.2018.01665
111
NelsonA. L.DhimoleaE.ReichertJ. M. (2010). Development trends for human monoclonal antibody therapeutics. Nat. Rev. Drug Discov.9 (10), 767–774. 10.1038/nrd3229
112
NielsenS. C. A.YangF.JacksonK. J. L.HohR. A.RoltgenK.JeanG. H.et al (2020). Human B cell clonal expansion and convergent antibody responses to SARS-CoV-2. Cell Host Microbe28 (4), 516–525.e5. 10.1016/j.chom.2020.09.002
113
OlsenT. H.BoylesF.DeaneC. M. (2022). Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci.31 (1), 141–146. 10.1002/pro.4205
114
OrtegaM. R.SpisakN.MoraT.WalczakA. M. (2021). Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals. bioRxiv. 10.1101/2021.12.17.473105
115
PanJ.ZhangS.ChouA.BorchersC. H. (2016). Higher-order structural interrogation of antibodies using middle-down hydrogen/deuterium exchange mass spectrometry. Chem. Sci.7 (2), 1480–1486. 10.1039/c5sc03420e
116
PedrioliA.OxeniusA. (2021). Single B cell technologies for monoclonal antibody discovery. Trends Immunol.42 (12), 1143–1158. 10.1016/j.it.2021.10.008
117
PengL.OganesyanV.DamschroderM. M.WuH.Dall'AcquaW. F. (2011). Structural and functional characterization of an agonistic anti-human EphA2 monoclonal antibody. J. Mol. Biol.413 (2), 390–405. 10.1016/j.jmb.2011.08.018
118
PicotJ.GuerinC. L.Le Van KimC.BoulangerC. M. (2012). Flow cytometry: Retrospective, fundamentals and recent instrumentation. Cytotechnology64 (2), 109–130. 10.1007/s10616-011-9415-0
119
PintoD.MontaniE.BolliM.GaravagliaG.SallustoF.LanzavecchiaA.et al (2013). A functional BCR in human IgA and IgM plasma cells. Blood121 (20), 4110–4114. 10.1182/blood-2012-09-459289
120
PintoD.ParkY. J.BeltramelloM.WallsA. C.TortoriciM. A.BianchiS.et al (2020). Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody. Nature583 (7815), 290–295. 10.1038/s41586-020-2349-y
121
PittalaS.Bailey-KelloggC. (2020). Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics36 (13), 3996–4003. 10.1093/bioinformatics/btaa263
122
PolonskyM.ChainB.FriedmanN. (2016). Clonal expansion under the microscope: Studying lymphocyte activation and differentiation using live-cell imaging. Immunol. Cell Biol.94 (3), 242–249. 10.1038/icb.2015.104
123
PonomarenkoJ.BuiH. H.LiW.FussederN.BourneP. E.SetteA.et al (2008). ElliPro: A new structure-based tool for the prediction of antibody epitopes. BMC Bioinforma.9, 514. 10.1186/1471-2105-9-514
124
PonsJ.StrattonJ. R.KirschJ. F. (2002). How do two unrelated antibodies, HyHEL-10 and F9.13.7, recognize the same epitope of hen egg-white lysozyme? Protein Sci.11 (10), 2308–2315. 10.1110/ps.0209102
125
PuchadesC.KukrerB.DiefenbachO.Sneekes-VrieseE.JuraszekJ.KoudstaalW.et al (2019). Epitope mapping of diverse influenza Hemagglutinin drug candidates using HDX-MS. Sci. Rep.9 (1), 4735. 10.1038/s41598-019-41179-0
126
QiH.MaM.HuC.XuZ. W.WuF. L.WangN.et al (2021). Antibody binding epitope mapping (AbMap) of hundred antibodies in a single run. Mol. Cell. Proteomics20, 100059. 10.1074/mcp.RA120.002314
127
QiH.WangF.TaoS. C. (2019). Proteome microarray technology and application: Higher, wider, and deeper. Expert Rev. Proteomics16 (10), 815–827. 10.1080/14789450.2019.1662303
128
RaybouldM. I. J.KovaltsukA.MarksC.DeaneC. M. (2021a). CoV-AbDab: The coronavirus antibody database. Bioinformatics37 (5), 734–735. 10.1093/bioinformatics/btaa739
129
RaybouldM. I. J.MarksC.LewisA. P.ShiJ.BujotzekA.TaddeseB.et al (2020). Thera-SAbDab: The therapeutic structural antibody database. Nucleic Acids Res.48 (D1), D383–D388. 10.1093/nar/gkz827
130
RaybouldM. I. J.ReesA. R.DeaneC. M. (2021b). Current strategies for detecting functional convergence across B-cell receptor repertoires. MAbs13 (1), 1996732. 10.1080/19420862.2021.1996732
131
ReddyS. T.GeX.MiklosA. E.HughesR. A.KangS. H.HoiK. H.et al (2010). Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat. Biotechnol.28 (9), 965–969. 10.1038/nbt.1673
132
RipollD. R.ChaudhuryS.WallqvistA. (2021). Using the antibody-antigen binding interface to train image-based deep neural networks for antibody-epitope classification. PLoS Comput. Biol.17 (3), e1008864. 10.1371/journal.pcbi.1008864
133
RivesA.MeierJ.SercuT.GoyalS.LinZ.LiuJ.et al (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U. S. A.118 (15), e2016239118. 10.1073/pnas.2016239118
134
RobbianiD. F.GaeblerC.MueckschF.LorenziJ. C. C.WangZ.ChoA.et al (2020). Convergent antibody responses to SARS-CoV-2 in convalescent individuals. Nature584 (7821), 437–442. 10.1038/s41586-020-2456-9
135
RobertP. A.AkbarR.FrankR.PavlovićM.WidrichM.SnapkovI.et al (2022). Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for real-world antibody specificity prediction. bioRxiv. 10.1101/2021.07.06.451258
136
RosatiE.DowdsC. M.LiaskouE.HenriksenE. K. K.KarlsenT. H.FrankeA. (2017). Overview of methodologies for T-cell receptor repertoire analysis. BMC Biotechnol.17 (1), 61. 10.1186/s12896-017-0379-9
137
RosenfeldA. M.MengW.Luning PrakE. T.HershbergU. (2017). ImmuneDB: A system for the analysis and exploration of high-throughput adaptive immune receptor sequencing data. Bioinformatics33 (2), 292–293. 10.1093/bioinformatics/btw593
138
RoskinK. M.JacksonK. J. L.LeeJ. Y.HohR. A.JoshiS. A.HwangK. K.et al (2020). Aberrant B cell repertoire selection associated with HIV neutralizing antibody breadth. Nat. Immunol.21 (2), 199–209. 10.1038/s41590-019-0581-0
139
RuffoloJ. A.SulamJ.GrayJ. J. (2022). Antibody structure prediction using interpretable deep learning. Patterns (N Y)3 (2), 100406. 10.1016/j.patter.2021.100406
140
SahaS.RaghavaG. P. (2006). Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins.65 (1), 40–48. 10.1002/prot.21078
141
SaravananV.GauthamN. (2015). Harnessing computational biology for exact linear B-cell epitope prediction: A novel amino acid composition-based feature descriptor. OMICS A J. Integr. Biol.19 (10), 648–658. 10.1089/omi.2015.0095
142
SatpathyS.WagnerS. A.BeliP.GuptaR.KristiansenT. A.MalinovaD.et al (2015). Systems-wide analysis of BCR signalosomes and downstream phosphorylation and ubiquitylation. Mol. Syst. Biol.11 (6), 810. 10.15252/msb.20145880
143
Sela-CulangI.OfranY.PetersB. (2015). Antibody specific epitope prediction-emergence of a new paradigm. Curr. Opin. Virol.11, 98–102. 10.1016/j.coviro.2015.03.012
144
SetliffI.McDonnellW. J.RajuN.BombardiR. G.MurjiA. A.ScheepersC.et al (2018). Multi-donor longitudinal antibody repertoire sequencing reveals the existence of public antibody clonotypes in HIV-1 infection. Cell Host Microbe23 (6), 845–854.e6. 10.1016/j.chom.2018.05.001
145
SetliffI.ShiakolasA. R.PilewskiK. A.MurjiA. A.MapengoR. E.JanowskaK.et al (2019). High-throughput mapping of B cell receptor sequences to antigen specificity. Cell179 (7), 1636–1646.e15. 10.1016/j.cell.2019.11.003
146
SeydouxE.HomadL. J.MacCamyA. J.ParksK. R.HurlburtN. K.JenneweinM. F.et al (2020). Analysis of a SARS-CoV-2-infected individual reveals development of potent neutralizing antibodies with limited somatic mutation. Immunity53 (1), 98–105.e5. 10.1016/j.immuni.2020.06.001
147
SherG.ZhiD.ZhangS. (2017). DRREP: Deep ridge regressed epitope predictor. BMC Genomics18 (6), 676. 10.1186/s12864-017-4024-8
148
ShiakolasA. R.KramerK. J.JohnsonN. V.WallS. C.SuryadevaraN.WrappD.et al (2022). Efficient discovery of SARS-CoV-2-neutralizing antibodies via B cell receptor sequencing and ligand blocking. Nat. Biotechnol.40 (8), 1270–1275. 10.1038/s41587-022-01232-2
149
ShiakolasA. R.KramerK. J.WrappD.RichardsonS. I.SchaferA.WallS.et al (2021). Cross-reactive coronavirus antibodies with diverse epitope specificities and Fc effector functions. Cell Rep. Med.2 (6), 100313. 10.1016/j.xcrm.2021.100313
150
SinghH.AnsariH. R.RaghavaG. P. (2013). Improved method for linear B-cell epitope prediction using antigen's primary sequence. PLoS One8 (5), e62216. 10.1371/journal.pone.0062216
151
SinghM.Al-EryaniG.CarswellS.FergusonJ. M.BlackburnJ.BartonK.et al (2019). High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat. Commun.10 (1), 3120. 10.1038/s41467-019-11049-4
152
SmakajE.BabrakL.OhlinM.ShugayM.BrineyB.TosoniD.et al (2020). Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. Bioinformatics36 (6), 1731–1739. 10.1093/bioinformatics/btz845
153
SolihahB.AzhariA.MusdholifahA. (2020). Enhancement of conformational B-cell epitope prediction using CluSMOTE. PeerJ Comput. Sci.6, e275. 10.7717/peerj-cs.275
154
SotoC.BombardiR. G.BranchizioA.KoseN.MattaP.SevyA. M.et al (2019). High frequency of shared clonotypes in human B cell receptor repertoires. Nature566 (7744), 398–402. 10.1038/s41586-019-0934-8
155
SourisseauM.LawrenceD. J. P.SchwarzM. C.StorrsC. H.VeitE. C.BloomJ. D.et al (2019). Deep mutational scanning comprehensively maps how Zika envelope protein mutations affect viral growth and antibody escape. J. Virol.93 (23), e01291. 10.1128/JVI.01291-19
156
StandleyD. M.NakanishiT.XuZ.HarunaS.LiS.NazlicaS. A.et al (2022). The evolution of structural genomics. Biophys. Rev. Submitted.
157
StarrT. N.GreaneyA. J.AddetiaA.HannonW. W.ChoudharyM. C.DingensA. S.et al (2021). Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. Science371 (6531), 850–854. 10.1126/science.abf9302
158
StarrT. N.GreaneyA. J.HannonW. W.LoesA. N.HauserK.DillenJ. R.et al (2022). Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution. Science377 (6604), 420–424. 10.1126/science.abo7896
159
StarrT. N.GreaneyA. J.HiltonS. K.EllisD.CrawfordK. H. D.DingensA. S.et al (2020). Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell182 (5), 1295–1310.e20. 10.1016/j.cell.2020.08.012
160
SulenA.IslamS.WolffA. S. B.OftedalB. E. (2020). The prospects of single-cell analysis in autoimmunity. Scand. J. Immunol.92 (5), e12964. 10.1111/sji.12964
161
SuryadevaraN.ShiakolasA. R.VanBlarganL. A.BinshteinE.ChenR. E.CaseJ. B.et al (2022). An antibody targeting the N-terminal domain of SARS-CoV-2 disrupts the spike trimer. J. Clin. Invest.132 (11), e159062. 10.1172/JCI159062
162
SutermasterB. A.DarlingE. M. (2019). Considerations for high-yield, high-throughput cell enrichment: Fluorescence versus magnetic sorting. Sci. Rep.9 (1), 227. 10.1038/s41598-018-36698-1
163
Sweredoski, M. J., and Baldi, P. (2009). COBEpro: A novel system for predicting continuous B-cell epitopes. Protein Eng. Des. Sel. 22 (3), 113–120. doi:10.1093/protein/gzn075
164
Sweredoski, M. J., and Baldi, P. (2008). Pepito: Improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 24 (12), 1459–1460. doi:10.1093/bioinformatics/btn199
165
Swindells, M. B., Porter, C. T., Couch, M., Hurst, J., Abhinandan, K. R., Nielsen, J. H., et al. (2017). abYsis: Integrated antibody sequence and structure-management, analysis, and prediction. J. Mol. Biol. 429 (3), 356–364. doi:10.1016/j.jmb.2016.08.019
166
Tiwari, P. M., Vanover, D., Lindsay, K. E., Bawage, S. S., Kirschman, J. L., Bhosle, S., et al. (2018). Engineered mRNA-expressed antibodies prevent respiratory syncytial virus infection. Nat. Commun. 9 (1), 3999. doi:10.1038/s41467-018-06508-3
167
Tran, M. H., Schoeder, C. T., Schey, K. L., and Meiler, J. (2022). Computational structure prediction for antibody-antigen complexes from hydrogen-deuterium exchange mass spectrometry: Challenges and outlook. Front. Immunol. 13, 859964. doi:10.3389/fimmu.2022.859964
168
Truck, J., Ramasamy, M. N., Galson, J. D., Rance, R., Parkhill, J., Lunter, G., et al. (2015). Identification of antigen-specific B cell receptor sequences using public repertoire analysis. J. Immunol. 194 (1), 252–261. doi:10.4049/jimmunol.1401405
169
Tsai, K. C., Lee, Y. C., and Tseng, T. S. (2021). Comprehensive deep mutational scanning reveals the immune-escaping hotspots of SARS-CoV-2 receptor-binding domain targeting neutralizing antibodies. Front. Microbiol. 12, 698365. doi:10.3389/fmicb.2021.698365
170
Turchaninova, M. A., Davydov, A., Britanova, O. V., Shugay, M., Bikos, V., Egorov, E. S., et al. (2016). High-quality full-length immunoglobulin profiling with unique molecular barcoding. Nat. Protoc. 11 (9), 1599–1616. doi:10.1038/nprot.2016.093
171
Upadhyay, A. A., Kauffman, R. C., Wolabaugh, A. N., Cho, A., Patel, N. B., Reiss, S. M., et al. (2018). Baldr: A computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data. Genome Med. 10 (1), 20. doi:10.1186/s13073-018-0528-3
172
Vander Heiden, J. A., Yaari, G., Uduman, M., Stern, J. N., O'Connor, K. C., Hafler, D. A., et al. (2014). pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30 (13), 1930–1932. doi:10.1093/bioinformatics/btu138
173
Voss, W. N., Hou, Y. J., Johnson, N. V., Delidakis, G., Kim, J. E., Javanmardi, K., et al. (2021). Prevalent, protective, and convergent IgG recognition of SARS-CoV-2 non-RBD spike epitopes. Science 372 (6546), 1108–1112. doi:10.1126/science.abg5268
174
Walker, L. M., Shiakolas, A. R., Venkat, R., Liu, Z. A., Wall, S., Raju, N., et al. (2022). High-throughput B cell epitope determination by next-generation sequencing. Front. Immunol. 13, 855772. doi:10.3389/fimmu.2022.855772
175
Waltari, E., McGeever, A., Friedland, N., Kim, P. S., and McCutcheon, K. M. (2019). Functional enrichment and analysis of antigen-specific memory B cell antibody repertoires in PBMCs. Front. Immunol. 10, 1452. doi:10.3389/fimmu.2019.01452
176
Wang, B., Kluwe, C. A., Lungu, O. I., DeKosky, B. J., Kerr, S. A., Johnson, E. L., et al. (2015). Facile discovery of a diverse panel of anti-ebola virus antibodies by immune repertoire mining. Sci. Rep. 5, 13926. doi:10.1038/srep13926
177
Wang, C., Li, W., Drabek, D., Okba, N. M. A., van Haperen, R., Osterhaus, A., et al. (2020). A human monoclonal antibody blocking SARS-CoV-2 infection. Nat. Commun. 11 (1), 2251. doi:10.1038/s41467-020-16256-y
178
Wang, Y., Yuan, M., Lv, H., Peng, J., Wilson, I. A., and Wu, N. C. (2022). A large-scale systematic survey reveals recurring molecular features of public antibody responses to SARS-CoV-2. Immunity 55 (6), 1105–1117.e1104. doi:10.1016/j.immuni.2022.03.019
179
Warszawski, S., Borenstein Katz, A., Lipsh, R., Khmelnitsky, L., Ben Nissan, G., Javitt, G., et al. (2019). Optimizing antibody affinity and stability by the automated design of the variable light-heavy chain interfaces. PLoS Comput. Biol. 15 (8), e1007207. doi:10.1371/journal.pcbi.1007207
180
Wilamowski, J., Xu, Z., Ismanto, H. S., Li, S., Teraguchi, S., Llamas-Covarrubias, M. A., et al. (2022). InterClone: Store, search and cluster adaptive immune receptor repertoires. One Bungtown Road, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory. doi:10.1101/2022.07.31.501809
181
Wong, W. K., Leem, J., and Deane, C. M. (2019). Comparative analysis of the CDR loops of antigen receptors. Front. Immunol. 10, 2454. doi:10.3389/fimmu.2019.02454
182
Wong, W. K., Robinson, S. A., Bujotzek, A., Georges, G., Lewis, A. P., Shi, J., et al. (2021). Ab-ligity: Identifying sequence-dissimilar antibodies that bind to the same epitope. MAbs 13 (1), 1873478. doi:10.1080/19420862.2021.1873478
183
Woodruff, M. C., Ramonell, R. P., Nguyen, D. C., Cashman, K. S., Saini, A. S., Haddad, N. S., et al. (2020). Extrafollicular B cell responses correlate with neutralizing antibodies and morbidity in COVID-19. Nat. Immunol. 21 (12), 1506–1516. doi:10.1038/s41590-020-00814-z
184
Wu, L., Xue, Z., Jin, S., Zhang, J., Guo, Y., Bai, Y., et al. (2022). huARdb: Human Antigen Receptor database for interactive clonotype-transcriptome analysis at the single-cell level. Nucleic Acids Res. 50 (D1), D1244–D1254. doi:10.1093/nar/gkab857
185
Wu, P. J., Kabakova, I. V., Ruberti, J. W., Sherwood, J. M., Dunlop, I. E., Paterson, C., et al. (2018). Water content, not stiffness, dominates Brillouin spectroscopy measurements in hydrated materials. Nat. Methods 15 (8), 561–562. doi:10.1038/s41592-018-0076-1
186
Wüthrich, K. (1990). Protein structure determination in solution by NMR spectroscopy. J. Biol. Chem. 265 (36), 22059–22062. doi:10.1016/s0021-9258(18)45665-7
187
Xu, S., Bottcher, L., and Chou, T. (2020). Diversity in biology: Definitions, quantification and models. Phys. Biol. 17 (3), 031001. doi:10.1088/1478-3975/ab6754
188
Xu, Z., Davila, A., Wilamowski, J., Teraguchi, S., and Standley, D. M. (2022). Improved antibody-specific epitope prediction using AlphaFold and AbAdapt. Chembiochem 23, e202200303. doi:10.1002/cbic.202200303
189
Xu, Z., Li, S., Rozewicki, J., Yamashita, K., Teraguchi, S., Inoue, T., et al. (2019). Functional clustering of B cell receptors using sequence and structural features. Mol. Syst. Des. Eng. 4 (4), 769–778. doi:10.1039/c9me00021f
190
Yaari, G., and Kleinstein, S. H. (2015). Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Med. 7, 121. doi:10.1186/s13073-015-0243-2
191
Yao, B., Zhang, L., Liang, S., and Zhang, C. (2012). SVMTriP: A method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PLoS One 7 (9), e45152. doi:10.1371/journal.pone.0045152
192
Ye, J., Ma, N., Madden, T. L., and Ostell, J. M. (2013). IgBLAST: An immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40. doi:10.1093/nar/gkt382
193
Yeku, O., and Frohman, M. A. (2011). Rapid amplification of cDNA ends (RACE). Methods Mol. Biol. 703, 107–122. doi:10.1007/978-1-59745-248-9_8
194
Yoon, H., Macke, J., West, A. P., Jr., Foley, B., Bjorkman, P. J., Korber, B., et al. (2015). Catnap: A tool to compile, analyze and tally neutralizing antibody panels. Nucleic Acids Res. 43 (W1), W213–W219. doi:10.1093/nar/gkv404
195
Zhang, Q., Yang, J., Bautista, J., Badithe, A., Olson, W., and Liu, Y. (2018). Epitope mapping by HDX-MS elucidates the surface coverage of antigens associated with high blocking efficiency of antibodies to birch pollen allergen. Anal. Chem. 90 (19), 11315–11323. doi:10.1021/acs.analchem.8b01864
196
Zhang, W., Wang, L., Liu, K., Wei, X., Yang, K., Du, W., et al. (2020). Pird: Pan immune repertoire database. Bioinformatics 36 (3), 897–903. doi:10.1093/bioinformatics/btz614
197
Zheng, C., Zheng, L., Yoo, J. K., Guo, H., Zhang, Y., Guo, X., et al. (2017). Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell 169 (7), 1342–1356.e1316. doi:10.1016/j.cell.2017.05.035
198
Zhou, C., Chen, Z., Zhang, L., Yan, D., Mao, T., Tang, K., et al. (2019). SEPPA 3.0-enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res. 47 (W1), W388–W394. doi:10.1093/nar/gkz413
199
Zhu, J., Wu, X., Zhang, B., McKee, K., O'Dell, S., Soto, C., et al. (2013). De novo identification of VRC01 class HIV-1-neutralizing antibodies by next-generation sequencing of B-cell transcripts. Proc. Natl. Acad. Sci. U. S. A. 110 (43), E4088–E4097. doi:10.1073/pnas.1306262110
200
Zost, S. J., Gilchuk, P., Chen, R. E., Case, J. B., Reidy, J. X., Trivette, A., et al. (2020). Rapid isolation and profiling of a diverse panel of human monoclonal antibodies targeting the SARS-CoV-2 spike protein. Nat. Med. 26 (9), 1422–1427. doi:10.1038/s41591-020-0998-x
Summary
Keywords: antibody, antigen, B cell sorting, B cell receptor (BCR), next generation sequencing (NGS), repertoire analysis, epitope, machine learning
Citation: Xu Z, Ismanto HS, Zhou H, Saputri DS, Sugihara F and Standley DM (2022) Advances in antibody discovery from human BCR repertoires. Front. Bioinform. 2:1044975. doi: 10.3389/fbinf.2022.1044975
Received: 15 September 2022
Accepted: 11 October 2022
Published: 20 October 2022
Volume: 2 - 2022
Edited by: Kenji Mizuguchi, Health and Nutrition, Japan
Reviewed by: Hiroki Shirai, RIKEN Center for Computational Science, Japan; Sofia Kossida, Université de Montpellier, France
Copyright © 2022 Xu, Ismanto, Zhou, Saputri, Sugihara and Standley.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Daron M. Standley, standley@biken.osaka-u.ac.jp
This article was submitted to Drug Discovery in Bioinformatics, a section of the journal Frontiers in Bioinformatics
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.