Integrating machine learning to advance epitope mapping

Grewal, Simranjit; Hegde, Nidhi; Yanow, Stephanie K.

doi:10.3389/fimmu.2024.1463931

REVIEW article

Front. Immunol., 30 September 2024

Sec. Vaccines and Molecular Therapeutics

Volume 15 - 2024 | https://doi.org/10.3389/fimmu.2024.1463931

Integrating machine learning to advance epitope mapping

Simranjit Grewal¹

Nidhi Hegde²

Stephanie K. Yanow^1,3*

¹Department of Medical Microbiology and Immunology, University of Alberta, Edmonton, AB, Canada
²Department of Computing Science, University of Alberta, Edmonton, AB, Canada
³School of Public Health, University of Alberta, Edmonton, AB, Canada

Identifying epitopes, or the segments of a protein that bind to antibodies, is critical for the development of a variety of immunotherapeutics and diagnostics. In vaccine design, the intent is to identify the minimal epitope of an antigen that can elicit an immune response and avoid off-target effects. For prognostics and diagnostics, the epitope-antibody interaction is exploited to measure antigens associated with disease outcomes. Experimental methods such as X-ray crystallography, cryo-electron microscopy, and peptide arrays are used widely to map epitopes but vary in accuracy, throughput, cost, and feasibility. By comparing machine learning epitope mapping tools, we discuss the importance of data selection, feature design, and algorithm choice in determining the specificity and prediction accuracy of an algorithm. This review discusses limitations of current methods and the potential for machine learning to deepen interpretation and increase feasibility of these methods. We also propose how machine learning can be employed to refine epitope prediction to address the apparent promiscuity of polyreactive antibodies and the challenge of defining conformational epitopes. We highlight the impact of machine learning on our current understanding of epitopes and its potential to guide the design of therapeutic interventions with more predictable outcomes.

1 Introduction

Vaccines are among the most successful and cost-effective public health interventions, particularly to protect against infectious diseases. This was never more evident than during the COVID-19 pandemic where vaccines were the most valuable intervention to protect vulnerable populations from hospitalization and death (1, 2). All the SARS-CoV-2 vaccines were based on the spike protein as the vaccine antigen, either expressed from DNA, mRNA, or as a recombinant protein, and elicited immune responses against dominant epitopes (or segments) within the protein. In response, new variants of the virus emerged with different amino acid sequences in these epitopes, impairing the efficacy of the first-generation vaccines and requiring the design of new variant-specific ones. The ongoing management of SARS-CoV-2 depends on our preparedness against emerging variants; this can be facilitated by designing vaccines that focus immune responses on highly conserved epitopes.

Precise mapping of B-cell epitopes is critical beyond vaccine development for other antibody-based interventions such as immunotherapeutics and diagnostics (3, 4). Likewise, mapping T-cell epitopes can improve our understanding of immune responses to infectious and autoimmune diseases (5) and support the development of immunotherapeutics with an increased safety profile by reducing off-target effects (6). The challenge is how to identify or map these epitopes.

Machine learning algorithms targeted at epitope mapping are undergoing continual development and immense growth. These algorithms are improving upon existing in vitro methods by exploiting the vast reservoir of published experimental data to find patterns and predict regions of a protein likely to be a part of an epitope. Here, we discuss how current methods have benefited from integrating machine learning and explore future applications to further refine epitope mapping.

2 In vitro epitope mapping methods

What constitutes an epitope varies immensely between two major types of immune cells in the body: T-cells and B-cells. T-cell epitopes consist of antigens processed into 8-10 amino acid linear segments that are recognized by major histocompatibility complex (MHC) class I molecules and 13-17 amino acid segments for MHC class II molecules (7). In contrast, B-cell epitopes are typically (90%) conformational, involving amino acids that are spatially close to one another due to secondary structure, tertiary structure, or quaternary structure (8–10). This makes them variable in length and structure. Further, while antibodies typically interact with 15 to 22 amino acids on the surface of an antigen, approximately 5 amino acids contribute most significantly to stabilizing the antibody-antigen complex (11).

Several experimental methods are used to map epitopes, each with pros and cons related to accuracy, throughput, and cost (12). X-ray crystallography, for example, provides accurate information about the three-dimensional complex between an antibody and the antigen that can encompass conformational changes and even highlight dynamics by comparing bound and unbound states. Epitopes are mapped based on their proximity to crystallized residues within the paratope of the antibody (13). Analysis tools, such as PDBePISA, provide information on the interactions of residues between an epitope and the corresponding paratope based on distance, participating residues, and orientation (14). Challenges with this approach are the stability, size, and solubility of the proteins to form a well-ordered lattice. Post-translational modifications can also impede crystal formation. Further, there can be several possible antibody-antigen complexes depending on the physical parameters used for crystallization (15).

Cryo-electron microscopy (cryo-EM) is another similar biophysical method for mapping epitopes. Freezing epitopes bound and unbound to their paratope yields density maps that compare the two states. These density maps are made by compositing 2D images from various angles into a 3D map of the molecule and the differences between the two density maps indicate the residues of an epitope (16).

On the other hand, screening peptide arrays provides much higher throughput, but at the cost of accuracy; the resolution of an epitope by a peptide array is not as high as the resolution attained from cryo-EM or X-ray crystallography. Peptide arrays consist of libraries of synthetic, overlapping peptides (usually 15-20 amino acids in length) that are screened with an antibody of interest (or serum) to identify those peptides that bind strongly (17). Thousands of peptides can be screened at once. The primary antibody bound to peptides is detected by a secondary antibody that emits a luminescent signal. Despite epitopes being typically conformational, peptide arrays are useful for several reasons (9). They are ideal for mapping linear epitopes and the extent of overlap among the peptides can reveal residues that are important components of the epitope. These can be specifically mutated (e.g. by alanine walking) to confirm their contribution to antibody binding (18). Thus, it is possible to determine key residues that dominate the antibody’s affinity for the linear epitope (19). However, peptide arrays are less informative to identify conformational epitopes. When screening reveals multiple peptides recognized by an antibody, the interpretation of these data relies on 3D models of the protein to map the binding sites. If the reactive peptides lie in proximity within the 3D protein structure, this could indicate a conformational epitope (20).

3 Machine learning

Machine learning is a powerful tool that can be used to address the complexity of data from epitope mapping studies. It is effectively a subset of artificial intelligence in which previous outcomes of a task provide an algorithm with experience that allows it to improve in the same task applied to new data (21). ‘Surviving the Titanic’ is a common example of machine learning; this problem involves a dataset containing passenger information that is used to predict the likelihood of survival of specific passengers (22). Information on gender, age, cabin class, etc. are studied and used to predict whether a passenger survived the Titanic disaster. The algorithm identifies patterns in the data and correlates them to what it is tasked with predicting (in this case survival). The ‘Surviving the Titanic’ problem is a valuable practice problem to learn the basics of machine learning and includes many resources and tutorials.

3.1 Datasets – training and testing

Importantly, the performance of an algorithm directly relates to the breadth and complexity of the datasets it learns from. The more novel cases and rare ‘edge’ cases in the dataset, the more the algorithm will learn how to predict similar scenarios in the future. In this way, machine learning can be used to make accurate predictions on a variety of tasks. In general, the algorithm benefits from more data but it is also important to recognize that curated data may be valuable for developing specialized algorithms. For example, if a machine learning algorithm is used specifically to predict cancer epitopes, it should be trained on a specialized cancer database, such as CEDAR (Table 1) (23). Adding data for the prediction of different types of epitopes may improve a machine learning algorithm’s generalizability but could also mask biological patterns unique to cancer epitopes. Additionally, data that are very similar to existing training data do not ‘teach’ an algorithm anything new.

Table 1

Table 1. T-cell epitope associated databases^a.

Many publicly accessible databases are available as sources of datasets for either T-cell or B-cell epitopes (Tables 1, 2). These databases provide large, specialized sources of data that algorithms can study and use for predictions. RCSB PDB, for example, provides primarily 3D structures of various proteins and multimolecular complexes, and information relating to their molecular composition, position, length, chains, etc. AntigenDB, on the other hand, provides sequence, structure, classifications, etc., of various experimentally validated antigens. AntiJen contains data on many topics but, notably, it is a valuable source of data for both B-cell and T-cell epitopes.

Table 2

Table 2. B-cell epitope associated databases^a.

In essence, a machine learning algorithm studies databases to find patterns that inform the analysis of novel data. The quality of the algorithm requires careful selection of the appropriate database. For example, EPSVR is a B-cell epitope prediction algorithm focused on conformational epitopes and the type of data needed for accurate conformational epitope prediction is structural (24). Their training set consisted of structures of curated antigen-antibody complexes and was tested on a sampling of structures from the Conformation Epitope Database (CED) (25, 26). By limiting their datasets to a particular type of epitope, the algorithm becomes much more accurate at predicting that type of epitope. In the case of EPSVR, linear epitopes were excluded because of their poor correlation to structural data. If linear epitopes were included, they may affect how the algorithm evaluates structural data and decrease prediction accuracy on conformational epitopes. For LBTOPE, a model focused on linear epitope prediction, only the primary sequences of epitopes for B-cell receptors and non B-cell epitopes are needed (27). Since these data are more available and less complex, it is easier to generate a dataset of tens of thousands of entries. In another example, the BCIpep database contains many linear epitopes and was used to develop ABCpred to support peptide-based vaccine design and allergy research (28, 29). Training the dataset specifically on linear epitopes resulted in a model that can strongly predict these types of epitopes.

Finally, datasets are further divided into ‘training sets’, ‘validation sets’, and ‘test sets’. Machine learning models are built from training sets; a validation sets help determine the optimal parameters or models for a given problem; and test sets are used to evaluate the machine learning models. These sets are mutually exclusive.

3.2 Features and labels – how data are encoded

Even when an algorithm is provided a database robust enough to characterize the different patterns present in both linear and conformational epitopes, it still requires the tools to make use of those data. These databases are assembled into datasets with each datapoint referred to by the term ‘example’. Within each example there is information relating to ‘labels’ and ‘features’. Labels and features are the tools an algorithm uses to analyze examples.

Labels are what the algorithm learns to predict (the outcome). For instance, the label may refer to the strength of peptide-antibody binding. Machine learning can predict labels based on classification or regression analysis (30); for B-cell epitope prediction, classification involves classifying a peptide as a B-cell epitope or not, while regression analysis would assign a continuous value to the likelihood of the binding.

Features contain descriptive information about the peptide (sequence, structure, physico-chemical properties, etc.) or the parent protein it is derived from. The learning algorithm effectively learns patterns in the features that relate to a specific label value. More specifically, several algorithms use surface accessibility as the feature to predict whether an epitope is likely to be recognized by B-cells as this feature correlates well with binding strength (the label) (31, 32). Developing accurate machine learning models begins with selecting features that correlate strongly to a label (30). The features that are important depend entirely on the nature of the question being asked and the success of an algorithm depends on the design of features (Figure 1). A good feature set uses all data that correlate well with the label (what is being predicted). However, there are many ways to encode the same feature that are more effective for specific machine learning algorithms (Table 3).

Figure 1

Figure 1. Visualization of protein analysis for feature design. The protein structure of leporine serum albumin was visualized in ‘new cartoon’ with VMD (33). Coil and loop regions of the secondary structure are colored yellow and helical regions are colored purple. Certain features are derived from the analysis of the whole protein, smaller subsections, and sequence-based analysis. Features can broadly be categorized as structural, physicochemical, or sequence-based. The features provided below each of the categories are select examples that belong to each category. The secondary structure used as referenced PDB ID 4F5V (34).

Table 3

Table 3. Features employed by machine learning-based B-cell epitope prediction tools.

For example, algorithms EPITOPIA and CBTOPE both use Grantham polarity and Ponnuswami polarity index to calculate polarity as a feature, while LBtope cites CBTOPE’s feature set but excludes polarity entirely (35–37). This may be because LBtope is trying to predict B-cell epitopes of variable lengths and applies a feature set called Composition-Transition-Distribution (CTD) which allows for the comparison of peptides of variable length by simplifying the residues into categories (38). The CTD divisions used by LBtope is based on a set proposed by Chinnasamy et al. that places each amino acid into groups (either 1, 2, or 3) based on certain physicochemical properties: hydrophobicity, polarizability, polarity, and Van der Waal’s volume (39). CTD characterizes the percent frequency of these groups (Composition), their spatial relationship to one another (Transition), and distribution of each group across the peptide (Distribution). The CTD feature set exemplifies an alternate way to encode certain physical features to allow for expanded function of an algorithm by enabling predictions on variable length peptides.

However, the difference in encoding methodologies between features can be much simpler than CTD; hydrophilicity can be calculated using a hydropathy table or a hydrophobicity index for proximal amino acids (40, 41). Hydropathy plots are frequently used to estimate hydrophobic and hydrophilic properties over a 20 amino acid window and, as such, may be an appropriate feature when an algorithm is focusing on predictions of peptides of a similar length (42, 43). Further, there is some variability between experimentally determined hydropathy tables, and pH will affect the hydrophilicity calculations of some amino acids.

Features that describe protein structure may be better at predicting conformational epitopes but may be unimportant when the algorithm is predicting a linear epitope. For example, surface accessibility and flexibility are more impactful features for B-cell epitope prediction software specializing in conformational epitopes. Sequence-based features, on the other hand, may be more beneficial in predicting linear epitopes. Physicochemical features such as hydrophobicity and polarity would be beneficial to both types of predictions and may not benefit one type of prediction more than another. In general, adding more features that provide new information about the data will improve accuracy but with certain limitations. If every feature represents a decision that can be made by a machine learning algorithm, adding noisy features increases the likelihood that a machine learning model uses a feature that appears effective in the training set but is inaccurate once extrapolated to the test set or new cases. This is because every machine learning model assigns a relative weight to different features. When a poorly correlative feature is used to make a prediction, it will likely increase the error. Typically, this is not a major concern unless the dataset is too small; too many features relative to the size of the dataset may impact the predictive accuracy of a model. Further, information provided by databases may limit the features that will be used to predict the label.

3.3 Algorithms – how data are processed

Another equally important aspect of a machine learning algorithm is selecting the algorithm itself. While algorithms can either be classifiers or regressors, there is much more variety to algorithms than just this single trait. We consider four common machine learning algorithms: support vector machine (SVM), neural networks, decision trees, and language models.

SVM or support vector machine is a classifier model that is accurate, simple, common, and elegant; it delineates a boundary in N-dimensional space in which data on one side of the boundary fall into one classification while data on the other side fall into a different classification. For example, if flexibility and net charge are the features for this model, every datapoint is plotted onto a grid using the flexibility score on one axis and net charge on the other. After training on the dataset, SVM algorithms will produce a line that bisects the data. If the algorithm is intended to predict B-cell epitopes, the datapoints would be classified as either “B-cell epitopes” or “non B-cell epitopes” based on this line. SVM is relatively simple computationally but is sensitive to noisy data and too many features relative to the dataset size (44, 45).

Neural networks are another common machine learning algorithm. They process data in a way that mimics how the human brain functions. Neural networks consist of a series of layers: input layer, hidden layer(s), and output layer. The input layer consists of neurons equal to the number of features and the output layer is the prediction made by the algorithm. The hidden layer(s) is defined by the programmer and is characterized by a series of weight values. The sum of all inputs multiplied by the weight values will determine whether the ‘neuron’ in the hidden layer will activate. Activated neurons will propagate data to the output layer in the same process; the hidden layer inputs are multiplied by weight values between the hidden layer and output layer. The activations in the output layer will determine the prediction of the algorithm. During training, a neural network adjusts weight values between layers to strengthen some connections and weaken others similar to how neural pathways in our brain can be reinforced or weakened. Parameters like the optimal size of the hidden layer and number of hidden layers are generally found by trail-and-error. The reason why it is difficult to predict the correct parameters is there is no way to interpret the analysis performed in these hidden layers, a black box. Two common neural networks in biological research are deep neural networks (neural networks with more than four layers) and convolutional neural networks (a neural network architecture that is effective at image processing). Convolutional neural networks can be ‘deep’. Also, neural networks rely on very large datasets and as such, trial-and-error optimization of hidden layers is very demanding computationally. While neural networks struggle with interpretability, they are highly accurate once optimized (46).

Unlike neural networks, decision trees are defined by their interpretability (47). Decision trees involve a series of nodes that each split into two ‘child’ nodes continually until they reach their pre-defined depth limit forming a pyramid shape. At each node, the algorithm imposes a criterion that splits the data into one of the two subsequent nodes. For example, if a peptide is being evaluated by the algorithm, a node could be “net charge ≥3” where all the datapoints that satisfy this criterion go to one of the child nodes and all the other datapoints go to the other child node. During training, the features at each node and the criterion are adjusted to sort the data accurately. Commonly, decision trees are used in random forests that consist of many decision trees where the output is a consensus of a majority of trees (classification), or the mean of tree outputs (regression).

Lastly, language models function by studying patterns in speech and language to predict what comes next probabilistically. These language models are also applied to analyze patterns in protein and genetic sequences to find patterns common to specific types of epitopes and produce prediction tools. Language models accurately represent the data on which they are trained but are very computationally demanding and require large datasets (48).

4 Applications of machine learning to epitope mapping

4.1 Machine learning integrated into prediction of T-cell and B-cell epitopes

Since the early 2000s, the integration of machine learning into epitope prediction has increased the accuracy of those predictions and, currently, most modern algorithms use machine learning methods (Tables 4, 5). These algorithms are provided with sequence and/or structural information about a protein, and they use machine learning to predict which epitopes will be recognized by receptors on B-cells or T-cells (49).

Table 4

Table 4. B-cell epitope prediction tools^a.

Table 5

Table 5. T-cell epitope prediction tools^a.

Non-machine learning methods use scoring methods to predict epitopes. In the case of EpiJen, a 4-step process produces a score and eliminates datapoints in a stepwise manner using quantitative matrices. It first determines whether proteasome cleavage would occur, then whether TAP binding would occur, then whether MHC binding would occur, and finally whether T-cell recognition would occur. The output would be a small subsection of the data.

The early B-cell epitope prediction tools focused primarily on the prediction of linear epitopes despite linear epitopes making up only a small portion of B-cell epitopes (10). Rapidly, new tools were developed that boasted higher accuracy on a benchmark dataset or pioneered new methods of analysis; for example, ABCpred was the first B-cell prediction server based on recurrent neural networks (29). Commonly, tools specialized for linear prediction take sequence files as their input and conformational predictions require structural inputs, but more recently, tools focus on predicting both linear and conformational epitopes. In 2017, BepiPred 2.0 became a benchmark algorithm for the comparison of newer ones based on its accuracy (for 2017 standards) and capability to predict both conformational and linear epitopes (50). Other tools were released since that predict both types of epitopes, including BepiPred 3.0, SEMA-1D, and SEMA-3D (51, 52). Ideally, the best tool is one that is specialized to address a given problem and trained on a dataset containing relevant cases.

The value in predicting T-cell epitopes lies in the characterization of potential immune responses prior to treatment to avoid side effects against self-peptides (53). In T-cell epitope prediction, we encounter the dichotomy of algorithms focused primarily on either MHC class I or MHC class II binding prediction (Table 5). The open binding groove of MHC class II molecules makes predictions much more complex compared to MHC class I. Specifically, this means that the ligand does not fit cleanly inside the binding groove; there is a structural element introduced and MHC class II predictive algorithms have lower accuracy compared to MHC class I algorithms. There exist several accessible online machine learning tools for predicting T-cell epitopes and/or MHC class I and class II peptide binding (Table 5). The review by Peters et al. (54) explores T-cell epitope prediction and algorithm design more thoroughly.

Since features, datasets, and algorithm selection all define the specificity of a machine learning model and the type of data it is predicting, it is difficult to meaningfully compare different T-cell and B-cell epitope prediction tools. These tools will often compare themselves to similar contemporary tools based on performance to benchmark datasets, but there are issues with generalizing the accuracy of these tools. The first issue is that the test sets used to compare algorithms to one another are not the same and often tailored to the specific comparison. For example, comparing linear B-cell epitope prediction software to conformational software would be affected by whether the benchmark dataset contained primarily linear or conformational data. If the benchmark set is small, neural networks may perform worse and if it is noisy then SVM algorithms will be disadvantaged. While this makes it difficult to easily select a single ‘best’ tool, it is important to appreciate that certain tools are highly specialized to address particular problems, and the breadth of tools available should be considered in selecting which are most appropriate.

4.2 Machine learning integrated into experimental epitope mapping methods

A significant advance in epitope mapping is the integration of machine learning to improve the analysis of peptide arrays (Table 6). For example, Xue et al. demonstrated the utility of machine learning algorithms to address a common issue for peptide arrays: certain peptides result in high noise relative to signal in the output data (low signal-to-noise ratio). This noise ratio results in difficulties for interpreting array data and affects its utility in defining the boundaries of an epitope. Often, array data is simplified to streamline interpretation; a threshold-based approach is used to convert data into a binary compatible with classification-type algorithms (20, 55–57). The data are usually sorted into either binding or not binding. However, machine learning can predict which peptides will result in a low signal to noise ratio. By training an algorithm on a small subset of the intended peptides, Xue et al. demonstrated that their machine learning program succeeded in accurately predicting which peptides in the larger peptide set would result in a low signal-to-noise ratio (58). There are several possible ways to consider predictions of which peptides will result in significant noise: ‘noisy’ peptides could be eliminated from training sets for future algorithm development, excluded from testing all together, or the predictions could be used in interpreting results.

Table 6

Table 6. Machine learning applied to in vitro epitope mapping methods.

Machine learning is also used in docking software to predict the discrete interactions between a monoclonal antibody and a protein. Non-machine learning algorithms like Zdock and Rosetta are relatively effective and accurate antibody docking software programs, but they specialize in “local docking” which necessitates partial epitope knowledge, like relative location of the epitope within approximately 8 Å (59, 60). Machine learning allows programs like Mabtope to analyze millions of docking poses and identify those that are optimal. When compared to other methods like FRODOCK (another non-machine learning method), Mabtope reports are more accurate at predicting whether a specific residue participates in an epitope (80% accuracy compared to 35% accuracy) (61, 62). The impacts of machine learning on protein-ligand docking are discussed in a review by Yang et al. (63). Docking-based approaches provide the ability to explore specific antibody-protein interactions in silico.

In X-ray crystallography, machine learning can address the difficulty with performing this method on large proteins. When analyzing large proteins, a pre-experimental step can use B-cell mapping algorithms to focus on specific regions of a protein; testing a smaller discrete region of a protein would help address the size issue and improve the throughput. Machine learning can also predict protein crystallizability which saves time and resources (64, 65). Moreover, the study of highly variable regions of a protein is facilitated by algorithms that can predict ‘correctness’ scores (a per-residue estimate of how reliable the crystal structure is) for residue main chains and side chains to improve interpretability of crystallographic protein structures (Table 6).

Cryo-EM depends on machine learning integration where a cryogenically frozen molecule is subjected to electron microscopy to produce and assemble a 3D reconstruction. Machine learning has automated particle selection, improved post-processing resolution, and map reconstruction (Table 6).

5 Case studies

Several studies of SARS-CoV-2 highlight applications of machine learning to probe the specificity of immune responses to the virus. As an example, Hotop et al. applied machine learning to analyze data from peptide microarrays screened with sera from patients with disease outcomes of varying severity (55). The analysis revealed important differences in the antibody responses to SARS-CoV-2 that correlated with improved clearance of the infection. Unlike conventional B-cell epitope mapping algorithms that use regression analysis (providing a continuous variable as an output), this application applied a random forest in classification analysis providing an output as a ‘1’ or a ‘0’. More specifically, the machine learning algorithms labeled each peptide as either correlated (‘1’) or non-correlated (‘0’) with better disease outcomes. Five different algorithms were used to select 10 peptides (out of 648) that were recognized predominantly by sera from patients with milder infections. This information can now be applied to vaccine design to focus on neutralizing epitopes. It could also be used as a prognostic to predict the severity of disease outcomes.

A second example is the application of the algorithm ScanNet (Table 3) (32) to study the immune response against the receptor binding site of the SARS-CoV-2 Omicron variant (66). ScanNet is a geometric model that characterizes the binding surface of a protein (or protein segment in this case) based on its physical and chemical properties: shape, charge, depth, surface accessibility, etc. The algorithm analyzed antigenicity (the likelihood of antibody recognition) residue by residue of the SARS-CoV-2 receptor binding domain (RBD) from the original strain compared to the same region on each of the Delta, Alpha, Beta and Omicron variants. This analysis suggested that mutations in the RBD reduced antibody recognition, which could explain why immunity to previous variants of SARS-CoV-2 may not be as protective against the Omicron strain. These results were corroborated by competition ELISAs that showed reduced serological recognition of the Omicron RBD and supported the need for an Omicron-specific vaccine. ScanNet’s antigenicity predictions for the variants further supported the need to design Omicron-specific vaccines.

6 Future applications of machine learning to epitope mapping and immunology

Although B-cell epitope prediction algorithms are a common tool, there is room for tremendous growth in the application of machine learning to further refine epitope predictions. This is especially relevant in the interpretation of antibody-binding data that point to multiple epitopes; rather than recognizing a single discrete epitope, some antibodies are promiscuous or polyreactive (67, 68). Applying feature analysis approaches to these types of antibody binding data could identify features that are shared across epitopes within the same or related protein. An exciting opportunity is to use machine learning to extract -with high resolution - the specific features of these epitopes that are required for an antibody to bind. This information can then inform diagnostics, immunotherapeutics or vaccine design. In the example of SARS-CoV-2, peptide vaccines could be designed based on the features of an epitope that are conserved across virus variants to elicit broadly neutralizing antibodies.

Another application of machine learning is to improve prediction of conformational epitopes. High throughput mapping of linear peptide arrays could be integrated with in silico protein modeling, like AlphaFold, to increase the interpretability of the data (69). Recently, the release of AlphaFold 3 further improved the accuracy of predicting biomolecular complexes and overhauled its training procedure (70). Unfortunately, AlphaFold 3 is closed-source, which may limit the potential for specialized derivative models to be developed. For example, AlphaFold-Multimer was derived from the open-source code of Alphafold 2 and improved on its ability to predict large protein complexes (71). As discussed previously, peptide arrays focus on linear epitopes and are considered less informative with regards to conformational epitopes. However, mapping binding regions from an array to a predicted structure may highlight binding clusters that indicate an epitope. A more granular approach to the same methods could be to use the large peptide array datasets and train a machine learning algorithm from which feature data can be extracted. The feature data would encapsulate physical and chemical characteristics associated with binding regions (i.e. charge, secondary structure, etc.) which could also be applied to the 3D structures to identify binding sites. The impact this would have on vaccine development and therapeutics is profound. Success of these algorithms may improve peptide-based vaccines through scaffolded or stapled peptides that can recapitulate 3D structural components. It would also provide more accurate characterization of off-target binding.

A paratope-based approach may also have potential to predict epitopes and play an important role in profiling the safety and specificity of monoclonal antibodies now widely used as therapeutics. Sequences of antigen binding sites and/or structural information could be used to predict the features of a compatible epitope. In this case, the features would be structural, physicochemical, and sequence-based descriptions of the binding region of an antibody and the label(s) would correspond to aspects of an epitope, such as structure, charge, polarity, etc. Just like B-cell epitope prediction software, there are promising predictive models for paratope regions on an antibody (72–74). Additionally, programs like AlphaFold, or faster and higher-throughput derivatives, can be applied to design and inform structural features (69, 75, 76). The burgeoning field of paratope-focused analysis could yield a better understanding of off-target binding, and potentially increase accuracy and safety. Focusing on both the paratope and epitope may result in more predictable outcomes for vaccine design and other therapeutics.

Importantly, machine learning has been integrated into research beyond epitope mapping, with impacts in the broader field of immunology. For example, the tool AllerCatPro 2.0 uses machine learning to predict allergenicity of proteins by predicting potential immune epitopes (77) and, clinically, machine learning has improved allergic disease diagnostics (78). To improve resource management and streamline clinical processes, machine learning-enhanced prognostic tools are being rapidly developed to analyze patient data. Specifically, plasma from SARS-CoV-2 infected patients was used to produce a cytokine panel that can accurately predict disease severity (79). Conditions that are difficult to diagnose, such as auto immune diseases, have benefited significantly from the incorporation of machine learning models to analyze patient data (80). In future, machine learning directed to these clinical applications has the potential to transform personalized medicine.

Author contributions

SG: Writing – original draft, Writing – review & editing. NH: Conceptualization, Writing – review & editing. SY: Conceptualization, Writing – review & editing, Funding acquisition, Project administration, Supervision.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by a Project Grant (RN399320) from the Canadian Institutes of Health Research, and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R01AI150944.

Acknowledgments

The authors thank D. Vinals, U. Iyamu, and M. Good for comments and suggestions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Author disclaimer

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

1. Vasireddy D, Vanaparthy R, Mohan G, Malayala SV, Atluri P. Review of COVID-19 variants and COVID-19 vaccine efficacy: what the clinician should know? J Clin Med Res. (2021) 13:317–25. doi: 10.14740/jocmr4518

PubMed Abstract | Crossref Full Text | Google Scholar

2. Xing K, Tu XY, Liu M, Liang ZW, Chen JN, Li JJ, et al. Efficacy and safety of COVID-19 vaccines: a systematic review. Zhongguo Dang Dai Er Ke Za Zhi. (2021) 23:221–8. doi: 10.7499/j.issn.1008-8830.2101133

PubMed Abstract | Crossref Full Text | Google Scholar

3. Pollard AJ, Bijker EM. A guide to vaccinology: from basic principles to new developments. Nat Rev Immunol. (2021) 21:83–100. doi: 10.1038/s41577-020-00479-7

PubMed Abstract | Crossref Full Text | Google Scholar

4. Ghotloo S, Maghsood F, Golsaz-Shirazi F, Amiri MM, Moog C, Shokri F. Epitope mapping of neutralising anti-SARS-CoV-2 monoclonal antibodies: Implications for immunotherapy and vaccine design. Rev Med Virol. (2022) 32:e2347. doi: 10.1002/rmv.v32.5

PubMed Abstract | Crossref Full Text | Google Scholar

5. Hudson D, Fernandes RA, Basham M, Ogg G, Koohy H. Can we predict T cell specificity with digital biology and machine learning? Nat Rev Immunol. (2023) 23:511–21. doi: 10.1038/s41577-023-00835-3

PubMed Abstract | Crossref Full Text | Google Scholar

6. Becker JP, Riemer AB. The importance of being presented: target validation by immunopeptidomics for epitope-specific immunotherapies. Front Immunol. (2022) 13:883989. doi: 10.3389/fimmu.2022.883989

PubMed Abstract | Crossref Full Text | Google Scholar

7. Sidney J, Peters B, Sette A. Epitope prediction and identification- adaptive T cell responses in humans. Semin Immunol. (2020) 50:101418. doi: 10.1016/j.smim.2020.101418

PubMed Abstract | Crossref Full Text | Google Scholar

8. Barlow DJ, Edwards MS, Thornton JM. Continuous and discontinuous protein antigenic determinants. Nature. (1986) 322:747–8. doi: 10.1038/322747a0

PubMed Abstract | Crossref Full Text | Google Scholar

9. Van Regenmortel MH. What is a B-cell epitope? Methods Mol Biol. (2009) 524:3–20. doi: 10.1007/978-1-59745-450-6_1

PubMed Abstract | Crossref Full Text | Google Scholar

10. Kringelum JV, Nielsen M, Padkjær SB, Lund O. Structural analysis of B-cell epitopes in antibody:protein complexes. Mol Immunol. (2013) 53:24–34. doi: 10.1016/j.molimm.2012.06.001

PubMed Abstract | Crossref Full Text | Google Scholar

11. Rahman Kh S, Chowdhury EU, Sachse K, Kaltenboeck B. Inadequate reference datasets biased toward short non-epitopes confound B-cell epitope prediction. J Biol Chem. (2016) 291:14585–99. doi: 10.1074/jbc.M116.729020

PubMed Abstract | Crossref Full Text | Google Scholar

12. Potocnakova L, Bhide M, Pulzova LB. An introduction to B-cell epitope mapping and in silico epitope prediction. J Immunol Res. (2016) 2016:6760830. doi: 10.1155/2016/6760830

PubMed Abstract | Crossref Full Text | Google Scholar

13. Toride King M, Brooks CL. Epitope mapping of antibody-antigen interactions with X-ray crystallography. Methods Mol Biol. (2018) 1785:13–27. doi: 10.1007/978-1-4939-7841-0_2

PubMed Abstract | Crossref Full Text | Google Scholar

14. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. (2007) 372:774–97. doi: 10.1016/j.jmb.2007.05.022

PubMed Abstract | Crossref Full Text | Google Scholar

15. Smyth MS, Martin JH. x ray crystallography. Mol Pathol. (2000) 53:8–14. doi: 10.1136/mp.53.1.8

PubMed Abstract | Crossref Full Text | Google Scholar

16. Wigge C, Stefanovic A, Radjainia M. The rapidly evolving role of cryo-EM in drug design. Drug Discovery Today Technol. (2020) 38:91–102. doi: 10.1016/j.ddtec.2020.12.003

Crossref Full Text | Google Scholar

17. Amartely H, Iosub-Amir A, Friedler A. Identifying protein-protein interaction sites using peptide arrays. J Vis Exp. (2014) 93:e52097. doi: 10.3791/52097

Crossref Full Text | Google Scholar

18. Morrison KL, Weiss GA. Combinatorial alanine-scanning. Curr Opin Chem Biol. (2001) 5:302–7. doi: 10.1016/S1367-5931(00)00206-4

PubMed Abstract | Crossref Full Text | Google Scholar

19. Szymczak LC, Kuo HY, Mrksich M. Peptide arrays: development and application. Anal Chem. (2018) 90:266–82. doi: 10.1021/acs.analchem.7b04380

PubMed Abstract | Crossref Full Text | Google Scholar

20. Peng M, Dou X, Zhang X, Yan M, Xiong D, Jiang R, et al. Protective antigenic epitopes revealed by immunosignatures after three doses of inactivated SARS-CoV-2 vaccine. Front Immunol. (2022) 13:938378. doi: 10.3389/fimmu.2022.938378

PubMed Abstract | Crossref Full Text | Google Scholar

21. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, Massachusetts: The MIT Press (2016).

Google Scholar

22. Cukierski W. Titanic - machine learning from disaster. Delaware, USA: Kaggle (2012). Available at: https://www.kaggle.com/competitions/titanic.

Google Scholar

23. Kosaloglu-Yalcin Z, Blazeska N, Vita R, Carter H, Nielsen M, Schoenberger S, et al. The cancer epitope database and analysis resource (CEDAR). Nucleic Acids Res. (2023) 51:D845–D52. doi: 10.1093/nar/gkac902

PubMed Abstract | Crossref Full Text | Google Scholar

24. Liang S, Zheng D, Yao B, Zhang C. EPCES and EPSVR: prediction of B-cell antigenic epitopes on protein surfaces with conformational information. Methods Mol Biol. (2020) 2131:289–97. doi: 10.1007/978-1-0716-0389-5_16

PubMed Abstract | Crossref Full Text | Google Scholar

25. Ponomarenko JV, Bourne PE. Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct Biol. (2007) 7:64. doi: 10.1186/1472-6807-7-64

PubMed Abstract | Crossref Full Text | Google Scholar

26. Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, et al. Protein-protein docking benchmark 2.0: an update. Proteins. (2005) 60:214–6. doi: 10.1002/prot.20560

PubMed Abstract | Crossref Full Text | Google Scholar

27. Singh H, Ansari HR, Raghava GP. Improved method for linear B-cell epitope prediction using antigen's primary sequence. PLoS One. (2013) 8:e62216. doi: 10.1371/journal.pone.0062216

PubMed Abstract | Crossref Full Text | Google Scholar

28. Saha S, Bhasin M, Raghava GP. Bcipep: a database of B-cell epitopes. BMC Genomics. (2005) 6:79. doi: 10.1186/1471-2164-6-79

PubMed Abstract | Crossref Full Text | Google Scholar

29. Saha S, Raghava GP. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. (2006) 65:40–8. doi: 10.1002/prot.21078

PubMed Abstract | Crossref Full Text | Google Scholar

30. Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. (2021) 2:160. doi: 10.1007/s42979-021-00592-x

PubMed Abstract | Crossref Full Text | Google Scholar

31. Kulkarni-Kale U, Bhosle S, Kolaskar AS. CEP: a conformational epitope prediction server. Nucleic Acids Res. (2005) 33:W168–71. doi: 10.1093/nar/gki460

PubMed Abstract | Crossref Full Text | Google Scholar

32. Tubiana J, Schneidman-Duhovny D, Wolfson HJ. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat Methods. (2022) 19:730–9. doi: 10.1038/s41592-022-01490-7

PubMed Abstract | Crossref Full Text | Google Scholar

33. Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. J Mol Graph Model. (1996) 14:33–8. doi: 10.1016/0263-7855(96)00018-5

Crossref Full Text | Google Scholar

34. Bujacz A. Structures of bovine, equine and leporine serum albumin. Acta Crystallogr D Biol Crystallogr. (2012) 68:1278–89. doi: 10.1107/S0907444912027047

PubMed Abstract | Crossref Full Text | Google Scholar

35. Rubinstein ND, Mayrose I, Pupko T. A machine-learning approach for predicting B-cell epitopes. Mol Immunol. (2009) 46:840–7. doi: 10.1016/j.molimm.2008.09.009

PubMed Abstract | Crossref Full Text | Google Scholar

36. Ansari HR, Raghava GP. Identification of conformational B-cell Epitopes in an antigen from its primary sequence. Immunome Res. (2010) 6:6. doi: 10.1186/1745-7580-6-6

PubMed Abstract | Crossref Full Text | Google Scholar

37. Rubinstein ND, Mayrose I, Martz E, Pupko T. Epitopia: a web-server for predicting B-cell epitopes. BMC Bioinf. (2009) 10:287. doi: 10.1186/1471-2105-10-287

Crossref Full Text | Google Scholar

38. El-Manzalawy Y, Dobbs D, Honavar V. Predicting flexible length linear B-cell epitopes. Comput Syst Bioinf Conf. (2008) 7:121–32. doi: 10.1142/p585

Crossref Full Text | Google Scholar

39. Chinnasamy A, Sung WK, Mittal A. Protein structure and fold prediction using tree-augmented naive Bayesian classifier. Pac Symp Biocomput. (2004), 387–98. doi: 10.1142/9789812704856_0037

PubMed Abstract | Crossref Full Text | Google Scholar

40. Di Rienzo L, Miotto M, Bo L, Ruocco G, Raimondo D, Milanetti E. Characterizing hydropathy of amino acid side chain in a protein environment by investigating the structural changes of water molecules network. Front Mol Biosci. (2021) 8:626837. doi: 10.3389/fmolb.2021.626837

PubMed Abstract | Crossref Full Text | Google Scholar

41. Mant CT, Kovacs JM, Kim HM, Pollock DD, Hodges RS. Intrinsic amino acid side-chain hydrophilicity/hydrophobicity coefficients determined by reversed-phase high-performance liquid chromatography of model peptides: comparison with other hydrophilicity/hydrophobicity scales. Biopolymers. (2009) 92:573–95. doi: 10.1002/bip.21316

PubMed Abstract | Crossref Full Text | Google Scholar

42. Zhao T, Cheng L, Zang T, Hu Y. Peptide-major histocompatibility complex class I binding prediction based on deep learning with novel feature. Front Genet. (2019) 10:1191. doi: 10.3389/fgene.2019.01191

PubMed Abstract | Crossref Full Text | Google Scholar

43. Eisenberg D. Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem. (1984) 53:595–623. doi: 10.1146/annurev.bi.53.070184.003115

PubMed Abstract | Crossref Full Text | Google Scholar

44. Li HX, Yang JL, Zhang G, Fan B. Probabilistic support vector machines for classification of noise affected data. Inform Sci. (2013) 221:60–71. doi: 10.1016/j.ins.2012.09.041

Crossref Full Text | Google Scholar

45. Cervantes J, Garcia-Lamont F, Rodríguez-Mazahua L, Lopez A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing. (2020) 408:189–215. doi: 10.1016/j.neucom.2019.10.118

Crossref Full Text | Google Scholar

46. Zhang Y, Tino P, Leonardis A, Tang K. A survey on neural network interpretability. IEEE T Em Top Comp I. (2021) 5:726–42. doi: 10.1109/TETCI.2021.3100641

Crossref Full Text | Google Scholar

47. Costa VG, Pedreira CE. Recent advances in decision trees: an updated survey. Artif Intell Rev. (2023) 56:4765–800. doi: 10.1007/s10462-022-10275-5

Crossref Full Text | Google Scholar

48. Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotec. (2021) 19:1750–8. doi: 10.1016/j.csbj.2021.03.022

Crossref Full Text | Google Scholar

49. Blythe MJ, Flower DR. Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci. (2005) 14:246–8. doi: 10.1110/ps.041059505

PubMed Abstract | Crossref Full Text | Google Scholar

50. Jespersen MC, Peters B, Nielsen M, Marcatili P. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. (2017) 45:W24–W9. doi: 10.1093/nar/gkx346

PubMed Abstract | Crossref Full Text | Google Scholar

51. Ivanisenko NV, Shashkova TI, Shevtsov A, Sindeeva M, Umerenkov D, Kardymon O. SEMA 2.0: web-platform for B-cell conformational epitopes prediction using artificial intelligence. Nucleic Acids Res. (2024) 52:W533–W9. doi: 10.1093/nar/gkae386

PubMed Abstract | Crossref Full Text | Google Scholar

52. Clifford JN, Hoie MH, Deleuran S, Peters B, Nielsen M, Marcatili P. BepiPred-3.0: Improved B-cell epitope prediction using protein language models. Protein Sci. (2022) 31:e4497. doi: 10.1002/pro.4497

PubMed Abstract | Crossref Full Text | Google Scholar

53. Koren E, De Groot AS, Jawa V, Beck KD, Boone T, Rivera D, et al. Clinical validation of the "in silico" prediction of immunogenicity of a human recombinant therapeutic protein. Clin Immunol. (2007) 124:26–32. doi: 10.1016/j.clim.2007.03.544

PubMed Abstract | Crossref Full Text | Google Scholar

54. Peters B, Nielsen M, Sette A. T cell epitope predictions. Annu Rev Immunol. (2020) 38:123–45. doi: 10.1146/annurev-immunol-082119-124838

PubMed Abstract | Crossref Full Text | Google Scholar

55. Hotop SK, Reimering S, Shekhar A, Asgari E, Beutling U, Dahlke C, et al. Peptide microarrays coupled to machine learning reveal individual epitopes from human antibody responses with neutralizing capabilities against SARS-CoV-2. Emerg Microbes Infect. (2022) 11:1037–48. doi: 10.1080/22221751.2022.2057874

PubMed Abstract | Crossref Full Text | Google Scholar

56. Hada-Neeman S, Weiss-Ottolenghi Y, Wagner N, Avram O, Ashkenazy H, Maor Y, et al. Domain-scan: combinatorial sero-diagnosis of infectious diseases using machine learning. Front Immunol. (2020) 11:619896. doi: 10.3389/fimmu.2020.619896

PubMed Abstract | Crossref Full Text | Google Scholar

57. Maguy A, Tardif JC, Busseuil D, Ribi C, Li J. Autoantibody signature in cardiac arrest. Circulation. (2020) 141:1764–74. doi: 10.1161/CIRCULATIONAHA.119.044408

PubMed Abstract | Crossref Full Text | Google Scholar

58. Xue AY, Szymczak LC, Mrksich M, Bagheri N. Machine learning on signal-to-noise ratios improves peptide array design in SAMDI mass spectrometry. Anal Chem. (2017) 89:9039–47. doi: 10.1021/acs.analchem.7b01728

PubMed Abstract | Crossref Full Text | Google Scholar

59. Krawczyk K, Baker T, Shi J, Deane CM. Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking. Protein Eng Des Sel. (2013) 26:621–9. doi: 10.1093/protein/gzt043

PubMed Abstract | Crossref Full Text | Google Scholar

60. Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, et al. Modeling and docking of antibody structures with Rosetta. Nat Protoc. (2017) 12:401–16. doi: 10.1038/nprot.2016.180

PubMed Abstract | Crossref Full Text | Google Scholar

61. Garzon JI, Lopez-Blanco JR, Pons C, Kovacs J, Abagyan R, Fernandez-Recio J, et al. FRODOCK: a new approach for fast rotational protein-protein docking. Bioinformatics. (2009) 25:2544–51. doi: 10.1093/bioinformatics/btp447

PubMed Abstract | Crossref Full Text | Google Scholar

62. Bourquard T, Musnier A, Puard V, Tahir S, Ayoub MA, Jullian Y, et al. MAbTope: A method for improved epitope mapping. J Immunol. (2018) 201:3096–105. doi: 10.4049/jimmunol.1701722

PubMed Abstract | Crossref Full Text | Google Scholar

63. Yang C, Chen EA, Zhang Y. Protein-ligand docking in the machine-learning era. Molecules. (2022) 27. doi: 10.3390/molecules27144568

Crossref Full Text | Google Scholar

64. Wang HL, Wang MJ, Tan H, Li Y, Zhang ZD, Song JN. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection. PLoS One. (2014) 9. doi: 10.1371/journal.pone.0105902

Crossref Full Text | Google Scholar

65. Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, Godzik A. XtalPred: a web server for prediction of protein crystallizability. Bioinformatics. (2007) 23:3403–5. doi: 10.1093/bioinformatics/btm477

PubMed Abstract | Crossref Full Text | Google Scholar

66. Tubiana J, Xiang Y, Fan L, Wolfson HJ, Chen K, Schneidman-Duhovny D, et al. Reduced B cell antigenicity of Omicron lowers host serologic response. Cell Rep. (2022) 41:111512. doi: 10.1016/j.celrep.2022.111512

PubMed Abstract | Crossref Full Text | Google Scholar

67. Jain D, Salunke DM. Antibody specificity and promiscuity. Biochem J. (2019) 476:433–47. doi: 10.1042/BCJ20180670

PubMed Abstract | Crossref Full Text | Google Scholar

68. Gunti S, Notkins AL. Polyreactive antibodies: function and quantification. J Infect Dis. (2015) 212 Suppl 1:S42–6. doi: 10.1093/infdis/jiu512

PubMed Abstract | Crossref Full Text | Google Scholar

69. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. (2021) 596:583–9. doi: 10.1038/s41586-021-03819-2

PubMed Abstract | Crossref Full Text | Google Scholar

70. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. (2024) 630, 493–500. doi: 10.1038/s41586-024-07487-w

PubMed Abstract | Crossref Full Text | Google Scholar

71. Liu J, Guo Z, Wu T, Roy RS, Quadir F, Chen C, et al. Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15. Commun Biol. (2023) 6:1140. doi: 10.1038/s42003-023-05525-3

PubMed Abstract | Crossref Full Text | Google Scholar

72. Deac A, VeliCkovic P, Sormanni P. Attentive cross-modal paratope prediction. J Comput Biol. (2019) 26:536–45. doi: 10.1089/cmb.2018.0175

PubMed Abstract | Crossref Full Text | Google Scholar

73. Chinery L, Wahome N, Moal I, Deane CM. Paragraph-antibody paratope prediction using graph neural networks with minimal feature vectors. Bioinformatics. (2023) 39. doi: 10.1093/bioinformatics/btac732

PubMed Abstract | Crossref Full Text | Google Scholar

74. Liberis E, Velickovic P, Sormanni P, Vendruscolo M, Lio P. Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics. (2018) 34:2944–50. doi: 10.1093/bioinformatics/bty305

PubMed Abstract | Crossref Full Text | Google Scholar

75. Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. (2022) 50:D439–D44. doi: 10.1093/nar/gkab1061

PubMed Abstract | Crossref Full Text | Google Scholar

76. Mirdita M, Schutze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. (2022) 19:679–82. doi: 10.1038/s41592-022-01488-1

PubMed Abstract | Crossref Full Text | Google Scholar

77. Nguyen MN, Krutz NL, Limviphuvadh V, Lopata AL, Gerberick GF, Maurer-Stroh S. AllerCatPro 2.0: a web server for predicting protein allergenicity potential. Nucleic Acids Res. (2022) 50:W36–43. doi: 10.1093/nar/gkac446

PubMed Abstract | Crossref Full Text | Google Scholar

78. Ferrante G, Licari A, Fasola S, Marseglia GL, La Grutta S. Artificial intelligence in the diagnosis of pediatric allergic diseases. Pediatr Allergy Immunol. (2021) 32:405–13. doi: 10.1111/pai.13419

PubMed Abstract | Crossref Full Text | Google Scholar

79. Patterson BK, Guevara-Coto J, Yogendra R, Francisco EB, Long E, Pise A, et al. Immune-based prediction of COVID-19 severity and chronicity decoded using machine learning. Front Immunol. (2021) 12:700782. doi: 10.3389/fimmu.2021.700782

PubMed Abstract | Crossref Full Text | Google Scholar

80. Stafford IS, Kellermann M, Mossotto E, Beattie RM, MacArthur BD, Ennis S. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases. NPJ Digit Med. (2020) 3:30. doi: 10.1038/s41746-020-0229-3

PubMed Abstract | Crossref Full Text | Google Scholar

81. Mendes M, Mahita J, Blazeska N, Greenbaum J, Ha B, Wheeler K, et al. IEDB-3D 2.0: Structural data analysis within the Immune Epitope Database. Protein Sci. (2023) 32:e4605. doi: 10.1002/pro.4605

PubMed Abstract | Crossref Full Text | Google Scholar

82. Mamrosh E-SD-F JL, Korber BTM, Brander C, Barouch D, de Boer R, Haynes BF, et al. HIV Molecular Immunology. 2024 ed. Los Alamos, New Mexico: Los Alamos National Laboratory, Theoretical Biology and Biophysics (2023).

Google Scholar

83. Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. (2015) 43:D405–12. doi: 10.1093/nar/gku938

PubMed Abstract | Crossref Full Text | Google Scholar

84. Olsen LR, Tongchusak S, Lin H, Reinherz EL, Brusic V, Zhang GL. TANTIGEN: a comprehensive database of tumor T cell antigens. Cancer Immunol Immunother. (2017) 66:731–5. doi: 10.1007/s00262-017-1978-y

PubMed Abstract | Crossref Full Text | Google Scholar

85. Yang B, Sayers S, Xiang Z, He Y. Protegen: a web-based protective antigen database and analysis system. Nucleic Acids Res. (2011) 39:D1073–8. doi: 10.1093/nar/gkq944

PubMed Abstract | Crossref Full Text | Google Scholar

86. Olsen LR, Zhang GL, Reinherz EL, Brusic V. FLAVIdB: A data mining system for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology. Immunome Res. (2011) 7.

PubMed Abstract | Google Scholar

87. Lata S, Bhasin M, Raghava GP. MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes. BMC Res Notes. (2009) 2:61. doi: 10.1186/1756-0500-2-61

PubMed Abstract | Crossref Full Text | Google Scholar

88. Reche PA, Zhang H, Glutting JP, Reinherz EL. EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology. Bioinformatics. (2005) 21:2140–1. doi: 10.1093/bioinformatics/bti269

PubMed Abstract | Crossref Full Text | Google Scholar

89. Schuler MM, Nastke MD, Stevanovikc S. SYFPEITHI: database for searching and T-cell epitope prediction. Methods Mol Biol. (2007) 409:75–93. doi: 10.1007/978-1-60327-118-9_5

PubMed Abstract | Crossref Full Text | Google Scholar

90. Negi SS, Schein CH, Braun W. The updated Structural Database of Allergenic Proteins (SDAP 2.0) provides 3D models for allergens and incorporated bioinformatics tools. J Allergy Clin Immunol Glob. (2023) 2:100162. doi: 10.1016/j.jacig.2023.100162

PubMed Abstract | Crossref Full Text | Google Scholar

91. Huang J, Honda W. CED: a conformational epitope database. BMC Immunol. (2006) 7:7. doi: 10.1186/1471-2172-7-7

PubMed Abstract | Crossref Full Text | Google Scholar

92. Schlessinger A, Ofran Y, Yachdav G, Rost B. Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res. (2006) 34:D777–80. doi: 10.1093/nar/gkj053

PubMed Abstract | Crossref Full Text | Google Scholar

93. Blythe MJ, Doytchinova IA, Flower DR. JenPep: a database of quantitative functional peptide data for immunology. Bioinformatics. (2002) 18:434–9. doi: 10.1093/bioinformatics/18.3.434

PubMed Abstract | Crossref Full Text | Google Scholar

94. Sweredoski MJ, Baldi P. COBEpro: a novel system for predicting continuous B-cell epitopes. Protein Eng Des Sel. (2009) 22:113–20. doi: 10.1093/protein/gzn075

PubMed Abstract | Crossref Full Text | Google Scholar

95. Sun J, Wu D, Xu T, Wang X, Xu X, Tao L, et al. SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res. (2009) 37:W612–6. doi: 10.1093/nar/gkp417

PubMed Abstract | Crossref Full Text | Google Scholar

96. Qi T, Qiu T, Zhang Q, Tang K, Fan Y, Qiu J, et al. SEPPA 2.0–more refined server to predict spatial epitope considering species of immune host and subcellular localization of protein antigen. Nucleic Acids Res. (2014) 42:W59–63. doi: 10.1093/nar/gku395

PubMed Abstract | Crossref Full Text | Google Scholar

97. Zhou C, Chen Z, Zhang L, Yan D, Mao T, Tang K, et al. SEPPA 3.0-enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res. (2019) 47:W388–W94. doi: 10.1093/nar/gkz413

PubMed Abstract | Crossref Full Text | Google Scholar

98. Hoie MH, Gade FS, Johansen JM, Wurtzen C, Winther O, Nielsen M, et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol. (2024) 15:1322712. doi: 10.3389/fimmu.2024.1322712

PubMed Abstract | Crossref Full Text | Google Scholar

99. da Silva BM, Myung Y, Ascher DB, Pires DEV. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform. (2022) 23:bbab4235. doi: 10.1093/bib/bbab423

PubMed Abstract | Crossref Full Text | Google Scholar

100. Krawczyk K, Liu X, Baker T, Shi J, Deane CM. Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics. (2014) 30:2288–94. doi: 10.1093/bioinformatics/btu190

PubMed Abstract | Crossref Full Text | Google Scholar

101. Ponomarenko J, Bui HH, Li W, Fusseder N, Bourne PE, Sette A, et al. ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinf. (2008) 9:514. doi: 10.1186/1471-2105-9-514

Crossref Full Text | Google Scholar

102. Liu F, Yuan C, Chen H, Yang F. Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings. Sci Rep. (2024) 14:2464. doi: 10.1038/s41598-024-53028-w

PubMed Abstract | Crossref Full Text | Google Scholar

103. Kumar N, Tripathi S, Sharma N, Patiyal S, Devi NL, Raghava GPS. A method for predicting linear and conformational B-cell epitopes in an antigen from its primary sequence. Comput Biol Med. (2024) 170:108083. doi: 10.1016/j.compbiomed.2024.108083

PubMed Abstract | Crossref Full Text | Google Scholar

104. da Silva BM, Ascher DB, Pires DEV. epitope1D: accurate taxonomy-aware B-cell linear epitope prediction. Brief Bioinform. (2023) 24:bbad114. doi: 10.1093/bib/bbad114

PubMed Abstract | Crossref Full Text | Google Scholar

105. Ras-Carmona A, Pelaez-Prestel HF, Lafuente EM, Reche PA. BCEPS: A web server to predict linear B cell epitopes with enhanced immunogenicity and cross-reactivity. Cells. (2021) 10. doi: 10.3390/cells10102744

PubMed Abstract | Crossref Full Text | Google Scholar

106. Collatz M, Mock F, Barth E, Holzer M, Sachse K, Marz M. EpiDope: a deep neural network for linear B-cell epitope prediction. Bioinformatics. (2021) 37:448–55. doi: 10.1093/bioinformatics/btaa773

PubMed Abstract | Crossref Full Text | Google Scholar

107. Liu T, Shi K, Li W. Deep learning methods improve linear B-cell epitope prediction. BioData Min. (2020) 13:1. doi: 10.1186/s13040-020-00211-0

PubMed Abstract | Crossref Full Text | Google Scholar

108. Ya.I. Davydov AGT. Prediction of linear B-cell epitopes. Molekulyarnaya Biologiya. (2009) 43:166–74. doi: 10.1134/S0026893309010208

Crossref Full Text | Google Scholar

109. Yao B, Zhang L, Liang S, Zhang C. SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PloS One. (2012) 7:e45152. doi: 10.1371/journal.pone.0045152

PubMed Abstract | Crossref Full Text | Google Scholar

110. El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit. (2008) 21:243–55. doi: 10.1002/jmr.893

PubMed Abstract | Crossref Full Text | Google Scholar

111. Saha S, Raghava GP. (2004). ‘BCEPRED: Prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties’, Lecture Notes in Computer Science, vol 3239. Berlin, Heidelberg: Springer. pp. 197–204. doi: 10.1007/978-3-540-30220-9_16

Crossref Full Text | Google Scholar

112. Israeli S, Louzoun Y. Single-residue linear and conformational B cell epitopes prediction using random and ESM-2 based projections. Brief Bioinform. (2024) 25:bbae084. doi: 10.1093/bib/bbae084

PubMed Abstract | Crossref Full Text | Google Scholar

113. Chen J, Zhao B, Lin S, Sun H, Mao X, Wang M, et al. TEPCAM: Prediction of T-cell receptor-epitope binding specificity via interpretable deep learning. Protein Sci. (2024) 33:e4841. doi: 10.1002/pro.v33.1

PubMed Abstract | Crossref Full Text | Google Scholar

114. Gao YC, Gao YL, Fan YX, Zhu CY, Wei ZT, Zhou C, et al. Pan-Peptide Meta Learning for T-cell receptor-antigen binding recognition. Nat Mach Intell. (2023) 5:236–49. doi: 10.1038/s42256-023-00619-3

Crossref Full Text | Google Scholar

115. Jiang Y, Huo M, Cheng Li S. TEINet: a deep learning framework for prediction of TCR-epitope binding specificity. Brief Bioinform. (2023) 24:bbad086. doi: 10.1093/bib/bbad086

PubMed Abstract | Crossref Full Text | Google Scholar

116. Cai M, Bang S, Zhang P, Lee H. ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model. Front Immunol. (2022) 13:893247. doi: 10.3389/fimmu.2022.893247

PubMed Abstract | Crossref Full Text | Google Scholar

117. Weber A, Born J, Rodriguez Martinez M. TITAN: T-cell receptor specificity prediction with bimodal attention networks. Bioinformatics. (2021) 37:i237–i44. doi: 10.1093/bioinformatics/btab294

PubMed Abstract | Crossref Full Text | Google Scholar

118. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. (2020) 48:W449–W54. doi: 10.1093/nar/gkaa379

PubMed Abstract | Crossref Full Text | Google Scholar

119. O'Donnell TJ, Rubinsteyn A, Laserson U. MHCflurry 2.0: improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Syst. (2020) 11:418–9. doi: 10.1016/j.cels.2020.09.001

PubMed Abstract | Crossref Full Text | Google Scholar

120. Hu Y, Wang Z, Hu H, Wan F, Chen L, Xiong Y, et al. ACME: pan-specific peptide-MHC class I binding prediction through attention-based deep neural networks. Bioinformatics. (2019) 35:4946–54. doi: 10.1093/bioinformatics/btz427

PubMed Abstract | Crossref Full Text | Google Scholar

121. Molero-Abraham M, Lafuente EM, Flower DR, Reche PA. Selection of conserved epitopes from hepatitis C virus for pan-populational stimulation of T-cell responses. Clin Dev Immunol. (2013) 2013:601943. doi: 10.1155/2013/601943

PubMed Abstract | Crossref Full Text | Google Scholar

122. Zhang H, Lund O, Nielsen M. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics. (2009) 25:1293–9. doi: 10.1093/bioinformatics/btp137

PubMed Abstract | Crossref Full Text | Google Scholar

123. Stranzl T, Larsen MV, Lundegaard C, Nielsen M. NetCTLpan: pan-specific MHC class I pathway epitope predictions. Immunogenetics. (2010) 62:357–68. doi: 10.1007/s00251-010-0441-4

PubMed Abstract | Crossref Full Text | Google Scholar

124. Doytchinova IA, Guan P, Flower DR. EpiJen: a server for multistep T cell epitope prediction. BMC Bioinf. (2006) 7:131. doi: 10.1186/1471-2105-7-131

Crossref Full Text | Google Scholar

125. Reche PA, Reinherz EL. PEPVAC: a web server for multi-epitope vaccine development based on the prediction of supertypic MHC ligands. Nucleic Acids Res. (2005) 33:W138–42. doi: 10.1093/nar/gki357

PubMed Abstract | Crossref Full Text | Google Scholar

126. Racle J, Guillaume P, Schmidt J, Michaux J, Larabi A, Lau K, et al. Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes. Immunity. (2023) 56:1359–75.e13. doi: 10.1016/j.immuni.2023.03.009

PubMed Abstract | Crossref Full Text | Google Scholar

127. Chen B, Khodadoust MS, Olsson N, Wagar LE, Fast E, Liu CL, et al. Predicting HLA class II antigen presentation through integrated deep learning. Nat Biotechnol. (2019) 37:1332–43. doi: 10.1038/s41587-019-0280-2

PubMed Abstract | Crossref Full Text | Google Scholar

128. Abelin JG, Harjanto D, Malloy M, Suri P, Colson T, Goulding SP, et al. Defining HLA-II ligand processing and binding rules with mass spectrometry enhances cancer epitope prediction. Immunity. (2019) 51:766–79.e17. doi: 10.1016/j.immuni.2019.08.012

PubMed Abstract | Crossref Full Text | Google Scholar

129. Nielsen M, Andreatta M. NNAlign: a platform to construct and evaluate artificial neural network models of receptor-ligand interactions. Nucleic Acids Res. (2017) 45:W344–W9. doi: 10.1093/nar/gkx276

PubMed Abstract | Crossref Full Text | Google Scholar

130. Atanasova M, Patronov A, Dimitrov I, Flower DR, Doytchinova I. EpiDOCK: a molecular docking-based tool for MHC class II binding prediction. Protein Eng Des Sel. (2013) 26:631–4. doi: 10.1093/protein/gzt018

PubMed Abstract | Crossref Full Text | Google Scholar

131. Karnaukhov VK, Shcherbinin DS, Chugunov AO, Chudakov DM, Efremov RG, Zvyagin IV, et al. Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen. Nat Comput Sci. (2024) 4:510–21. doi: 10.1038/s43588-024-00653-0

PubMed Abstract | Crossref Full Text | Google Scholar

132. Springer I, Tickotsky N, Louzoun Y. Contribution of T cell receptor alpha and beta CDR3, MHC typing, V and J genes to peptide binding prediction. Front Immunol. (2021) 12:664514. doi: 10.3389/fimmu.2021.664514

PubMed Abstract | Crossref Full Text | Google Scholar

133. Ong E, Cooke MF, Huffman A, Xiang Z, Wong MU, Wang H, et al. Vaxign2: the second generation of the first Web-based vaccine design program using reverse vaccinology and machine learning. Nucleic Acids Res. (2021) 49:W671–W8. doi: 10.1093/nar/gkab279

PubMed Abstract | Crossref Full Text | Google Scholar

134. Reche PA, Glutting JP, Zhang H, Reinherz EL. Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics. (2004) 56:405–19. doi: 10.1007/s00251-004-0709-7

PubMed Abstract | Crossref Full Text | Google Scholar

135. Taguchi AT, Boyd J, Diehnelt CW, Legutki JB, Zhao ZG, Woodbury NW. Comprehensive prediction of molecular recognition in a combinatorial chemical space using machine learning. ACS Comb Sci. (2020) 22:500–8. doi: 10.1021/acscombsci.0c00003

PubMed Abstract | Crossref Full Text | Google Scholar

136. Renard BY, Lower M, Kuhne Y, Reimer U, Rothermel A, Tureci O, et al. rapmad: Robust analysis of peptide microarray data. BMC Bioinf. (2011) 12:324. doi: 10.1186/1471-2105-12-324

Crossref Full Text | Google Scholar

137. Ogura T, Sato C. An automatic particle pickup method using a neural network applicable to low-contrast electron micrographs. J Struct Biol. (2001) 136:227–38. doi: 10.1006/jsbi.2002.4442

PubMed Abstract | Crossref Full Text | Google Scholar

138. Ogura T, Sato C. Automatic particle pickup method using a neural network has high accuracy by applying an initial weight derived from eigenimages: a new reference free method for single-particle analysis. J Struct Biol. (2004) 145:63–75. doi: 10.1016/S1047-8477(03)00139-4

PubMed Abstract | Crossref Full Text | Google Scholar

139. Lei H, Yang Y. CDAE: A cascade of denoising autoencoders for noise reduction in the clustering of single-particle cryo-EM images. Front Genet. (2020) 11:627746. doi: 10.3389/fgene.2020.627746

PubMed Abstract | Crossref Full Text | Google Scholar

140. Sanchez-Garcia R, Segura J, Maluenda D, Carazo JM, Sorzano COS. Deep Consensus, a deep learning-based approach for particle pruning in cryo-electron microscopy. IUCrJ. (2018) 5:854–65. doi: 10.1107/S2052252518014392

PubMed Abstract | Crossref Full Text | Google Scholar

141. Tegunov D, Cramer P. Real-time cryo-electron microscopy data preprocessing with Warp. Nat Methods. (2019) 16:1146–52. doi: 10.1038/s41592-019-0580-y

PubMed Abstract | Crossref Full Text | Google Scholar

142. Yokoyama Y, Terada T, Shimizu K, Nishikawa K, Kozai D, Shimada A, et al. Development of a deep learning-based method to identify "good" regions of a cryo-electron microscopy grid. Biophys Rev. (2020) 12:349–54. doi: 10.1007/s12551-020-00669-6

PubMed Abstract | Crossref Full Text | Google Scholar

143. Avramov TK, Vyenielo D, Gomez-Blanco J, Adinarayanan S, Vargas J, Si D. Deep learning for validating and estimating resolution of cryo-electron microscopy density maps (dagger). Molecules. (2019) 24. doi: 10.3390/molecules24061181

PubMed Abstract | Crossref Full Text | Google Scholar

144. Pfab J, Phan NM, Si D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc Natl Acad Sci U.S.A. (2021) 118. doi: 10.1073/pnas.2017525118

PubMed Abstract | Crossref Full Text | Google Scholar

145. Si D, Moritz SA, Pfab J, Hou J, Cao R, Wang L, et al. Deep learning to predict protein backbone structure from high-resolution cryo-EM density maps. Sci Rep. (2020) 10:4282. doi: 10.1038/s41598-020-60598-y

PubMed Abstract | Crossref Full Text | Google Scholar

146. Hryc CF, Baker ML. AlphaFold2 and CryoEM: Revisiting CryoEM modeling in near-atomic resolution density maps. iScience. (2022) 25:104496. doi: 10.1016/j.isci.2022.104496

PubMed Abstract | Crossref Full Text | Google Scholar

147. Zhu YH, Hu J, Ge F, Li F, Song J, Zhang Y, et al. Accurate multistage prediction of protein crystallization propensity using deep-cascade forest with sequence-based features. Brief Bioinform. (2021) 22:bbaa076. doi: 10.1093/bib/bbaa076

PubMed Abstract | Crossref Full Text | Google Scholar

148. Wang PH, Zhu YH, Yang X, Yu DJ. GCmapCrys: Integrating graph attention network with predicted contact map for multi-stage protein crystallization propensity prediction. Anal Biochem. (2023) 663:115020. doi: 10.1016/j.ab.2022.115020

PubMed Abstract | Crossref Full Text | Google Scholar

149. Bruno AE, Charbonneau P, Newman J, Snell EH, So DR, Vanhoucke V, et al. Classification of crystallization outcomes using deep convolutional neural networks. PLoS One. (2018) 13:e0198883. doi: 10.1371/journal.pone.0198883

PubMed Abstract | Crossref Full Text | Google Scholar

150. Bond PS, Wilson KS, Cowtan KD. Predicting protein model correctness in Coot using machine learning. Acta Crystallogr D Struct Biol. (2020) 76:713–23. doi: 10.1107/S2059798320009080

PubMed Abstract | Crossref Full Text | Google Scholar

151. Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics. (2018) 34:2605–13. doi: 10.1093/bioinformatics/bty166

PubMed Abstract | Crossref Full Text | Google Scholar

152. Rahmani V, Nawaz S, Pennicard D, Setty SPR, Graafsma H. Data reduction for X-ray serial crystallography using machine learning. J Appl Crystallogr. (2023) 56:200–13. doi: 10.1107/S1600576722011748

PubMed Abstract | Crossref Full Text | Google Scholar

Keywords: machine learning, epitope, B-cell, algorithm, features, databases, toolboxes, vaccine

Citation: Grewal S, Hegde N and Yanow SK (2024) Integrating machine learning to advance epitope mapping. Front. Immunol. 15:1463931. doi: 10.3389/fimmu.2024.1463931

Received: 12 July 2024; Accepted: 09 September 2024;
Published: 30 September 2024.

Edited by:

Anastas Dimitrov Pashov, Bulgarian Academy of Sciences (BAS), Bulgaria

Reviewed by:

Anjali Dhall, National Cancer Institute (NIH), United States
Minh Nguyen, Bioinformatics Institute (A*STAR), Singapore

Copyright © 2024 Grewal, Hegde and Yanow. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Stephanie K. Yanow, eWFub3dAdWFsYmVydGEuY2E=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.