ORIGINAL RESEARCH article

Front. Immunol.
Sec. T Cell Biology
Volume 15 - 2024 | doi: 10.3389/fimmu.2024.1426173
This article is part of the Research Topic Harnessing TCR - Peptide/MHC binding Specificity to Tackle Disease

TCR-H: Explainable Machine Learning Prediction of T-cell Receptor Epitope Binding on Unseen Datasets

Provisionally accepted
  • 1 Oak Ridge National Laboratory (DOE), Oak Ridge, United States
  • 2 The University of Tennessee, Knoxville, Knoxville, Tennessee, United States

The final, formatted version of the article will be published soon.

    Artificial-intelligence and machine-learning (AI/ML) approaches to predicting T-cell receptor (TCR)-epitope specificity achieve high performance metrics on test datasets that include sequences also present in the training set, but they fail to generalize to test sets consisting of epitopes and TCRs that are absent from the training set, i.e., that are 'unseen' during training of the ML model. We present TCR-H, a supervised Support Vector Machine (SVM) classification model that uses physicochemical features and is trained on the largest dataset available to date, using only experimentally validated non-binders as negative datapoints. TCR-H achieves an area under the receiver operating characteristic curve (ROC AUC) of 0.87 for epitope 'hard splitting' (i.e., on test sets in which all epitopes are unseen during ML training), 0.92 for TCR hard splitting, and 0.89 for 'strict splitting', in which neither the epitopes nor the TCRs in the test set are seen in the training data. Furthermore, we employ the SHAP (SHapley Additive exPlanations) eXplainable AI (XAI) method for post hoc interrogation of the models trained with different hard splits, shedding light on the key physicochemical features driving model predictions. TCR-H thus represents a significant step towards general applicability and explainability of epitope:TCR specificity prediction.
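    The three evaluation regimes differ only in what is withheld from training, and the reported scores are ROC AUC values. A minimal sketch of both ideas follows, assuming each datapoint is a (TCR, epitope, label) tuple in which the TCR is a single sequence string (e.g., a CDR3β) and label 1 marks a binder. The function names hard_split and roc_auc are hypothetical and not taken from TCR-H, and the strict-split strategy shown (hold out epitopes, then drop training pairs that share a TCR with the test set) is one possible construction, not necessarily the one the authors used.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption, not the TCR-H data format):
# (tcr, epitope, label), tcr is one sequence string, label 1 = binder, 0 = non-binder.
Pair = Tuple[str, str, int]


def hard_split(pairs: List[Pair], mode: str, test_frac: float = 0.2,
               seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs so the test set is 'unseen' with respect to training.

    mode="epitope": no test epitope appears in training.
    mode="tcr":     no test TCR appears in training.
    mode="strict":  neither test epitopes nor test TCRs appear in training.
    """
    if mode not in ("epitope", "tcr", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    key = 0 if mode == "tcr" else 1  # index of the item to hold out: TCR or epitope
    items = sorted({p[key] for p in pairs})
    rng.shuffle(items)
    held = set(items[:max(1, int(len(items) * test_frac))])

    test = [p for p in pairs if p[key] in held]
    rest = [p for p in pairs if p[key] not in held]
    if mode == "strict":
        # Also drop training pairs whose TCR occurs in the test set, so the
        # test TCRs are unseen too; these pairs are discarded entirely.
        test_tcrs = {p[0] for p in test}
        rest = [p for p in rest if p[0] not in test_tcrs]
    return rest, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC in its rank (Mann-Whitney) form: the probability that a randomly
    chosen binder scores higher than a randomly chosen non-binder (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one binder and one non-binder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up pairs for illustration only.
    toy = [("CASSA", "GILGFVFTL", 1), ("CASSB", "GILGFVFTL", 0),
           ("CASSC", "NLVPMVATV", 1), ("CASSA", "NLVPMVATV", 0),
           ("CASSD", "GLCTLVAML", 1), ("CASSE", "GLCTLVAML", 0)]
    for mode in ("epitope", "tcr", "strict"):
        train, test = hard_split(toy, mode)
        print(mode, len(train), "train /", len(test), "test")
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))  # 1.0
```

    In strict mode, pairs that link a held-out TCR to a training epitope are discarded, so the usable dataset shrinks; this loss of data is the price of making both TCRs and epitopes unseen at test time.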

    Keywords: T-cell receptor, epitope, antigen, Explainable Machine Learning, Physicochemical model, adaptive immunity, machine learning, physicochemical features

    Received: 30 Apr 2024; Accepted: 29 Jul 2024.

    Copyright: © 2024 Tatikonda, Demerdash and Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Omar Demerdash, Oak Ridge National Laboratory (DOE), Oak Ridge, United States
    Jeremy C. Smith, Oak Ridge National Laboratory (DOE), Oak Ridge, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.