Is the exquisite specificity of lymphocytes generated by thymic selection or due to evolution?

De Boer, Rob J.; Kesmir, Can; Perelson, Alan S.; Borghans, José A. M.

doi:10.3389/fimmu.2024.1266349

ORIGINAL RESEARCH article

Front. Immunol., 25 March 2024

Sec. Comparative Immunology

Volume 15 - 2024 | https://doi.org/10.3389/fimmu.2024.1266349

This article is part of the Research Topic: Evolutionary Trade-Offs in Adaptive Immunity

¹Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
²Department of Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, United States
³Center for Translational Immunology, University Medical Center Utrecht, Utrecht, Netherlands

We have previously argued that the antigen receptors of T and B lymphocytes evolved to be sufficiently specific to avoid massive deletion of clonotypes by negative selection. Their optimal ‘specificity’ level, i.e., probability of binding any particular epitope, was shown to be inversely related to the number of self-antigens that the cells have to be tolerant to. Experiments have demonstrated that T lymphocytes also become more specific during negative selection in the thymus, because cells expressing the most crossreactive receptors have the highest likelihood of binding a self-antigen, and hence of being tolerized (i.e., deleted, anergized, or diverted into a regulatory T cell phenotype). Thus, there are two explanations for the exquisite specificity of T cells that are not mutually exclusive, one involving evolution and the other thymic selection. To better understand the impact of both, we extend a previously developed mathematical model by allowing for T cells with very different binding probabilities in the pre-selection repertoire. We confirm that negative selection tends to tolerize the most crossreactive clonotypes. As a result, the average level of specificity in the functional post-selection repertoire depends on the number of self-antigens, even if there is no evolutionary optimization of binding probabilities. However, the evolutionarily optimal range of binding probabilities in the pre-selection repertoire also depends on the number of self-antigens. Species with more self-antigens need more specific pre-selection repertoires to avoid excessive loss of T cells during thymic selection, and hence to mount protective immune responses. We conclude that both evolution and negative selection are responsible for the high level of specificity of lymphocytes.

1 Introduction

The repertoires of B- and T-lymphocytes in the adaptive immune system are extremely diverse. The diversity of T-cell receptors (TCRs) in the circulating pools of naive CD4⁺ and CD8⁺ T cells in human adults has been estimated to be more than 10⁹ unique αβ-TCRs (1). Repertoires need to be diverse because the antigen receptors expressed by lymphocytes are very specific. For instance, the precursor frequency for a typical viral epitope is about one cell in 10⁵ to 10⁶ naive CD8⁺ T cells (2–7). A repertoire therefore needs to contain many more than 10⁵ unique antigen receptors to be complete, i.e., to be expected to mount an immune response to any foreign antigen (8, 9). To avoid autoimmunity, lymphocyte receptors binding self-antigens should be absent from the circulating repertoire of functional naive T cells (or have adopted an unresponsive phenotype). For the peptides of nine amino acids (9-mers) that are used as epitopes by CD8⁺ T cells, we estimated that there are about 10⁷ unique self-epitopes in the human proteome, of which about 10⁵ are expected to be presentable on a particular HLA molecule (10) as a unique peptide-MHC (pMHC). Thus, any naive CD8⁺ T cell faces the problem of having to respond to about one in 10⁵ to 10⁶ foreign pMHCs, while not binding any of the about 10⁵ self-pMHC presented on the MHC molecules it is restricted to.

Although low precursor frequencies confirm that lymphocytes tend to be very specific, i.e., have a low probability to bind a randomly chosen antigen, it is also well-known that a typical TCR can bind many different peptides, which need not even be similar (11). Wooldridge et al. showed that one particular TCR (i.e., 1E6 binding an A*0201-restricted 10-mer) was able to bind over a million different peptides with sufficient affinity (12). TCRs are therefore said to be broadly specific, cross-reactive, degenerate, and promiscuous (13). In contrast, the ‘exquisite specificity’ that we study here is defined in terms of the probability that a TCR binds a randomly chosen pMHC. Thus, highly-specific TCRs have a low binding probability, p. Most authors agree that a high level of TCR specificity is perfectly compatible with the ability of a TCR to bind many pMHC, simply because there are so many different pMHC (12–14). For example, there are $20^{10} ≃ 10^{13}$ different 10-mers, and binding more than 10⁶ of them (12) would still be compatible with a low binding probability of p < 10⁻⁶, which is a normal precursor frequency. Since TCRs may differ widely in their levels of specificity, we extend previous models that were based upon a single binding probability.
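The arithmetic behind this compatibility can be checked directly. The following minimal sketch assumes, purely for illustration, that every 10-mer is an equally likely target and ignores MHC presentation; the variable names are ours and do not come from the cited studies.

```python
# Back-of-the-envelope check: binding a million 10-mers is still "specific".
AMINO_ACIDS = 20
PEPTIDE_LENGTH = 10

n_peptides = AMINO_ACIDS ** PEPTIDE_LENGTH   # all possible 10-mers
n_bound = 10 ** 6                            # lower bound reported for the 1E6 TCR (12)

# Illustrative assumption: every 10-mer is an equally likely target.
binding_probability = n_bound / n_peptides

print(f"possible 10-mers: {n_peptides:.2e}")
print(f"implied binding probability p: {binding_probability:.1e}")
```

Under this assumption the implied binding probability stays below 10⁻⁶ even for a TCR that binds a million peptides.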

The level of specificity at which a post-selection lymphocyte repertoire best responds to a foreign antigen was determined by analyzing simple mathematical models combining the probability of survival from negative selection with the probability to respond to a foreign epitope (14–16). The optimal binding probability was first shown to be inversely related to the number of self-epitopes the lymphocytes have to be tolerant to (15), and after allowing for incomplete tolerance this optimum was later confirmed to be an upper bound (14, 16). Thus, the typical precursor frequency of 1 in 10⁵ or 10⁶ clonotypes (i.e., cells expressing the same antigen receptors) (2–7) was thought to reflect an evolutionary adaptation of the lymphocyte specificity to not respond to about 10⁵ self-epitopes. In this work it was implicitly assumed that lymphocytes tend to have the same probability of binding pMHC, i.e., the same coverage of shape space (9); see Figures 1A, B.

Figure 1 A cartoon of pre-selection (A, C) and post-selection (B, D) repertoires in a shape space representation. Clonotypes are depicted as orange circles representing the area in shape space that they cover. Self-epitopes are depicted as black dots. All clonotypes that cover at least one self-epitope have been deleted in the post-selection repertoires of (B, D). In the previous model (A, B) all clonotypes have the same binding probability, p, making all circles equally large, whereas in the extended model (C, D) clonotypes differ in the degree of specificity, which is visualized as the size of their circles. In the extended model negative selection automatically selects for more specific clonotypes covering a smaller fraction of the shape space (D).

Alternatively, experiments have suggested that the level of specificity of T-cell receptors in the post-selection repertoire depends on the number of self-epitopes presented in the thymus (17–19). T cells obtained from mice expressing a single self-epitope in the thymus were found to be much more crossreactive than T cells obtained from normal mice (18). This suggests that T cells entering negative selection in the thymus have antigen receptors that differ markedly in their degree of specificity, and that clonotypes expressing crossreactive receptors are more likely to be removed from the repertoire when there are many self-epitopes (see Figures 1C, D). As a consequence, the level of specificity of the post-selection repertoire should be inversely related to the number of self-epitopes, and one would not need to invoke evolutionary optimization to explain the quantitative agreement between the typical number of self-antigens and the typical T-cell precursor frequency.

Mathematical models representing self-pMHC and T-cell receptors as strings of digits or amino acids have confirmed that negative selection makes the post-selection repertoire more specific (20, 21). Chao et al. (20), using differences between digits to define antigenic distance, were the first to confirm that negative selection is expected to decrease the average crossreactivity of the post-selection repertoire. These results were subsequently extended by Kosmrlj et al. (21), who defined self-pMHC and TCRs as strings of amino acids, and explicitly considered differences between strongly interacting and weakly interacting amino acids (22). This allowed them to predict that the T cells surviving negative selection should be enriched in weakly interacting amino acids (21). This prediction was recently confirmed by studies comparing the amino acid frequencies in the CDR3 regions of conventional (Tconv) and regulatory (Treg) CD4⁺ T cells (23, 24). Tregs are CD4⁺ T cells that have adopted a tolerized fate, e.g., after binding self-antigen(s) in the thymus, and can down-regulate immune responses (‘functional’ naive CD4⁺ T cells are conventional, i.e., Tconv cells). Stadinski et al. (23) showed that the presence of —the more interactive— hydrophobic amino acids in the middle of the CDR3 region predisposes cells to a Treg phenotype. Lagattuta et al. (24) showed that the more ‘sticky’ hydrophobic amino acids are enriched in Treg cells, while negatively charged amino acids are enriched in Tconv cells. Hydrophobic amino acids are also enriched in the relatively crossreactive T-cell receptors obtained from mice expressing just a single self-peptide (19). Thus, there is strong experimental evidence that negative selection weeds out the most crossreactive T cells on the basis of the ‘stickiness’ of the amino acids in their CDR3 regions.

Here we address the question of how this ‘mechanistic’ selection in the thymus on the basis of amino acid properties affects the average binding probability, i.e., the specificity level, of T lymphocyte receptors. We investigated whether the decrease in the binding probability that is due to negative selection is sufficient to explain the typical precursor frequency of 1:10⁵, or whether evolutionary selection has contributed as well to the exquisite specificity of lymphocytes.

2 Results

2.1 Optimal specificity

We previously developed a simple mathematical model for the probability, P_i, that an immune response to a foreign antigen is mounted from a functional repertoire of R antigen receptors (15, 16). In these models, p is the probability that an antigen receptor binds a pMHC with an avidity exceeding the threshold for a cell to become activated and mount an immune response. We call p the ‘binding probability’ and we will use ‘epitope’ to refer to a particular pMHC. Specific T-cell receptors have a low value of p and crossreactive TCRs have a high value of p. Because this probability, p, directly defines the ‘precursor frequency’ of clonotypes responding to a foreign epitope, we know that 10⁻⁶ ≤ p ≤ 10⁻⁵ would be a reasonable range (2–7). In the models, R₀ is the diversity of the pre-selection repertoire, i.e., the total number of unique antigen receptors made by V(D)J recombination, and S is the number of self-epitopes that require tolerance by clonal deletion, anergy or the formation of Tregs. The diversity of the post-selection (or functional) repertoire, R, is then determined by the probability, P_s (for P_survival), that a clonotype recognizes none of the S self-epitopes,

$$R = R_{0} P_{s}, \quad \text{where} \quad P_{s} = (1 - p)^{S} \tag{1}$$

According to the simplest model based upon complete self tolerance (15), the probability that a functional repertoire of R TCRs fails to respond to a foreign epitope is the probability that none of its clonotypes recognize the epitope, P_e = (1 − p)^R, where the e stands for ‘escape’. Expressing one minus this chance of escape, as the probability of mounting an immune response to a foreign epitope, we obtain

$$P_{i} = 1 - P_{e} = 1 - (1 - p)^{R} = 1 - (1 - p)^{R_{0} P_{s}} \tag{2}$$

Since ${(1 - x)}^{n} ≃ e^{- x n}$ when x is small, we can approximate P_s and P_i by

$$P_{s} \simeq e^{- p S} \quad \text{and} \quad P_{i} \simeq 1 - e^{- p R_{0} P_{s}} \tag{3}$$

The value of p that maximizes P_i is computed by taking the derivative ∂_pP_i and solving ∂_pP_i= 0. One finds that the maximum is at p = 1/S (15). Evolution is therefore expected to select for individuals with lymphocyte binding probabilities around p = 1/S. Because $S ≃ 10^{5}$ (10), this prediction was strikingly confirmed by the 1:10⁵ estimates for the T-cell precursor frequency (15).

Taking the previously estimated S = 10⁵ self-epitopes (10) as an example, the probability of mounting an immune response, P_i, is depicted in Figure 2A for pre-selection repertoires of R₀ = 10⁵ to 10⁹ clonotypes (as the diversity of the pre-selection repertoire is expected to differ markedly between small and large animals). Large vertebrates like Homo sapiens have post-selection repertoires exceeding R = 10⁹ different T-cell clonotypes (1), and given that only 5% of the T cells maturing in the thymus survive positive and negative selection (25), should have pre-selection repertoires well exceeding R₀ = 10¹⁰ different T-cell clonotypes. One of the smallest vertebrates is the fish species Paedocypris which is known to have about R = 37000 T cells (and about 12000 self-proteins) (26). Such a small species is not expected to be able to generate more than say R₀ = 10⁶ different T-cell clonotypes. The P_i curves in Figure 2A indeed have their optimum at p = 1/S = 10⁻⁵ (as indicated by the vertical dotted line). The dashed sigmoid in Figure 2A depicts the probability, P_s, with which a clone with binding probability p survives negative selection, which illustrates that when p = 1/S, this probability becomes $P_{s} = e^{- 1} ≃ 0.37$ (as indicated by the horizontal dotted line). Reassuringly, the predicted fraction of clonotypes surviving negative selection is higher than the estimated 5% T cells surviving both positive and negative selection (25).
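The location of this optimum can be verified numerically. The sketch below evaluates the approximations of Equation (3) on a grid of log₁₀ p values; the function names, the grid, and the choice R₀ = 10⁶ are illustrative assumptions on our part (for much larger R₀ the value of P_i saturates at 1.0 in floating point over a wide range of p, which hides the maximum).

```python
import math

def survival_probability(p, S):
    """P_s of Equation (3): chance that a clonotype binds none of S self-epitopes."""
    return math.exp(-p * S)

def response_probability(p, S, R0):
    """P_i of Equation (3): chance that at least one surviving clonotype responds."""
    return 1.0 - math.exp(-p * R0 * survival_probability(p, S))

S = 1e5    # number of self-epitopes (10)
R0 = 1e6   # illustrative pre-selection diversity (assumption, see lead-in)

# Scan log10(p) from -8 to -2 in steps of 0.01 and keep the best value.
grid = [-8.0 + 0.01 * k for k in range(601)]
best_log10_p = max(grid, key=lambda x: response_probability(10.0 ** x, S, R0))

print(f"optimal log10(p): {best_log10_p:.2f}")
print(f"survival at p = 1/S: {survival_probability(1.0 / S, S):.3f}")
```

The grid search recovers the analytical optimum at log₁₀ p = log₁₀[1/S], where the survival probability equals e⁻¹.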

Figure 2 The impact of negative selection on the functional repertoire, in the previous (A) and in the extended model (B–D). (A) The probability of mounting an immune response P_i from Equation (2), and the probability of surviving tolerance induction P_s from Equation (1), as a function of the log binding probability of the lymphocytes, log₁₀ p (for 5 values of R₀ and for S = 10⁵). The vertical dotted line denotes log₁₀ p = log₁₀[1/S] = −5. The horizontal dotted line denotes P_s = e⁻¹. (B) The Gaussian functions depict the probability density function of the log binding probability of antigen receptors in the pre-selection repertoire, D₀(x), for 4 values of µ. The declining sigmoid functions depict the probability of survival, P_s, for 4 values of S. We match their color when µ = log₁₀[1/S]. The vertical dotted lines depict the four values of µ. The horizontal dotted line denotes P_s = e⁻¹. (C) The distribution of the binding probability, D(x), of antigen receptors in the post-selection repertoire of Equation (5) for S = 10⁵. (D) The area under the curve of D(x), i.e., $\int_{- \infty}^{0} D (x) d x$, as a function of µ for four values of S. The horizontal dashed line indicates that 5% of the double-positive thymocytes survive positive and negative selection. Note that the same survival is obtained when a 10-fold increase in S is perfectly compensated by a 10-fold decrease in the average binding probability, i.e., by lowering µ by one. Parameters: σ = 0.5.

In this simple model all cells were considered to have the same probability, p, of recognizing a random epitope. The models and the data reviewed in the Introduction suggest that the post-selection repertoire also becomes specific because thymic selection weeds out the most crossreactive clonotypes from the pre-selection repertoire (see Figure 1D). We therefore extend the model by allowing for a range of binding probabilities defined by a log normal distribution, with a mean µ and a standard deviation σ,

$$D_{0}(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- (x - \mu)^{2} / (2 \sigma^{2})}, \tag{4}$$

where D₀(x) is a probability density, and x = log₁₀ p defines a log specificity, meaning that p = 10^x. Note that the log of the probability, p, obeys a normal distribution, that p = 1 when x = 0, that D₀(x) is only defined for −∞ < x ≤ 0, and that we are using a log₁₀ for the specificity, instead of the conventional natural logarithm that is typical for a log normal distribution, because specificity levels are usually expressed as orders of magnitude (e.g., 10⁻⁶ < p < 10⁻⁵). Since D₀(x) is a probability density function it has an area under the curve of one. To define the total number of clonotypes, we therefore still need to multiply D₀(x) by R₀ (i.e., R₀(x) = R₀D₀(x)). The probability density function of Equation (4) is depicted in Figure 2B for various values of µ and for σ = 1/2. Note that for σ = 1/2 each repertoire contains a wide variation of antigen receptors, differing by several orders of magnitude in their specificity.

In the same panel of Figure 2B we also depict the survival probability, $P_{s} (x) = e^{- p S} = e^{- 10^{x} S}$ from Equation (3) (see the dashed sigmoid lines representing S = 10³, 10⁴, 10⁵ and 10⁶ self-epitopes). A 10-fold increase of S shifts the P_s curve an order of magnitude to the left (as more clonotypes will be lost by tolerance induction).¹ Since the solid D₀(x) and the dashed P_s(x) curves are independent, as the pre-selection repertoire D₀ does not depend on S, it would still be possible to evolve an average specificity, µ, such that the D₀(x) curve intersects the P_s(x) curve at the same height in species with different numbers of self-epitopes, S. We therefore color both curves red, blue, green or orange, when log₁₀[1/S] or µ equals −6, −5, −4 or −3, respectively (throughout the paper). This visualization reveals in Figure 2B that a similar level of survival during negative selection is expected when the average binding probability is decreased 10-fold for any 10-fold increase in S, which is similar to our previous results (14–16).

We study this further by explicitly defining the remaining density of receptors in the post-selection repertoire as D(x) = P_s(x)D₀(x),

$$D(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- 10^{x} S - (x - \mu)^{2} / (2 \sigma^{2})} \quad \text{and} \quad R(x) = R_{0} D(x). \tag{5}$$

As an example, the probability density D(x) of the post-selection repertoire is depicted in Figure 2C for the previously estimated S = 10⁵ self-epitopes (10), and for various values of the mean, µ, of the lognormal distribution. Pre-selection repertoires composed of specific receptors, e.g., µ = −6, are hardly affected by tolerance induction to S = 10⁵ self-epitopes, whereas in repertoires composed of crossreactive receptors only a small fraction of the clonotypes survive tolerance induction to these S = 10⁵ self-epitopes (compare the height of the red µ = −6 curve with that of the green µ = −4 curve in Figure 2C, and compare the pre- and post-selection curves between Figures 2B, C). The fraction of clonotypes surviving tolerance induction can be quantified by plotting the area under the curve,²

$$\text{AUC} = \frac{\int_{- \infty}^{0} R(x)\, dx}{R_{0}} = \int_{- \infty}^{0} D(x)\, dx, \tag{6}$$

as a function of the pre-selection average log specificity, µ (depicted for various values of S in Figure 2D). These curves reveal that whenever $p ≫ 1 / S$ (or $μ > \log_{10} [1 / S]$), only a small fraction of the clonotypes survive tolerance induction. Since the combined survival of positive and negative selection was estimated to be 5% (25), the dashed horizontal line at P_s = 0.05 depicts an experimental lower bound: the survival of negative selection by itself should not be lower than P_s = 0.05. Together the curves in Figure 2D suggest that the average binding probability of the pre-selection repertoire cannot be larger than µ = −4 for S = 10⁵, µ = −3 for S = 10⁴ and µ = −2 for S = 10³; otherwise fewer than 5% of the clonotypes in the pre-selection repertoire survive negative selection. Because specific receptors preferentially survive, the post-selection distributions of the more crossreactive repertoires, e.g., µ ≥ −5 (for S = 10⁵), are skewed to the left in Figure 2C (compare the location of the peaks with the color-matching vertical dotted lines at µ = −6, −5 and µ = −4). Thus, this model confirms that negative selection makes a crossreactive pre-selection repertoire more specific (18, 21).

Figure 2D confirms that one obtains the same fraction of clonotypes surviving (the same AUC), whenever a 10-fold increase in S is compensated by a 10-fold decrease in the average pre-selection binding probability, µ. This was already suggested by the color-matching curves in Figure 2B. Since the same fraction of clonotypes survive when an increase in S is perfectly compensated for by a decrease in the average binding probability, µ, the results remain similar to the optimum, p = 1/S, obtained with our earlier model (15), which did not consider a distribution of receptor binding probabilities. According to both models, species with more self-epitopes should thus have a more specific pre-selection repertoire to achieve a similar completeness of the functional post-selection repertoire.
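The compensation between S and µ can be reproduced by integrating Equation (5) numerically. The following is a minimal sketch rather than the code used for the figures: the trapezoid rule, the truncation of the lower integration limit at x = −14, and all function names are our assumptions.

```python
import math

SIGMA = 0.5  # standard deviation used in the figures

def pre_selection_density(x, mu, sigma=SIGMA):
    """D_0(x) of Equation (4): normal density of x = log10(p)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def post_selection_density(x, mu, S, sigma=SIGMA):
    """D(x) of Equation (5): pre-selection density times survival exp(-10^x S)."""
    return math.exp(-(10.0 ** x) * S) * pre_selection_density(x, mu, sigma)

def trapezoid(f, lo, hi, n=6000):
    """Composite trapezoid rule; an assumed stand-in for any quadrature routine."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def surviving_fraction(mu, S, lo=-14.0):
    """AUC of Equation (6), with the lower limit truncated at lo instead of -infinity."""
    return trapezoid(lambda x: post_selection_density(x, mu, S), lo, 0.0)

for mu, S in [(-6, 1e5), (-5, 1e5), (-4, 1e5), (-6, 1e6)]:
    print(f"mu = {mu}, S = {S:.0e}: surviving fraction = {surviving_fraction(mu, S):.3f}")
```

The pairs (µ = −5, S = 10⁵) and (µ = −6, S = 10⁶) give the same surviving fraction up to numerical error, because shifting x by one unit maps one integrand onto the other.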

2.2 Mounting immune responses

Because negative selection skews the binding probabilities, we explicitly compute the average binding probability of the post-selection repertoire of conventional T cells (by using the general definition of an average),

$$\mu_{\text{Tconv}} = \frac{\int_{- \infty}^{0} x D(x)\, dx}{\int_{- \infty}^{0} D(x)\, dx} = \frac{\int_{- \infty}^{0} x D(x)\, dx}{\text{AUC}}. \tag{7}$$

Figure 3A reveals that the average post-selection binding probability, µ_Tconv, is always lower than the pre-selection binding probability, µ (observe that all curves are located below the diagonal). Moreover, the skewing of µ_Tconv increases when there are more self-epitopes, and when the pre-selection repertoire is more crossreactive (observe that the distance to the diagonal increases with µ and S). Despite this skewing, µ_Tconv increases monotonically with the average binding probability, µ, of the pre-selection repertoire.
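These three properties of µ_Tconv can be checked with a direct numerical evaluation of Equation (7). As before, this is a hedged sketch: the quadrature rule, the truncated lower limit, and the function names are assumptions made for illustration.

```python
import math

SIGMA = 0.5  # standard deviation used in the figures

def post_selection_density(x, mu, S, sigma=SIGMA):
    """D(x) of Equation (5)."""
    gauss = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return math.exp(-(10.0 ** x) * S) * gauss

def trapezoid(f, lo, hi, n=6000):
    """Composite trapezoid rule (assumed quadrature, not taken from the paper)."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def mean_log_binding_tconv(mu, S, lo=-14.0):
    """mu_Tconv of Equation (7): first moment of D(x) divided by its area."""
    auc = trapezoid(lambda x: post_selection_density(x, mu, S), lo, 0.0)
    first_moment = trapezoid(lambda x: x * post_selection_density(x, mu, S), lo, 0.0)
    return first_moment / auc

S = 1e5
for mu in (-6, -5, -4, -3):
    mu_tconv = mean_log_binding_tconv(mu, S)
    print(f"mu = {mu}: mu_Tconv = {mu_tconv:.2f}, shift = {mu - mu_tconv:.2f}")
```

For S = 10⁵ the printed shift µ − µ_Tconv is positive for every µ and grows with µ, while µ_Tconv itself keeps increasing.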

Figure 3 Properties of the functional post-selection repertoire in the extended model. (A) The average log binding probability, µ_Tconv, of the post-selection repertoire defined by Equation (7). The dotted line depicts the diagonal (i.e., the situation where negative selection has no effect on the specificity of conventional T cells). This reveals that Tconv cells become more specific when there are more self-epitopes, and that this effect (i.e., the distance to the diagonal) increases when the pre-selection repertoire is more crossreactive. (B) The contribution to the immune response, C(x) = R₀pD(x), as defined by Equation (8), to a foreign antigen as a function of the log specificity, x (for R₀ = 10⁸ and S = 10⁵). (C) The breadth of the immune response to a foreign antigen, $B = \int_{- \infty}^{0} C (x) d x$, as a function of the average log specificity of the pre-selection repertoire, for four values of S. Parameters: σ = 0.5.

Although the loss through negative selection increases with the crossreactivity of the pre-selection repertoire (Figure 2D), the Tconv clonotypes surviving selection in a crossreactive pre-selection repertoire do have a relatively high probability to respond to a foreign antigen (Figures 2C, 3A). To quantify the immune response we computed the expected breadth of an immune response. For functional clonotypes with a binding probability p = 10^x, we define the contribution, C(x), to the immune response to a foreign epitope as C(x) = pR(x) = pR₀D(x),

$$C(x) = \frac{R_{0}}{\sigma \sqrt{2 \pi}} 10^{x} e^{- 10^{x} S - (x - \mu)^{2} / (2 \sigma^{2})}, \tag{8}$$

which is depicted in Figure 3B for each specificity, x, and for various values of µ. This reveals that for S = 10⁵, the largest contribution is expected from cells with a binding probability of $p ≃ 10^{- 5} = 1 / S$. The total number of clonotypes in an immune response, i.e., the breadth of the response, is then defined as the integral $B = \int_{- \infty}^{0} C (x) d x$, which is depicted in Figure 3C for various values of S. This confirms that a pre-selection repertoire centered around µ = −5 is expected to mount the most diverse immune response to a foreign antigen (compare the location of the peaks with the color-matching vertical dotted lines). Note that this breadth, B, replaces the probability of an immune response, P_i, of the previous model, because the continuous nature of the extended model means that there is always an immune response (although it can become extremely narrow).
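The contribution C(x) and the breadth B can be evaluated in the same way. The sketch below uses R₀ = 10⁸ and S = 10⁵ as in Figure 3B; the quadrature, the grid used to locate the peak of C(x), and the function names are illustrative assumptions.

```python
import math

SIGMA = 0.5  # standard deviation used in the figures

def contribution(x, mu, S, R0, sigma=SIGMA):
    """C(x) of Equation (8): expected number of responding clonotypes at log specificity x."""
    gauss = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return R0 * (10.0 ** x) * math.exp(-(10.0 ** x) * S) * gauss

def trapezoid(f, lo, hi, n=6000):
    """Composite trapezoid rule (assumed quadrature, not taken from the paper)."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def breadth(mu, S, R0, lo=-14.0):
    """B: integral of C(x), with the lower limit truncated at lo."""
    return trapezoid(lambda x: contribution(x, mu, S, R0), lo, 0.0)

S, R0 = 1e5, 1e8

# Locate the peak of C(x) for a repertoire centered at mu = -5.
grid = [-8.0 + 0.01 * k for k in range(601)]
peak_x = max(grid, key=lambda x: contribution(x, -5.0, S, R0))
print(f"peak of C(x) for mu = -5: x = {peak_x:.2f}")

for mu in (-7, -6, -5, -4, -3):
    print(f"mu = {mu}: breadth B = {breadth(mu, S, R0):.1f} clonotypes")
```

For µ = −5 the peak of C(x) sits at x = log₁₀[1/S], and the breadth is larger than for repertoires that are two orders of magnitude more specific or more crossreactive.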

These results suggest that the binding probabilities of the functional post-selection repertoire are indeed determined by negative selection because the most crossreactive clonotypes in a pre-selection repertoire have the highest chance of becoming deleted. In Figures 2C, 3A we saw that in crossreactive pre-selection repertoires, negative selection skews the distribution of binding probabilities to more specific clonotypes. Hence the previous observation (14–16) that the evolutionary optimum, p = 1/S, coincides with the typical precursor frequency (2–7), can also be explained by negative selection. Such a specific binding probability of the post-selection repertoire naturally follows from strong negative selection within a crossreactive pre-selection repertoire with many self-epitopes, e.g., for µ = −3 and S = 10⁵, see Figure 3A. Nevertheless, our extended model also confirms the previous results, as the optimal binding probability of the pre-selection repertoire remains centered around p = 1/S (see the color-matching vertical dotted lines in Figure 3C), because pre-selection repertoires composed of too specific receptors have a low probability to respond to foreign epitopes (Figure 3B), whereas repertoires composed of too crossreactive receptors suffer too much from clonal deletion (Figure 2D). Thus, evolution is still expected to select for immune systems with pre-selection lymphocyte binding probabilities centered around $p = 1 / S ≃ 10^{- 5}$.

2.3 Optimizing the pre-selection repertoire

In addition to maximizing the probability of mounting an immune response by optimizing the recognition probability, we previously (15) also computed the size of the pre-selection repertoire required for having a sufficiently complete (8) functional repertoire for any given value of p. The probability that a foreign epitope is not recognized by any of the clonotypes in the functional repertoire was defined as P_e= (1−p)^R [see Equation (2) and (15)]. Solving R₀ for a particular probability of escape, P_e, corresponds to

$$P_{e} = (1 - p)^{R} \simeq e^{- p R} \simeq e^{- p R_{0} e^{- p S}} \quad \text{or} \quad R_{0} \simeq - \ln[P_{e}] \frac{e^{p S}}{p}. \tag{9}$$

Since most pathogens express several epitopes, picking P_e ≤ 0.1 would allow most pathogens to be recognized.³ Plotting R₀ as a function of p reveals that this function has a minimum (see Figure 4A), and solving ∂_pR₀ = 0 shows that this minimum is again located at p = 1/S. At this minimum R₀ = −ln[P_e]eS, which is proportional to the number of self-epitopes, S, and only depends logarithmically on the probability of escape, P_e. This confirms that the immune system needs to be specific largely because there are so many self-epitopes.⁴ For p values larger than 1/S, the required R₀ rapidly becomes prohibitively large (see Figure 4A).
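The minimum of Equation (9) can be located numerically. The sketch below works with log₁₀ R₀ to avoid floating-point overflow of e^{pS} at large p; this log-space formulation, the grid, and the function name are our assumptions and are not part of the original analysis.

```python
import math

def log10_required_r0(log10_p, S, p_escape=0.1):
    """log10 of R_0 in Equation (9): R_0 = -ln(P_e) * exp(p S) / p, evaluated in log space."""
    p = 10.0 ** log10_p
    return math.log10(-math.log(p_escape)) + p * S / math.log(10.0) - log10_p

S = 1e5  # number of self-epitopes (10)

# Scan log10(p) from -8 to -2 and find the smallest required pre-selection repertoire.
grid = [-8.0 + 0.01 * k for k in range(601)]
best_log10_p = min(grid, key=lambda x: log10_required_r0(x, S))
minimal_r0 = 10.0 ** log10_required_r0(best_log10_p, S)

print(f"log10(p) at the minimum: {best_log10_p:.2f}")
print(f"minimal R0: {minimal_r0:.3e}")
print(f"-ln(P_e) * e * S: {-math.log(0.1) * math.e * S:.3e}")
```

For S = 10⁵ and P_e = 0.1 the minimum lies at p = 1/S and equals −ln[P_e]eS, i.e., roughly 6 × 10⁵ clonotypes, whereas the required R₀ grows steeply for more crossreactive receptors.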

Figure 4 Optimizing the diversity of the pre-selection T-cell repertoire. (A) The required pre-selection repertoire diversity, R₀, in the previous model, for P_e = 0.1 per foreign epitope [see Equation (9)] and for three values of S. (B) The compensation required for keeping $\int_{- \infty}^{0} D (x) d x = 1$ in the extended model for σ = 0.5 and for three values of S. Note that (B) is the inverse of Figure 2D.

In the extended model a foreign epitope never completely escapes recognition, but the breadth of its immune response, B, can become extremely narrow. We can perform a similar analysis by increasing R₀ to compensate for the loss of clonotypes due to negative selection. Thus, rescaling the area under the curve of the post-selection repertoire to one for every value of µ, we define D_N(x) = D(x)/AUC, where the subscript N stands for ‘normalized’. The required compensation in the size of the pre-selection repertoire, 1/AUC, is depicted in Figure 4B. We again observe that this compensation becomes prohibitively large for repertoires that are considerably more crossreactive than p = 1/S (or µ = log₁₀ 1/S). We conclude that both models agree on the fact that an unrealistically large pre-selection repertoire is required whenever the pre-selection repertoire is too crossreactive (Figures 4A, B). Evolution should therefore select for pre-selection binding probabilities in a medium range that does not exceed p = 1/S too much.

2.4 Regulatory T cells

In the extended model, negative selection selects for more specific receptors in the functional repertoire (Figures 2C, 3A). The receptors that become negatively selected should therefore be less specific, i.e., more crossreactive. This prediction by Chao et al. (20) and Kosmrlj et al. (21) was recently confirmed by Lagattuta et al. (24), who demonstrated that the more ‘sticky’ hydrophobic amino acids, such as phenylalanine, leucine, tryptophan and tyrosine, are enriched in Treg cells, while the more weakly interacting amino acids, such as aspartic acid and glutamic acid, are enriched in Tconv cells. The repertoire of receptors that are negatively selected in our model is defined as

$$D_{\text{Treg}}(x) = (1 - P_{s}(x)) D_{0}(x). \tag{10}$$

Loosely calling this the ‘Treg’ repertoire, D_Treg(x) is depicted in Figure 5A (for various values of µ and for S = 10⁵). The area under the curve, and the average log specificity, are defined as

$$\text{AUC}_{\text{Treg}} = \int_{- \infty}^{0} D_{\text{Treg}}(x)\, dx \quad \text{and} \quad \mu_{\text{Treg}} = \frac{\int_{- \infty}^{0} x D_{\text{Treg}}(x)\, dx}{\text{AUC}_{\text{Treg}}}. \tag{11}$$

Figure 5 Regulatory T cells. (A) The probability density function of the post-selection Treg repertoire (as defined by Equation (10) for S = 10⁵). (B) The average log binding probability, µ_Treg, of the post-selection Treg repertoires for four values of S [see Equation (11)]. The dotted line depicts the diagonal (i.e., the situation where negative selection has no effect on the specificity). Treg cells tend to become more crossreactive by negative selection (i.e., all curves are located above the diagonal). Parameters: σ = 0.5.

Negative selection increases the crossreactivity of Tregs, especially when there are few self-epitopes and when the pre-selection repertoire is specific (see Figure 5B), because the most crossreactive receptors have the highest probability [1 − P_s(x)] to become a Treg. The average binding probability of the Treg repertoire is hardly affected when there are many self-epitopes and the pre-selection repertoire is crossreactive because most receptors then become tolerized.
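Both regimes can be illustrated by evaluating Equations (10) and (11) numerically. This is again a sketch under stated assumptions: the trapezoid quadrature, the truncated lower limit, and the function names are ours, and `math.expm1` is used only to keep 1 − P_s(x) accurate when 10^x S is very small.

```python
import math

SIGMA = 0.5  # standard deviation used in the figures

def treg_density(x, mu, S, sigma=SIGMA):
    """D_Treg(x) of Equation (10): (1 - P_s(x)) * D_0(x)."""
    gauss = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    # -expm1(-a) equals 1 - exp(-a) but remains accurate for tiny a.
    return -math.expm1(-(10.0 ** x) * S) * gauss

def trapezoid(f, lo, hi, n=6000):
    """Composite trapezoid rule (assumed quadrature, not taken from the paper)."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def mean_log_binding_treg(mu, S, lo=-14.0):
    """mu_Treg of Equation (11): first moment of D_Treg(x) divided by AUC_Treg."""
    auc = trapezoid(lambda x: treg_density(x, mu, S), lo, 0.0)
    first_moment = trapezoid(lambda x: x * treg_density(x, mu, S), lo, 0.0)
    return first_moment / auc

for mu, S in [(-6, 1e3), (-6, 1e5), (-3, 1e3), (-3, 1e5)]:
    shift = mean_log_binding_treg(mu, S) - mu
    print(f"mu = {mu}, S = {S:.0e}: mu_Treg - mu = {shift:.3f}")
```

With few self-epitopes and a specific pre-selection repertoire (µ = −6, S = 10³) the Treg repertoire is shifted upward by roughly half an order of magnitude, whereas for µ = −3 and S = 10⁵ the shift is close to zero because nearly all receptors are tolerized.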

3 Discussion

Extending a previous model (15, 16) by allowing for a range of binding probabilities of T lymphocytes, we have confirmed the very natural notion (20, 21) that negative selection in the thymus is biased towards crossreactive T cells (18, 19, 24). Nevertheless, we have seen that binding probabilities of the TCRs in the pre-selection repertoire of a particular species need to be adapted to the number of self-antigens in that species to prevent massive deletion by negative selection. Additionally, lymphocyte receptors should not be too specific, as the functional post-selection repertoire needs to be fairly complete (8), i.e., cover most of shape space (9), to provide good protection from foreign antigens. Thus both evolution and negative selection play an important role in the exquisite binding probabilities of T cells.

We have modeled the fact that TCRs differ in their pMHC binding probabilities, e.g., due to the hydrophobicity of the amino acids in their CDR3 region (19, 23, 24). Similar effects may also play a role for MHC molecules presenting short peptides to T cells, as the polymorphic part of the MHC that forms part of the pMHC-TCR interface can also be composed of weakly and strongly binding amino acids. A significant part of the variation between the TCRs of Tregs and Tconvs can be attributed to binding the MHC molecule rather than the peptide (24). Based upon their modeling, Chao et al. (20) predicted that T cells binding their selecting MHCs strongly are more likely to become negatively selected. Thus, MHC alleles having strongly binding amino acids in the MHC-TCR interface would select a smaller T-cell repertoire. In our model, this would correspond to increasing the average level of crossreactivity, µ, of the pre-selection repertoire. Additionally, depending on the amino acids in the peptide binding groove, some MHC molecules could bind more peptides than others. Kosmrlj et al. (27) argued that particular MHC alleles do bind fewer self peptides than others (which remains somewhat uncertain because little is known about the absolute binding threshold of different MHC alleles), and that these MHCs hence select for a larger and more crossreactive functional repertoire. This would at least partly compensate for the lower number of foreign epitopes expected to be presented by such selective MHC molecules. Since in our model a more restrictive binding groove would correspond to decreasing the number of self-pMHC, S, the effects of these potential differences in the fraction of peptides bound by different MHC molecules can be predicted by changing S in the model (see Figure 3).⁵ Summarizing, since MHC molecules are polymorphic and differ in their binding properties, they may each select for a unique level of diversity and average specificity of the pool of T-cells restricted to them.

A considerable fraction of the TCRs in human T-cell repertoires lack a D-segment, and such sequences are preferentially generated during fetal development (28). Because D-segments tend to code for ‘non-sticky’ amino acids, with a strong enrichment of glycine (28), this suggests that the very early pre-selection T-cell repertoire is enriched in crossreactive receptors. Additionally, abundant TCRβ sequences in naive T cell samples of young individuals tend to have high generation probabilities, short CDR3s, and absence of N additions (29–32). These differences suggest that TCRs may indeed differ widely in their binding probabilities. It is tempting to speculate that this early enrichment in crossreactive TCRs enables a rapid early coverage of the space of potential foreign antigens. However, our modeling also reveals that these receptors should not be too crossreactive, since they also need to survive negative selection. Nevertheless, it would seem beneficial to first fill the space with those crossreactive receptors that happen to survive negative selection, and later fill in the holes by making the pre-selection repertoire more specific.

Although there is promiscuous expression of self-antigens in the thymus (33), it remains unlikely that self tolerance is complete. Healthy individuals do harbor T cells that can recognize self-epitopes (3, 34, 35). Previously we have included potentially auto-reactive clonotypes in the model of Equation (2) by allowing a fraction of the self-pMHC to not impose negative selection (14, 16). A successful immune response was then defined as the probability of having an immune response from clonotypes not binding any of these ‘ignored’ self-epitopes. Since in species with large pre-selection repertoires, R₀, the probability of mounting an immune response to a foreign epitope, P_i, is close to one for a wide range of binding probabilities (see Figure 2A), it then becomes beneficial to have pre-selection binding probabilities lower than p = 1/S to reduce the probability of also recruiting potentially auto-reactive clonotypes into the immune response (14, 16). Hence the optimum p = 1/S of Equation (2) should be regarded as an upper bound, and S should be regarded as the number of self-epitopes imposing tolerance in the T-cell repertoire. Similarly, if not all self-pMHC impose negative selection because some are ignored, have too low expression levels, or invoke indirect tolerance mechanisms, our estimate of S = 10⁵ (10) would be an upper bound. In most of our analyses we have therefore also considered 10- to 100-fold lower values of S. However, if some of the MHC molecules present more than 1% of the peptides, and/or if alternative splicing would allow for more than the predicted 10⁷ 9-mers in the human genome (10), one could also argue that S could be larger than 10⁵. Fortunately, we obtain qualitatively similar results for all values of S, i.e., all models agree that the pre-selection binding probability should not exceed p ≃ 1/S.
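The role of p = 1/S as an upper bound can be illustrated with a short Python sketch. It is a reconstruction under stated assumptions, not the model of Equation (2) itself: it takes the survival probability as P_s = e^{−pS}, the functional repertoire as R = R₀·P_s, and the response probability as P_i = 1 − e^{−pR}, forms that are consistent with the footnotes but may differ in detail from the original equations. The function names and the scanning grid are hypothetical.

```python
import math

def expected_responders(p, S, R0):
    """Expected number of surviving clonotypes binding one foreign epitope.

    ASSUMPTION: R = R0 * exp(-p * S) clonotypes survive negative selection,
    and each binds a foreign epitope with probability p.
    """
    return p * R0 * math.exp(-p * S)

def p_response(p, S, R0):
    """P_i = 1 - P_e, with P_e = exp(-p * R) (assumed form)."""
    return -math.expm1(-expected_responders(p, S, R0))

def best_p(S, R0, n=2001):
    """Scan log10(p) from -9 to -2 and return the p maximizing the response.

    P_i increases monotonically with the expected number of responders,
    so maximizing the latter avoids floating-point saturation of P_i at 1.
    """
    grid = [10.0 ** (-9.0 + 7.0 * i / (n - 1)) for i in range(n)]
    return max(grid, key=lambda p: expected_responders(p, S, R0))

if __name__ == "__main__":
    S, R0 = 1e5, 1e9  # illustrative values
    print("optimal log10(p):", round(math.log10(best_p(S, R0)), 2))
    for log_p in (-7, -6, -5, -4, -3):
        print(log_p, p_response(10.0 ** log_p, S, R0))
```

With a large R₀ the response probability stays close to one for binding probabilities well below 1/S, but collapses once p exceeds 1/S by an order of magnitude or more, which is why lowering p below 1/S costs little in this sketch.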

In summary, our analyses confirm that the exquisite specificity (i.e., the low binding probability) of functional T lymphocytes in the circulation is naturally explained by negative selection on a large diversity of self-antigens. Nevertheless, evolution must have molded the binding probability of the pre-selection repertoire into a range that is compatible with the large diversity of self-antigens present in vertebrates; otherwise the post-selection repertoire would be too specific or too empty to respond to foreign intruders.

Data availability statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Author contributions

RB: Conceptualization, Writing – original draft, Writing – review & editing. CK: Conceptualization, Writing – review & editing. AP: Conceptualization, Writing – review & editing. JB: Conceptualization, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Portions of this work were performed under the auspices of the U.S. Department of Energy under contract 89233218CNA000001 and supported by NIH grant R01-AI028433 (ASP).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

^ Because P_s = e⁻¹ when p = 1/S, the different P_s(x) curves all approach P_s ≃ 0.37 when x = log₁₀(1/S). See their intersections with the color-matching vertical dotted lines.
^ Since the maximum binding probability, p = 10^x = 1, occurs when x = 0, the upper limit of the integral in Equation (6) is set to zero.
^ If all 37000 T cells in the very small fish Paedocypris (26) were unique clonotypes, the probability that a foreign epitope is not recognized would be even larger, P_e ≃ e^{−10⁻⁵ × 37000} ≃ 0.7.
^ Note that P_e ≃ 1/e when R = 1/p, i.e., when one expects one response per epitope. At p = 1/S, where P_s = 1/e, the pre-selection repertoire R₀ should then be e ≃ 2.72-fold larger than the number of self-epitopes.
^ The probability to respond to a foreign epitope, P_i, would not change, but a pathogen would be represented by fewer epitopes on MHCs with a more restrictive binding groove.
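The arithmetic in these footnotes can be checked with a few lines of Python. This is a minimal sketch that assumes the exponential forms P_s = e^{−pS} and P_e = e^{−pR} implied by the footnotes; the variable names are illustrative.

```python
import math

S = 1e5       # number of self-epitopes (value used in the text)
p = 1.0 / S   # binding probability at the optimum p = 1/S

# Survival probability at p = 1/S (assumed form P_s = exp(-p * S)).
survival = math.exp(-p * S)

# Probability that a foreign epitope is missed by 37000 unique clonotypes,
# each binding with probability 1e-5 (assumed form P_e = exp(-p * R)).
escape_fish = math.exp(-1e-5 * 37000)

# Fold difference R0 / S at which R = R0 * P_s equals 1/p when p = 1/S.
fold = 1.0 / survival

print(f"P_s at p = 1/S      : {survival:.2f}")     # 0.37
print(f"P_e for R = 37000   : {escape_fish:.2f}")  # 0.69
print(f"R0 / S for R = 1/p  : {fold:.2f}")         # 2.72
```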

References

1. Qi Q, Liu Y, Cheng Y, Glanville J, Zhang D, Lee JY, et al. Diversity and clonal selection in the human T-cell repertoire. Proc Natl Acad Sci USA. (2014) 111:13139–44. doi: 10.1073/pnas.1409155111

2. Blattman JN, Antia R, Sourdive DJ, Wang X, Kaech SM, Murali-Krishna K, et al. Estimating the precursor frequency of naive antigen-specific CD8 T cells. J Exp Med. (2002) 195:657–64. doi: 10.1084/jem.20001021

3. Su LF, Kidd BA, Han A, Kotzin JJ, Davis MM. Virus-specific CD4⁺ memory-phenotype T cells are abundant in unexposed adults. Immunity. (2013) 38:373–83. doi: 10.1016/j.immuni.2012.10.021

4. Kotturi MF, Scott I, Wolfe T, Peters B, Sidney J, Cheroutre H, et al. Naive precursor frequencies and MHC binding rather than the degree of epitope diversity shape CD8⁺ T cell immunodominance. J Immunol. (2008) 181:2124–33. doi: 10.4049/jimmunol.181.3.2124

5. Obar JJ, Khanna KM, Lefrancois L. Endogenous naive CD8⁺ T cell precursor frequency regulates primary and memory responses to infection. Immunity. (2008) 28:859–69. doi: 10.1016/j.immuni.2008

6. Haluszczak C, Akue AD, Hamilton SE, Johnson LD, Pujanauski L, Teodorovic L, et al. The antigen-specific CD8⁺ T cell repertoire in unimmunized mice includes memory phenotype cells bearing markers of homeostatic expansion. J Exp Med. (2009) 206:435–48. doi: 10.1084/jem.20081829

7. Jenkins MK, Chu HH, McLachlan JB, Moon JJ. On the composition of the preimmune repertoire of T cells specific for peptide-major histocompatibility complex ligands. Annu Rev Immunol. (2010) 28:275–94. doi: 10.1146/annurev-immunol-030409-101253

8. Coutinho A. The self-nonself discrimination and the nature and acquisition of the antibody repertoire. Ann Immunol (Paris). (1980) 131:235–53.

9. Perelson AS, Oster GF. Theoretical studies of clonal selection: minimal antibody repertoire size and reliability of self-non-self discrimination. J Theor Biol. (1979) 81:645–70. doi: 10.1016/0022-5193(79)90275-3

10. Burroughs NJ, De Boer RJ, Kesmir C. Discriminating self from nonself with short peptides from large proteomes. Immunogenetics. (2004) 56:311–20. doi: 10.1007/s00251-004-0691-0

11. Mason D. A very high level of crossreactivity is an essential feature of the T-cell receptor. Immunol Today. (1998) 19:395–404. doi: 10.1016/s0167-5699(98)01299-7

12. Wooldridge L, Ekeruche-Makinde J, Van den Berg HA, Skowera A, Miles JJ, Tan MP, et al. A single autoimmune T cell receptor recognizes more than a million different peptides. J Biol Chem. (2012) 287:1168–77. doi: 10.1074/jbc.M111.289488

13. Rappazzo CG, Fernández-Quintero ML, Mayer A, Wu NC, Greiff V, Guthmiller JJ. Defining and studying B cell receptor and TCR interactions. J Immunol. (2023) 211:311–22. doi: 10.4049/jimmunol.2300136

14. Borghans JAM, De Boer RJ. Crossreactivity of the T-cell receptor. Immunol Today. (1998) 19:428–9. doi: 10.1016/S0167-5699(98)01317-6

15. De Boer RJ, Perelson AS. How diverse should the immune system be? Proc R Soc Lond. B Biol Sci. (1993) 252:171–5. doi: 10.1098/rspb.1993.0062

16. Borghans JAM, Noest AJ, De Boer RJ. How specific should immunological memory be? J Immunol. (1999) 163:569–75. doi: 10.4049/jimmunol.163.2.569

17. Huseby ES, Crawford F, White J, Kappler J, Marrack P. Negative selection imparts peptide specificity to the mature T cell repertoire. Proc Natl Acad Sci USA. (2003) 100:11565–70. doi: 10.1073/pnas.1934636100

18. Huseby ES, White J, Crawford F, Vass T, Becker D, Pinilla C, et al. How the T cell repertoire becomes peptide and MHC specific. Cell. (2005) 122:247–60. doi: 10.1016/j.cell.2005.05.013

19. Dai S, Huseby ES, Rubtsova K, Scott-Browne J, Crawford F, Macdonald WA, et al. Crossreactive T Cells spotlight the germline rules for alphabeta T cell-receptor interactions with MHC molecules. Immunity. (2008) 28:324–34. doi: 10.1016/j.immuni.2008.01.008

20. Chao DL, Davenport MP, Forrest S, Perelson AS. The effects of thymic selection on the range of T cell cross-reactivity. Eur J Immunol. (2005) 35:3452–9. doi: 10.1002/eji.200535098

21. Kosmrlj A, Jha AK, Huseby ES, Kardar M, Chakraborty AK. How the thymus designs antigen-specific and self-tolerant T cell receptor sequences. Proc Natl Acad Sci USA. (2008) 105:16671–6. doi: 10.1073/pnas.0808081105

22. Zeldovich KB, Berezovsky IN, Shakhnovich EI. Protein and DNA sequence determinants of thermophilic adaptation. PloS Comput Biol. (2007) 3:e5. doi: 10.1371/journal.pcbi.0030005

23. Stadinski BD, Shekhar K, Gómez-Touriño I, Jung J, Sasaki K, Sewell AK, et al. Hydrophobic CDR3 residues promote the development of self-reactive T cells. Nat Immunol. (2016) 17:946–55. doi: 10.1038/ni.3491

24. Lagattuta KA, Kang JB, Nathan A, Pauken KE, Jonsson AH, Rao DA, et al. Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate. Nat Immunol. (2022) 23:446–57. doi: 10.1038/s41590-022-01129-x

25. Stritesky GL, Xing Y, Erickson JR, Kalekar LA, Wang X, Mueller DL, et al. Murine thymic selection quantified using a unique method to capture deleted T cells. Proc Natl Acad Sci USA. (2013) 110:4679–84. doi: 10.1073/pnas.1217532110

26. Giorgetti OB, Shingate P, O’Meara CP, Ravi V, Pillai NE, Tay BH, et al. Antigen receptor repertoires of one of the smallest known vertebrates. Sci Adv. (2021) 7:e0257016. doi: 10.1126/sciadv.abd8180

27. Kosmrlj A, Read EL, Qi Y, Allen TM, Altfeld M, Deeks SG, et al. Effects of thymic selection of the T-cell repertoire on HLA class I-associated control of HIV infection. Nature. (2010) 465:350–4. doi: 10.1038/nature08997

28. De Greef PC, De Boer RJ. TCRβ rearrangements without a D segment are common, abundant, and public. Proc Natl Acad Sci USA. (2021) 118:e1009425. doi: 10.1073/pnas.2104367118

29. Robins HS, Srivastava SK, Campregher PV, Turtle CJ, Andriesen J, Riddell SR, et al. Overlap and effective size of the human CD8⁺ T cell receptor repertoire. Sci Transl Med. (2010) 2:47–64. doi: 10.1126/scitranslmed.3001442

30. Venturi V, Quigley MF, Greenaway HY, Ng PC, Ende ZS, McIntosh T, et al. A mechanism for TCR sharing between T cell subsets and individuals revealed by pyrosequencing. J Immunol. (2011) 186:4285–94. doi: 10.4049/jimmunol.1003898

31. Pogorelyy MV, Elhanati Y, Marcou Q, Sycheva AL, Komech EA, Nazarov VI, et al. Persisting fetal clonotypes influence the structure and overlap of adult human T cell receptor repertoires. PloS Comput Biol. (2017) 13:e1005572. doi: 10.1371/journal.pcbi.1005572

32. De Greef PC, Oakes T, Gerritsen B, Ismail M, Heather JM, Hermsen R, et al. The naive T-cell receptor repertoire has an extremely broad distribution of clone sizes. Elife. (2020) 9. doi: 10.7554/eLife.49900

33. Kyewski B, Derbinski J. Self-representation in the thymus: an extended view. Nat Rev Immunol. (2004) 4:688–98. doi: 10.1038/nri1436

34. Danke NA, Koelle DM, Yee C, Beheray S, Kwok WW. Autoreactive T cells in healthy individuals. J Immunol. (2004) 172:5967–72. doi: 10.4049/jimmunol.172.10.5967

35. Malhotra D, Linehan JL, Dileepan T, Lee YJ, Purtha WE, Lu JV, et al. Tolerance is established in polyclonal CD4⁺ T cells by distinct mechanisms, according to self-peptide expression patterns. Nat Immunol. (2016) 17:187–95. doi: 10.1038/ni.3327

Keywords: T cell specificity, T cell repertoire, negative selection, evolution, repertoire diversity

Citation: De Boer RJ, Kesmir C, Perelson AS and Borghans JAM (2024) Is the exquisite specificity of lymphocytes generated by thymic selection or due to evolution? Front. Immunol. 15:1266349. doi: 10.3389/fimmu.2024.1266349

Received: 24 July 2023; Accepted: 11 March 2024;
Published: 25 March 2024.

Edited by:

Tobias L. Lenz, University of Hamburg, Germany

Reviewed by:

Arundhoti Das, National Institutes of Health (NIH), United States
Victor Greiff, University of Oslo, Norway

Copyright © 2024 De Boer, Kesmir, Perelson and Borghans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rob J. De Boer
