- Leiden Institute of Physics, Leiden University, Leiden, Netherlands
Since the discovery of cells by Robert Hooke and Antoni van Leeuwenhoek in the 17th century, thousands of different cell types have been identified, most recently by sequencing-based single-cell profiling techniques. Yet, for many organisms we still do not know, how many different cell types they are precisely composed of. A recent survey of experimental data, using mostly morphology as a proxy for cell type, revealed allometric scaling of cell type diversity with organism size. Here, I argue from an evolutionary fitness perspective and suggest that three simple assumptions can explain the observed scaling: Evolving a new cell type has, 1. a fitness cost that increases with organism size, 2. a fitness benefit that also increases with organism size but 3. diminishes exponentially with the number of existing cell types. I will show that these assumptions result in a quantitative model that fits the observed cell type numbers across organisms of all size and explains why we should not expect isometric scaling.
Introduction
Since the advent of high throughput single-cell profiling techniques, a large number of cell types has been catalogued across many different tissues. For example, the Tabula Sapiens consortium recently identified over 400 cell types across 24 different human tissues (Tabula Sapiens Consortium et al., 2022). Whether each cluster of transcriptomes or other molecular profiles should be considered a separate cell type is still under debate (Clevers, 2017; Mircea and Semrau, 2021) and we certainly need improved methods to discriminate biologically meaningful variability from random noise (Mircea et al., 2022). Nevertheless, single-cell profiling has revealed a high diversity of cell states and one might be forgiven to wonder: Could each cell be its own, highly specialized cell type? Here, I will argue, from an evolutionary fitness perspective, that we should expect much fewer cell types than cells in an organism. Whole-organism single-cell transcriptomics data sets are currently still rare (Lähnemann et al., 2020) and, as mentioned above, uncertainties in the interpretation of these data sets remain. To circumvent these problems, I base my arguments on recent studies by Fisher et al. (Fisher et al., 2013; Fisher et al., 2020), who collected published cell type numbers, mostly derived from morphological characteristics. These studies found that the number of cell types scales allometrically with the total number of cells in the organism (Figure 1). Intriguingly, the data could not be fit by a single power law, in contrast to many other allometric relationships (West and Brown, 2005). As shown in seminal work by Geoffrey West and co-workers, power law scaling can arise from the optimization of metabolic rate subject to geometric constraints of relevant tissues, such as the vasculature (West et al., 1997; Enquist et al., 1999; West et al., 1999; West et al., 2002). Fisher et al. therefore fit two separate power laws, for small and large organisms, respectively, suggesting that larger organisms face additional constraints. In contrast to the allometric scaling of metabolic rate, it is not immediately obvious that geometric or physiological constraints should be the only relevant factors for cell type allometry. One might therefore not expect a priori to find power law scaling.
FIGURE 1. A simple model assuming diminishing fitness benefits of additional cell types performs as well as a double power law in explaining cell type allometry. Fisher et al. (2013, 2020)) collected cell type information for a range of different organisms (open circles). They modeled the observed allometric relationship with separate power laws for small and large organisms (top row). My model explains the observed allometry by a fitness benefit that diminishes with increasing cell type number (bottom row). I used a Bayesian hierarchical modeling approach to compare the two models. The data was assumed to be normally distributed with a standard deviation (sd) that was either constant (left column) or allowed to increase linearly with log10(Number of cells) (right column). The regression curves, shown as solid lines, are posterior means of the models, with the 95% highest density interval (HDI) indicated by a blue band. The orange band indicates the HDI of the posterior predictive distribution (ppd). For each model, the expected log predictive density (elpd) is reported together with its standard error. Judging by the elpd, the ‘diminishing returns’ model performs about as well as the double power law and the models with variable sd perform slightly better than the models with constant sd. Point estimates and HDIs of all parameters can be found in Table 1.
Diminishing returns model
Here, I develop an alternative model that can explain the observed scaling across organisms of all sizes. This model considers the effect of a new cell type on an organism’s fitness. I adopt a notion of fitness described by Wagner as “a measure predicting the competitive ability of a genotype compared to another” (Wagner, 2010), which can in principle be determined by pairwise competition experiments. I reason that mutations giving rise to a new cell type can only be fixed in a population, if they lead to an increase in fitness. I therefore model the appearance of new cell types during evolution as discrete events that have an associated fitness cost
Requiring
To rigorously compare this ‘diminishing returns’ model with the double power law, I used a Bayesian hierarchical approach (see Materials and Methods for the model definitions and priors). I assumed that the cell type numbers are normally distributed in log-space with a mean given by the double power law (i.e., a piecewise linear relationship in log-space) or the relationship derived above. Initially, I assumed the standard deviation to be constant (Figure 1, left column). Posterior distributions of the parameters were obtained by Markov Chain Monte Carlo sampling. Estimates of the slopes and breakpoint in the double power law were very similar to those obtained by Fisher et al. (Fisher et al., 2020) with ordinary least squares fitting (see Table 1, first two columns). To compare the models quantitatively I estimated the expected log posterior density (elpd) using leave-one-out cross-validation. The elpd was slightly larger for the double power law model but the difference was well within the standard error of the elpd (see Figure 1 and Table 1). The ‘diminishing returns’ model hence fits the data as well as the double power law. As the spread of the cell type numbers around the regression curves seems to increase with cell number, I next tested models in which the standard deviation was allowed to increase linearly with log-cell number (Figure 1, right column). Judging by the elpd, allowing the standard deviation to vary improved model performance for both the double power law and the ‘diminishing returns’ model (Table 1). Again, the difference in elpd between the double power law and the ‘diminishing returns’ model was within the standard error. The increased spread for larger organisms is possibly related to differences between multicellular lineages and the environments in which they evolved, as pointed out in Fisher et al (2020).
TABLE 1. Estimates of model parameters and model comparison. The power law parameters reported by Fisher et al, (2013); Fisher et al., 2020) (first column) are ordinary least-squares estimates and the intervals are confidence intervals (CIs). For the Bayesian models described in this paper (last 4 columns) parameters are given as the mean of the posterior together with the 95% highest density interval (HDI). The Bayesian models assume a normal distribution of the data in log space with the regression curve as the mean. Intercept, slope (small N), breakpoint and slope (large N) parameterize the regression curve of the double power law, whereas A and B parameterize the regression curve of the ‘diminishing returns’ model, sd is the standard deviation of the normal distribution in the models that keep the standard deviation constant. Sd intercept and sd slope parametrize a linear increase of the standard deviation with log-cell number in the models that allow the standard deviation to vary. The expected log predictive density (elpd) and its standard error (se) was calculated for the Bayesian models using leave-one-out cross-validation.
Discussion
In the derivation presented here, I made several assumptions that require critical assessment. First, I implied that cell types are discrete and stable entities, while others put forward the notion of dynamic cell states that lie on a continuum (Clevers, 2017). I further assumed that cell types are functionally different, by some measure, and able to confer a fitness advantage when they appear. I treated cell morphology as a reasonable proxy for cell type, which might lead to an underestimation of the number of cell types. Likely, the number of observed morphologies is some fraction of the true number of cell types, such that the true scaling behavior is still qualitatively the same as observed by Fisher et al. In my model, the appearance of a new cell type is a discrete event, which is certainly a strong simplification of the actual processes by which new cell types arise (Arendt, 2008). Finally, I modeled the diminishing benefits provided by additional cell types with an exponential decay. While it is reassuring that the resulting model fits the data set considered here, direct fitness measurements will be necessary to confirm this assumption.
In summary, I developed a phenomenological model of cell type allometry using a minimal number of assumptions. The model is therefore agnostic of evolutionary lineages and related systematic differences. Nevertheless, I showed that diminishing fitness benefits can explain the observed cell type allometry. I hope that this manuscript will stimulate experiments and the development of more sophisticated models.
Materials and methods
The experimental data shown in Figure 1 was published previously (Fisher et al., 2013) and made publicly available on Dryad (https://datadryad.org/stash/dataset/doi:10.5061/dryad.27q59). All models were fit in double log-space. Consequently, log-transformed cell numbers N and cell type numbers K are used in the model definitions:
To compare the double power law model with the ‘diminishing returns’ model, a Bayesian hierarchical approach was used. The log-cell type number k was assumed to be normally distributed. For the double power law, the mean of the normal distribution is given by a piecewise linear relationship between n and k. In the case of constant standard deviation (i.e., the spread of k does not depend on the log-cell number n), the double power law model is thus defined by
where k0 is the intercept of log-cell type numbers k, and ssmall and slarge are the slopes below and above the breakpoint nbp, respectively. Normal indicates a normal distribution with mean μ and standard deviation σ, Uniform is a uniform distribution between a and b, and HalfCauchy is a Cauchy distribution at location 0 with half-width half-maximum γ that was truncated below 0 so that only positive values have non-zero probability.
For variable standard deviation (i.e., the spread of the log-cell type number k increases linearly with log-cell number n) the model is defined by
where Σ0 and sΣ are the intercept and slope, respectively, of the linear model for the standard deviation. HalfNormal is a Normal distribution with mean μ = 0 and standard-deviation σ truncated below 0 such that only positive values have non-zero probabilities.
The ‘diminishing returns’ model, which assumes the fitness benefit to decrease with cell type number, is correspondingly defined by
in the case of constant standard deviation and by
when the standard deviation is allowed to increase linearly with log-cell number n.
The posterior distributions of all parameters were obtained by Markov Chain Monte Carlo sampling using the python package pymc (version 4.1.2) with 2 chains, 2000 tuning steps and 10,000 samples. The “target_accept” parameter was kept at the default value of 0.8 except for the ‘diminishing returns’ model with constant standard deviation. That model required a “target_accept” of 0.99 to avoid divergences. For model comparison, the arviz python package (version 0.12.1) was used to estimate the expected log posterior density (elpd) by leave-one-out cross-validation. The regression curves shown as solid lines in Figure 1 are posterior means of f(n): For each n, the average of f(n) over the posterior distribution of the parameters was calculated. The 95% highest density intervals (HDIs) shown as blue bands in Figure 1 correspond to the smallest intervals that contain 95% of the posterior distribution of f(n) for a specific n. The 95% HDIs of the posterior predictive distribution (ppd) correspond to the smallest intervals containing 95% of the posterior distribution of the log-cell type number k for a given n.
The jupyter notebook used to produce all presented results from the raw data can be obtained from github (https://github.com/semraulab/allometry).
Data availability statement
The dataset used in this study is publicly available from Dryad under a CC0 Universal (CC0 1.0) Public Domain Dedication license: https://datadryad.org/stash/dataset/doi:10.5061/dryad.27q59.
Author contributions
SS conceived of the model, carried out the data analysis and wrote the manuscript.
Acknowledgments
I am very grateful to Itai Yanai, Liedewij Laan, and Günter Wagner for encouragement, discussions and comments on the manuscript. I would like to thank Maria Mircea for proofreading the manuscript.
Conflict of interest
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Arendt, D. (2008). The evolution of cell types in animals: Emerging principles from molecular studies. Nat. Rev. Genet. 9, 868–882. doi:10.1038/nrg2416
Tabula Sapiens Consortium, Karkanias, J., Krasnow, M. A., Pisco, A. O., Quake, S. R., Salzman, J., et al. (2022). The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896. doi:10.1126/science.abl4896
Clevers, H., Rafelski, S., Elowitz, M., Klein, J., Shendure, C., Trapnell, E., et al. (2017). What is Your conceptual definition of “cell type” in the context of a mature organism? Cell. Syst. 4. 255–259. doi:10.1016/j.cels.2017.03.006
Enquist, B. J., West, G. B., Charnov, E. L., and Brown, J. H. (1999). Allometric scaling of production and life-history variation in vascular plants. Nature 401, 907–911. doi:10.1038/44819
Fisher, R. M., Cornwallis, C. K., and West, S. A. (2013). Group formation, relatedness, and the evolution of multicellularity. Curr. Biol. 23, 1120–1125. doi:10.1016/j.cub.2013.05.004
Fisher, R. M., Shik, J. Z., and Boomsma, J. J. (2020). The evolution of multicellular complexity: The role of relatedness and environmental constraints. Proc. Biol. Sci. 287, 20192963. doi:10.1098/rspb.2019.2963
Kempes, C. P., Koehl, M. A. R., and West, G. B. (2019). The scales that limit: The physical boundaries of evolution. Front. Ecol. Evol. 7, 242. doi:10.3389/fevo.2019.00242
Lähnemann, D., Koster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., et al. (2020). Eleven grand challenges in single-cell data science. Genome Biol. 21, 31. doi:10.1186/s13059-020-1926-6
Mircea, M., Hochane, M., Fan, X., Chuva de Sousa Lopes, S. M., Garlaschelli, D., and Semrau, S. (2022). Phiclust: A clusterability measure for single-cell transcriptomics reveals phenotypic subpopulations. Genome Biol. 23, 18. doi:10.1186/s13059-021-02590-x
Mircea, M., and Semrau, S. (2021). How a cell decides its own fate: A single-cell view of molecular mechanisms and dynamics of cell-type specification. Biochem. Soc. Trans. 49, 2509–2525. doi:10.1042/BST20210135
Wagner, G. P. (2010). The measurement theory of fitness. Evolution 64, 1358–1376. doi:10.1111/j.1558-5646.2009.00909.x
West, G. B., Brown, J. H., and Enquist, B. J. (1997). A general model for the origin of allometric scaling laws in biology. Science 276, 122–126. doi:10.1126/science.276.5309.122
West, G. B., Brown, J. H., and Enquist, B. J. (1999). The fourth dimension of life: Fractal geometry and allometric scaling of organisms. Science 284, 1677–1679. doi:10.1126/science.284.5420.1677
West, G. B., and Brown, J. H. (2005). The origin of allometric scaling laws in biology from genomes to ecosystems: Towards a quantitative unifying theory of biological structure and organization. J. Exp. Biol. 208, 1575–1592. doi:10.1242/jeb.01589
Keywords: single-cell omics, cell type, allometry, power law, evolutionary fitness
Citation: Semrau S (2022) Why isn’t each cell its own cell type? Diminishing returns of increasing cell type diversity can explain cell type allometry. Front. Cell Dev. Biol. 10:971721. doi: 10.3389/fcell.2022.971721
Received: 17 June 2022; Accepted: 19 August 2022;
Published: 10 October 2022.
Edited by:
James J. Cai, Texas A&M University, United StatesReviewed by:
Arti Ahluwalia, University of Pisa, ItalyAdrianus J. Westgeest, CEFE, France
Jiri Neustupa, Charles University, Czechia
Copyright © 2022 Semrau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Stefan Semrau, c2VtcmF1QHBoeXNpY3MubGVpZGVudW5pdi5ubA==