- 1Instituto de Física, Universidade Federal de Uberlândia, Uberlândia, Brazil
- 2DIMNP, UMR Centre National de la Recherche Scientifique 5235—University of Montpellier, Montpellier, France
We discuss piecewise-deterministic approximations of gene networks dynamics. These approximations capture in a simple way the stochasticity of gene expression and the propagation of expression noise in networks and circuits. By using partial omega expansions, piecewise deterministic approximations can be formally derived from the more commonly used Markov pure jump processes (chemical master equation). We are interested in time dependent multivariate distributions that describe the stochastic dynamics of the gene networks. This problem is difficult even in the simplified framework of piecewise-deterministic processes. We consider three methods to compute these distributions: the direct Monte-Carlo; the numerical integration of the Liouville-master equation; and the push-forward method. This approach is applied to multivariate fluctuations of gene expression, generated by gene circuits. We find that stochastic fluctuations of the proteome and, much less, those of the transcriptome can discriminate between various circuit topologies.
1. Introduction
One of the greatest problems of molecular biology is how single, undifferentiated cells give rise to many different cell types, all being genetically identical yet performing different functions. Since the pioneering work of Jacob and Monod [1] it is known that this multiplicity of behaviors is possible since the protein production depends not only on the existence of a gene but also on the quantities of regulatory molecules that can change with the cell type and environmental conditions. The protein production takes place in two steps [2]. First, during a process called transcription, the genetic information from DNA is copied to the messenger RNA (mRNA). Then, during a process called translation, the mRNA is used as template for protein production by the ribosomes. The amount of transcribed mRNA and translated protein, namely the gene expression, can vary from one cell to another for various reasons. One of these reasons is that the gene expression is not a property of a single gene but is a property of a set of interacting genes; a gene network [3]. As per usual in complex systems, the whole is different from the sum of its parts. In this case, a gene network can have many different stable expression levels that correspond to different network attractors [4]. Another reason for expression variability, is the fact that transcription and translation are stochastic [5–7]. A single cell transcription is often intermittent, periods of strong mRNA production being followed by periods of silence when transcription is stopped [7–10]. The periods of inactivity may correspond to paused RNA polymerase, incompletely formed activation complexes, transient presence of regulatory proteins and complexes on the DNA, or to transitions between open and condensed chromatin [11]. The durations of the periods of activity and inactivity are random. Furthermore, snapshots of the cell population at various times show a heterogeneous picture in which a significant proportion of the cell population express much lower or much higher than the average protein levels [6, 12]. The observed distributions of the gene expression can deviate significantly from a Gaussian by having skewness [12], heavy tails, or multimodality [11].
The gene expression stochasticity can have important biological consequences. In infections and tumors, a part of the pathogen population (bacteria, viruses, cancer) can escape drug treatment when they adopt behaviors very different from the average; for instance they may stop growth [13, 14], remodel their metabolism [15], or stop absorbing, or expel drugs [16]. Latency phenomena [17] can be responsible for resistance to treatment and relapse of the disease once the treatment is stopped.
Gene expression stochasticity is also important in synthetic biology. Synthetic biology devices are supposed to have a well-defined biological function for given combinations of the input conditions. Therefore, in the absence of error-correction, the precision of biological devices depends critically on the degree of stochasticity [18, 19].
For all these reasons, one needs mathematical tools for predicting quantitatively the amplitude and distribution of gene expression fluctuations in gene networks.
Models of stochastic gene networks represent molecular interactions as discrete events (biochemical reactions) separated by random waiting times. This modeling approach, introduced by Delbruck [20] and further developed by Renyi [21] and Bartholomay [22] covers practically all the aspects of gene-gene interactions but is mathematically and computationally challenging. Indeed, the underlying models are continuous time Markov processes with an infinite or extremely large number of states. The corresponding master equation can be solved exactly only in a limited number of cases, corresponding to single genes without or with much simplified feed-back control [23–26]. The Monte-Carlo simulation algorithms (Gillespie algorithm [27]) can be very inefficient for simulating the full process, as the number of individual reactions that have to be simulated for a significant change of the system's state can be tremendously large.
Several types of approximation were used in order to simplify stochastic biochemical networks in order to reduce their simulation time and to facilitate their analysis. These simplifications were possible because the stochastic gene networks have heterogeneous variables and multiple time scales [28]. This heterogeneity comes from the fact that some variables XD such as DNA/regulatory proteins and complexes/polymerase states are discrete and other variables XC such as protein and mRNA copy numbers are continuous. The species dichotomy leads to a partition of the biochemical reactions. Although literature provides a number of different ways for reaction partitioning, the four set partition RD, RC, RDC, RCD (Figure 1) seems to us quite natural. A very similar partitioning was used for rigourously justified approximations in Crudu et al. [29, 30].
Figure 1. Types of variables and important timescales of stochastic biochemical reactions networks. (A) The partition of variables and of the reactions. Dotted line arrows mean that reaction rates depend on the source variables. Continuous line arrow mean that the source reaction acts on the end variable. RD are reactions acting on discrete variables and whose rates depend on discrete variables, RC are reactions acting on continuous variables and whose rates depend on continuous variables. RCD, RDC are reactions acting on discrete and on continuous variables, respectively, and have rates depending on both discrete and continuous variables. (B) Typical trajectories and time scales of continuous and discrete variables.
Two main timescales have to be taken into account in order to find the appropriate approximation. As shown in the Figure 1, the discrete variables switch between a number of discrete states. The characteristic time of this process was called switching time, τS [28]. The trajectories of continuous variables are smooth only in the average, but for these variables, the average is a good approximation. The characteristic time of fluctuations of continuous species around their average was named discreteness time, τD [28]. The discreteness time scales like 1/N where N is the copy number of the continuous species.
There are two main classes of approximated models (Figure 2):
• Deterministic (ordinary differential equations, ODEs) or diffusion (Fokker-Planck) approximations can be applied when the transitions between discrete states are fast and the continuous species have large copy numbers, thus when τS, τD are both small. Then one obtains the deterministic or diffusion limit by applying the law of large numbers or the central limit theorem to Markov processes [31, 32]. This result can be obtained heuristically by the Ω expansion, well known in physics [33]. If the discreteness time is larger than the switching time, similar approximated models can be obtained by using averaging [29, 30]. In both cases the discrete states are homogenized and the approximated model has only continuous variables.
• Piecewise-deterministic processes (PDP) or hybrid diffusion approximations can be obtained when the discreteness time is small, but τS >> τD. In these approximations there are two types of variables. The dynamics of the continuous variables is described by ODE or stochastic differential equations (SDE), whereas the discrete variables dynamics is described by Markov chains. These approximations were first obtained heuristically by using partial Ω expansions of the master equation [29, 34, 35] and then justified rigorously by using generators and measure theory in Crudu et al. [30]. Finally, diffusion approximation was applied to PDP in the limit when the switching time is small, to obtain again deterministic and Fokker-Planck approximations [28].
Figure 2. Different approximations of stochastic biochemical reactions networks. The “exact” model is a completely discrete Markov process described by the chemical master equation. Various approximations can be obtained as asymptotic limits when some parameter is very small or very large from the “exact” model or from approximations.
In this paper we consider the situation when the swiching time is relatively large. In biology, this corresponds to the so-called “random telegraph” [36], “bursting” [8, 12], or “multi-scale bursting” [10] fluctuations. Simply speaking there are some variables (DNA, regulatory complexes, and/or RNA polymerase states) that have ON/OFF or multiple state Markov chain dynamics. The discrete variables control the ODE dynamics of the continuous variables (mRNA, proteins). The underlying approximation is piecewise-deterministic, because the ODEs change each time that the discrete variables perform a transition (see Figure 1).
In our PDP models of gene networks each gene promoter is described as a finite state Markov chain. The transition rates between states of the promoter depend on the expression levels of the regulating genes. The promoter triggers synthesis of gene products with intensities depending on its state. We represent the state of a gene network as a random vector whose components indicate the promoter state and the copy numbers of all the mRNAs and proteins in the network. The dynamics of the gene network will be represented as the time-dependent multivariate distribution of this random vector. These distributions contain information about the randomness of each gene but also about correlation among genes and promoter states. We present three methods to compute time dependent multivariate distributions for such models: the direct simulation of the PDP process (the Monte-Carlo method); the numerical solutions of the Liouville-master equation, and the push-forward method. The first method is stochastic, whereas the last two are entirely deterministic (do not use random number generators). The PDP Monte-Carlo method combines the numerical simulation of the Markov chain of promoter states with symbolic solutions of the deterministic ODEs and is much faster than the direct Gillespie method [27].
This paper is structured as follows: In section 2 we recall the justification of piecewise-deterministic approximation using the partial Ω expansion. In section 3 we introduce a class of piecewise-deterministic models covering gene networks. In section 4.1 we discuss the Monte-Carlo methods for piecewise-deterministic models. In section 4.2 we discuss the Liouville-master equation approach. In section 4.3 we discuss the push-forward method. In section 5 we briefly discuss possible applications of these methods to extracting information from mRNAs' and proteins' spectra of fluctuations.
2. Partial Omega-Expansion and the Liouville-Master Equation
The dynamics of stochastic biochemical networks can be described by a pure jump Markov process [20–22] (For a general presentation of Markov jump processes see Gikhman and Skorokhod [37]). The state of the network is a vector X ∈ ℕn whose components represent copy numbers of molecules of various species. Each biochemical reaction is defined by a stoichiometric (jump vector) where is the set of biochemical reactions in the model. The occurrence of the reaction i is represented as a jump in the system's state X → X + γi. Finite, discrete states of gene promoters can be represented by extending the meaning of species to include “places” with finite values of the copy numbers. For instance, on/off gene promoters can be represented by using two places Pon and Poff with two possible occupancies 0 or 1. The transitions from OFF to ON and back can be represented by a reversible reaction that consumes a particle on the place Poff and produces a particle on the place Pon.
In the Markov pure jump representation the dynamics of the system is a series of jumps separated by exponentially distributed waiting times [35, 37]. The number of reactions of the type i occurring in the average per unit time is given by the propensity (or rate) function X → Vi(X; μ), where μ are kinetic parameters. The time between successive reactions is exponentially distributed with an average and the next reaction is i with probability . We are interested in the multivariate distribution p(X, t) representing the probability to be in state X at time t. This satisfies the time dependent master equation:
The van Kampen Ω (system size) expansion [33] or, equivalently, the central limit theorem [31, 38] lead, in the first order, to deterministic (ODE) approximations and to diffusion (Fokker Planck) approximations, in the second order. The partial omega expansion consists in applying the Ω expansion only to species that are produced in sufficiently large copy numbers [29, 30]. The large copy numbers species denoted XC are called continuous, because the biochemical reactions change their values gradually, by the accumulation of a large number of small steps. Other species are present only in a few copies per cell (these include the “places” describing promoter states). We denote these species XD and call them discrete. This leads to a decomposition of the state vector as X = (XD, XC) and of all stoichiometric vectors as , corresponding to discrete and continuous species coordinates. The interactions among discrete and continuous species are suitably described by a partition of the reactions in four sets = D ∪ DC ∪ CD ∪ C [28–30].
The reactions D act on XD (the corresponding γi have non-zero coordinates on discrete species, ) and have propensities depending on XD only. The reactions C act on XC (the corresponding γi have non-zero coordinates on continuous species, ) and have propensities depending on XC only. The reactions DC, CD act on XC and XD, respectively, and their propensities depend on both XD and XC.
In this paper we consider gene network models. For each gene, we model the transitions between promoter states, as well as other processes such as transcription, translation, protein folding, and protein and mRNA degradation. We will consider that the mRNA molecules and proteins are in sufficiently large copy numbers to justify continuous approximations. The only discrete variables are in this case the promoter states. The set D contains transitions between discrete promoter states whose rates do not depend on regulatory proteins. The set CD contains transitions between promoter states whose rates depend on concentrations of regulatory proteins. The set C contains translation, maturation (folding), degradation reactions. The set DC contains transcription initiation reactions that depend on the promoter state.
We further consider that the copy numbers of continuous species XC and the propensities of reactions in the sets C and DC are “extensive,” in other words, scale with the system size Ω, XC = Ωxc, Vi = Ωvi(xc, XD), for i ∈ DC ∪ C, whereas the propensities of reactions in D and CD are considered “intensive” and do not scale with Ω. The Ω dependence is not only a useful mathematical tool, but has also a biological meaning. In proliferating cells, the protein concentrations are important for biochemical reactions and should be maintained by synthesis reactions. This is only possible if the propensities of synthesis reactions (including mRNA synthesis) scale with the size. Rates of monomolecular reactions consuming continuous reactants (for instance, degradation reactions) are proportional to the reactant copy number that scale with the size. Rates of switching reactions between discrete promoter states do not scale with size, unless they are proportional to copy numbers of activator or repressor proteins. For a more complete discussion of these scaling relations we refer to Crudu et al. [29].
In rescaled variables, the master equation reads
Using the first order Taylor series expansion we obtain the Liouville-master equation
The Liouville-master equation has been used in various fields; in statistical physics of quantum systems [39–41] or with a different name in statistics and operations research [42]. It describes the time-dependent distribution of a piecewise-deterministic model. Indeed, conditionally on XD (considering that the discrete variables are fixed) the continuous variables xc satisfy deterministic, ODEs:
The discrete variables follow a pure jump Markov dynamics defined by the stoichiometric vectors and the propensities Vi(XD, xc; μ), i ∈ D ∪ CD. The noise in the system is produced by the stochastic transitions of the discrete variables. The probability distribution of the continuous variables is advected (transported by the flow defined by the ODEs, see Figure 3) by the deterministic flow Φ(XD, xc; μ) in the continuous variables space; this explains the advection term in the Liouville-master equation.
Figure 3. Model of a single gene controlled by a two state (ON/OFF) promoter. In the graphical representation of the model, the arrow from mRNA to mRNA indicates that mRNA is recovered after the transcription reaction. Between two successive jumps of the discrete variables, a probability distribution of the continuous variables x (protein) and y (mRNA) represented as black dots is continuously advected by one of the two possible deterministic flows to a distribution represented by red dots. The s = 1 flow pushes the system toward the high expression attractor x = (bk1)/(aρ), y = k1ρ and the s = 0 flow pushes the system toward the low expression attractor x = (bk0)/(aρ), y = k0ρ, where k0 < < k1. The system chooses alternately and stochastically between these two flows.
3. PDP Models of Gene Circuits
In this section we introduce a family of PDP models that can be used to represent gene networks. They are a special, simplified case of the class of models defined by (2.3). The main simplification is that the propensities of reactions in DC depend on XD and do not depend on xC.
As an example, we consider a gene circuit with dichotomous noise. This model is made of ng genes, each one controlled by a promoter with 2 states; ON and OFF. The continuous variables are xi and yi; the protein and mRNA level for each gene i, respectively. The discrete variables are two values variables si ∈ {0, 1} representing the promoter states (0 stands for OFF and 1 stands for ON). In this model, the set DC consists of transcription initiation reactions. Their rates depend on the promoters states (ON or OFF) but do not depend on the protein or mRNA levels. The set CD ∪ D consists of reversible transitions between the states ON and OFF. The corresponding transition rates are constant when the gene i is constitutive; namely, these rates are fi from ON to OFF and hi from OFF to ON. When the gene i is activated by the gene j the transition rate from OFF to ON is fixj, whereas when the gene i is repressed by the gene j the transition rate from ON to OFF is hixj. The discrete variables' dynamics are thus a Markov chain with the state set . It is convenient to relabel the states from 1 to using the lexicographic order on . For instance, two gene circuits have, in order, the states 1:(0, 0), 2:(0, 1), 3:(1, 0), 4:(1, 1). Also, instead of using reaction propensities, we equivalently provide a transition rate matrix S whose elements are the transition probabilities per time unit between states. For instance, for a two gene circuit where the first gene is constitutive and the second gene is activated by the first, we have:
The mRNA and protein variables follow ODE dynamics
where ρi, bi, ai, i ∈ [1, ng] represent mRNA degradation, translation, and protein degradation rates for the gene i, respectively, and s = (s1, s2, …, sng) is the state of the Markov chain, such that
For the sake of illustration let us consider the simple model of a single constitutive gene controlled by a two state (ON/OFF) promoter. We denote the states of the promoter by 1 and 0, respectively. The transition rate from 0 to 1 is f and from 1 to 0 is h. The protein and mRNA concentrations are x and y, respectively. The transcription initiation rate in the state 1 is k1 and in the state 0 is k0 < < k1. The translation rate is b. The mRNA and protein degradation rates are ρ and a, respectively. The master-Liouville equation reads
The protein and mRNA concentrations follow the ODEs
The probability distribution of the promoter state s results from the dynamics of the two state Markov chain
where p0 = ℙ[s = 0] = ∫p(0, x, y)dxdy, p1 = ℙ[s = 1] = ∫p(1, x, y)dxdy.
We also define the asymptotic occupancy probabilities and , representing the probabilities, at steady state, that the promoter state is OFF and ON, respectively.
The single constitutive gene model and the advection fluxes of the Liouville-master equation are illustrated in Figure 3. More complex, two gene circuits models are represented in Figure 4 and their Liouville-master equations are given in the Appendix 1.
Figure 4. Two gene circuits models. Only the discrete (Markov chain) part of the dynamics is represented. These models have 4 discrete states as each one of the two promoters can be ON or OFF. The transition rates between states are either constant or functions of the levels of proteins x1 and x2. We consider that transcription, translation, and degradation parameters k0, k1, ρ, a, b are the same for the two genes.
4. The Numeric Methods for Solving the Liouville-Master Equation
In this section we compare several numerical methods for solving the Liouville-master equation in the context of gene networks models. In order to quantify the relative difference between methods we use the L1 distance between distributions. More precisely, if p(x) and are probability density functions to be compared, the distance between distributions is
4.1. The PDP Monte-Carlo Method
The PDP Monte-Carlo method is based on the direct simulation of the PDP process. A simple algorithm has been proposed in Crudu et al. [29]. For the sake of completeness we recall here the main steps of this algorithm.
(1) Set the initial state condition s = s0, x = x0, y = y0, t = 0.
(2) Generate a random variable u uniformly distributed in [0, 1].
(3) Integrate the system of differential equations obtained by adding to (3.2) the equation for the survival function F of the waiting time to the next Markov chain transition
between 0 and τ with the stopping condition
Here λ(s, x) is the sum of transition probabilities in the row corresponding to the state s of the transition rates matrix S.
(4) Generate a second uniform variable v and use it to find the next Markov chain state snext. The decision for the next discrete state is made in the same way as in the Gillespie algorithm [27].
(5) Change the system state s = snext, x = x(τ), y = y(τ), and the time t = t + τ.
(6) Reiterate the system from (2) with the new state until a time tmax previously defined is reached.
(7) Store samples of x and y at various predefined times.
(8) Reiterate the system from (1) a large number of times. Perform distribution estimates using the resulting statistical ensemble.
A major improvement with respect to Crudu et al. [29] can be obtained by using symbolic solutions of (4.2) (see the Appendix 2 for the symbolic solutions) and solving numerically the nonlinear stopping condition (4.3).
4.2. The Finite Difference (FD) Liouville-Master Method
The Liouville-master equation is a system of linear, partial differential equations (PDEs) for the probability distributions of the mRNAs and proteins for various states of the gene promoters. The number of PDEs is equal to the number of distinct discrete states of the gene promoters, i.e., it is , where ng is the number of genes in the network.
These equations have to be integrated with boundary conditions in order to control possible mass accumulation on the integration domain boundaries. The boundary conditions are obtained by setting to zero the advection fluxes pointing toward the boundary. For the one-gene model (3.3), the integration domain is and the boundary conditions read
One can note the limit version of the boundary conditions allow divergence of p0, p1 on the boundaries y = k0/ρ or y = k1/ρ, respectively. For boundaries different than y = k0/ρ or y = k1/ρ, advection fluxes point toward the interior of the integration domain and one can simply impose zero boundary conditions p0 = p1 = 0.
As noticed by Marc Kac in a very instructive paper about a piecewise-deterministic random walk [43], in contrast to the Fokker-Planck equations that describe “normal” random-walk diffusion and are parabolic, the Liouville-master equation has hyperbolic nature. General properties of hyperbolic equations, such as finite propagation velocity of perturbations and existence of sharp fronts apply to our equations as well. For our problem, the front discontinuities occur at the domain boundaries and they are handled by the boundary conditions (4.4). Hyperbolicity properties are mainly visible at slow switching and should disappear at fast switching when the Liouville-master equation can be well approximated by a Fokker-Planck equation [28].
In this paper we have used a finite-difference predictor-corrector scheme [44] to compute the solution of the Liouville-master equation. In Figure 5 we compare the distributions for mRNA and proteins resulting from the constitutive gene model (3.3) with the Monte-Carlo simulation of the model. The comparison is quantitative and uses the distance defined by (4.1). In all our computations, the asymptotic occupancy probability is one half. For slow switching, the mRNA distribution is bimodal, with discontinuities at the modal values k0/ρ and k1/ρ values, where the probability density diverges on one side and is zero on the other side. The bimodality resulting from slow switching is well understood and signalled in many other places in the literature (see for instance [45, 46]). We can emphasize that the discontinuity of the solution is a consequence of the hyperbolicity of advection fluxes. A parabolic diffusion flux would not be able to build up such discontinuities and this can be seen in the fast switching distributions that are continuous and unimodal. Interestingly, the protein distributions are unimodal in both cases: slow and fast. A unique discontinuity can be observed at short times in the protein distribution, for a slowly switching promoter.
Figure 5. Histograms of mRNA (A,B) and protein (C,D) copy numbers for a single gene, produced by the Monte-Carlo method (green lines) and by the finite difference Liouville-master equation method (black lines). The initial data has half-normal bivariate distribution whose mode is x1 = x2 = y1 = y2 = 0. The gene parameters are ρ = 1, k0 = 4, k1 = 40, a = 1/5, b = 4, p0 = 0.5, and ϵ = 0.5, ϵ = 5.5, for a slow (A,C), and a fast (B,D) switching gene, respectively. The comparison is quantified by the distance d defined by (4.1).
The bivariate mRNA vs. protein distributions are shown at steady state in Figure 6. The mRNA is positively correlated to the protein as it should be. Interestingly, for slow switching, cells close to half protein maximum expression have strongly bimodal mRNA expression, either vanishing or maximum. This rather counterintuitive behavior results from the difference in lifetime between the mRNA and protein molecules. The mRNA molecule has a short life and can, for slow switching, oscillate between very low and maximum values. The protein is much more stable and integrates these oscillations over a large lifetime. This explains why a cell with half protein maximum copy number can have extremely variable mRNA copy numbers.
Figure 6. Steady state distributions of protein and mRNA copy numbers for a single gene, produced by the Monte-Carlo method and by the finite difference (FD) Liouville-master equation method. The parameters are those used in Figure 5. The probability to color map relation is linear.
4.3. The Push-Forward Method
To introduce the method let us consider the example of a two gene circuit in which the first gene is constitutive and activates the second gene. In other words, the first gene is a transcription factor of the second one.
The push-forward method for gene networks was first introduced in Innocentini et al. [47]. This method combines Master Equations (MEs) and Random Differential Equations (RDEs). The ME provides the time evolution of the probability distribution of discrete variables. With respect to Innocentini et al. [47] where the discrete variables were the promoter ON/OFF states and the mRNA copy numbers, we consider that the discrete variables are solely the promoter's ON/OFF states. The RDEs are differential equations for the mRNAs and proteins in which the promoter states are considered random parameters. The coupling of ME and RDE is another equivalent way to define the piecewise-deterministic process. When the ME is not dependent on the RDE our models are of the same type as those discussed by Mark Kac in relation to the telegrapher's equation [43].
Let us introduce our model beginning with the MEs describing the dynamics of the first switch,
where, is the probability occupancy vector whose entries are the probabilities to find the first switch in the OFF state (P0(t)) or in the ON state (P1(t)). The infinitesimal stochastic matrix H1 is given by:
This is a basic telegraph process where the rates f1 and h1 are time independent and control the reaction OFF (state 0) to ON (state 1) and ON (state 1) to OFF (state 0), respectively. The equation describing mRNA dynamics is a Random Differential Equation (RDE) given by
where y1 is a random variable representing the copy number of mRNA in the cell coming from the first gene. The random variable s1(t) follows the switch statistics, meaning that with probability P1(t), s1(t) = 1 at time t and s1(t) = 0 with probability P0(t), again at time t. The production rate of mRNA is a function of the random variable s1(t) following
where K1 is the highest level of mRNA production and K0 is the basal one. The third equation describing the activity of the first gene is for the random variable representing the protein density associated to it;
where α is protein degradation rate and β is the translation rate. The last equation for the coupled gene model is the one governing the probability occupancy of the second gene,
where encodes, in its entries, the information about the probability to find the second gene ON (Q1(t)) or OFF (Q0(t)). The matrix H2(t) is given by
In the model at hand, the main source of stochasticity is the switching ON and OFF of the gene. This noise is transmitted to mRNA synthesis process through the rate k(s1(t)) which is a function of a random variable (s1(t)) and, so, a random variable itself. The first step of the push-forward method is to compute the time dependent distribution probability of mRNA molecules y1(t) (which is perturbed by the random variable s1(t)) once the probability distribution of the perturbation is known. To do so, we begin by presenting the solutions of Equation (4.5),
where encodes the initial configuration (given at t = t0) of the switch, and the matrices are
Explicitly, the solutions are given as
where p0 = h/(f + h) and p1 = f/(f + h) are the asymptotic occupancy probabilities to find the gene OFF and ON, respectively. Going on, we present the formal solution of the RDE governing mRNA dynamics,
where we have rescaled time t by the mRNA degradation rate and introduced the new time parameter, τ = tρ, and also the dimensionless parameters k0 = K0/ρ and k1 = K1/ρ. Note that the integral (4.14) is a basic Riemann integral such that, if we consider a sufficiently fine partition [τ0, τ1, …, τN−1, τN] of the integral interval [τ0, τ], where τN = τ, we can assume that is constant inside each specific partition [τj, τj + 1] and the integral in (4.14) is approximated by
It is worthwhile to remember that s1(τ) is a jumping process assuming, at each instant of time, just one of the values, 0 or 1, with probability P0(τ) or P1(τ), respectively. Now the solution of y1(τ) is given as a function of sequences of s1(τj), with j = 0, …, N−1. These sequences are strings of zeros and ones and we must consider all of them. For instance, if the number of partitions is N, we will have 2(N−1) possible sequences and each one will lead to a different value of y1(τ). The remaining task is to assign the correct weight for each one of these values of y1(τ). This is achieved by using the occurrence probability of each possible string of zeros and ones defining the values of y1(τ). The occurrence probability is given by the joint distribution probability of having s1(τj) = 0 or 1 at time τj for j = 0, …, N−1 which is
where Ps1(τ0)(τ0) is the occupancy probability to find the gene in state s1(τ0) at time τ0, as per (4.13), and Prc(…|…) stands for the conditional probability for the telegraph process. For instance,
encodes the transition probability from state s1(τN−2) to state s1(τN−1) during the time interval τN−1 − τN−2 knowing that at τN−2 the system was with probability one in state s1(τN−2). The conditional probabilities for the telegraph process, which is a Markovian process, are easily obtained by chosen the appropriate initial configuration:
where the new parameter ϵ = (f + h)/ρ measures the flexibility of the switch. With the conditional probabilities and (4.15) at hand, we can calculate the time dependent probability distribution for the mRNA density. We have considered two examples: one is the fast switch regime (ϵ > 1) and, the other, slow switching (ϵ < 1).
Protein distribution in time can be obtained in the same fashion as the one for mRNA. The general solution for protein density is
where the rescaled parameters are given by a = α/ρ and b = β/ρ. The push-forward method can be applied to obtain the time dependent distribution probability for protein density, in an analogous way as for mRNA. The integral that must be partitioned is that over τ′, in the interval [τ0, τ],
where, we have used the definition k(s1(τ)) = k0(1 − s1(τ)) + k1s1(τ) to simplify the notation. As before, we have illustrated our method by calculating the protein density for the same two regimes of switch flexibility.
To analyze the influence of the first gene on the second one we have assumed that the action of the first gene is to activate the second [see (4.10)]. To do so, instead of solving the RDE describing the activity of the second gene [it is an RDE because the perturbation x1(τ) is a random variable] we have analyzed the mean value of the occupancy probability of the second gene whose dynamics is given by
and 〈Q1(τ)〉 = 1−〈Q0(τ)〉. The general solution for 〈Q0(τ)〉 is given by
In Appendix 3 we show, in detail, how to obtain the exact functional shape of 〈x1(τ)〉. Nevertheless, its structure is , and, because of this, the integral in (4.21) cannot be evaluated analytically and a numerical evaluation must be performed. This will also be the case for the conditional probabilities that will be expressed as
where we have set 〈Qi(τj−1)〉 = s2(τj−1) (with i = 0 or 1) expressing the fact that at the instant of time τj−1 the gene 2 is in state i with probability s2(τj−1) (which is 0 or 1), as we have done for the gene 1. The difference is that for gene 1 we have closed solutions for the occupancy probabilities and the conditional probabilities were derived analytically from these solutions in (4.17). However, here, as said before, we must perform the integrals in (4.22) numerically.
With (4.21) and (4.22) at hand we are in position to obtain the distributions for mRNA and proteins associated with gene 2 in the same fashion as we did for the gene 1. The only difference is that the mRNA and protein copy numbers will be obtained using (4.15) and (4.19) just by changing the index 1 for 2 (s1(τ) → s2(τ), y1(τ) → y2(τ), and x1(τ) → x2(τ)). A last comment regards the parameter space: As we did for the gene 1, we have redefined the parameter space of gene 2 and introduce the more biological parameters; the asymptotic occupancy probabilities (q0, q1) and the flexibility parameter (σ). The new parameters are expressed in terms of the old ones as
In Figure 7 we compare the distributions for mRNA and proteins associated with gene 2 with the direct simulation of the model. We have done that in four distinct situations when the first and the second genes are fast and slow. For all the cases the first and second gene are OFF at τ = 0 and both have asymptotic occupancy probabilities equal to 1/2. The comparison is quantitative and uses the distance defined by (4.1).
Figure 7. Histograms of mRNA (A,B) and protein (C,D) copy numbers for the second gene in the circuit M1, produced by the Monte-Carlo method (green lines) and by the push-forward method (black lines). The initial data is x1 = x2 = y1 = y2 = 0 and the circuit parameters are ρ = 1, k0 = 4, k1 = 40, a = 1/5, b = 4, p0 = 0.5, for both genes, and ϵ = 0.5, ϵ = 5.5, for slow (A,C), and fast (B,D) switching genes, respectively. The comparison is quantified by the distance d defined by (4.1).
5. Testing the Propensity of mRNA and Protein Intrinsic Fluctuations to Reveal Gene-Gene Interactions
In this paper we used mathematical models to predict mRNA and protein “intrinsic” fluctuations (by “intrinsic” we understand fluctuations generated by the stochastic gene network dynamics). An important question in this context is if “intrinsic” gene expression fluctuations can be used to reverse engineer gene networks. “Extrinsic” fluctuations (by “extrinsic” we understand perturbations of gene networks coming from their environment) of the transcriptome were extensively used in the past to reconstruct gene networks using correlation or mutual information (for a popular method see [48]). A few reverse-engineering studies obtained gene regulation parameters from intrinsic gene expression fluctuations [12, 49]. Quantifying intrinsic fluctuations requires single cell measurements of several genes. A variety of technologies are now ready to take this challenge—single cell sequencing [50], single-cell RNA microscopy [10], various versions of time-lapse microscopy [51], fluorescence correlation microscopy [12]. It is therefore important to test the propensity of expression fluctuations to discriminate between various gene network architectures.
The predictions of our PDP models are provided as multivariate distributions of mRNAs and proteins copy numbers for one or several genes. These predictions can be directly compared with results obtained with molecular biology and biophysics experimental methods.
First, one would like to test if the differences between distributions predicted with various gene circuit models are significant and therefore can be used to discriminate between gene circuit models. To this end, we generated bivariate proteins and mRNA distributions for five different gene circuits like in Figure 4. The visual inspection of Figure 8, suggests that mRNA distributions resulting from five different gene circuits, with gene-gene interactions that differ by their signs, are very similar in the same regime of switching (fast or slow) of the genes. The mRNA distributions discriminate between model parameters (fast or slow switching) but do not discriminate between circuit architectures. The protein bivariate distributions are shown in Figure 9. They differ strongly from mRNA distributions and discriminate between both parameters and architectures.
Figure 8. Steady state bivariate histograms of mRNA copy numbers from two interacting genes in circuits of different types and for four switching regimes of the promoters (SS, FF, SF, FS, where SF means that the first gene is slowly switching whereas the second is switching fastly), obtained with the Monte-Carlo method. The individual gene parameters are those used in Figure 5; f and h constants in fxi or hxi terms are chosen such that mean mRNA and protein are the same in regulated and constitutive genes. The probability to color map relation is logarithmic.
Figure 9. Steady state bivariate histograms of protein copy numbers from two interacting genes in circuits of different types and for four switching regimes of the promoters, obtained with the Monte-Carlo method. The individual gene parameters are those used in Figure 5; f and h constants in fxi or hxi terms are chosen such that mean mRNA and protein are the same in regulated and constitutive genes. The probability to color map relation is logarithmic.
As visual colormap differences may be judged subjective, we developed a quantitative test for the discriminant power. This test is based on the distance d defined by (4.1). We have computed d pairwise, for mRNA and for protein distributions produced from different gene circuits at steady state. In order to test if these distances are significantly large we compared them to distribution of distances between random samples generated by the same gene circuit model for a fixed number of cells. The result of this comparison is shown in Figure 10 for the two gene circuits G1→G2 and G1⊣G2 that differ by the sign of the interaction; one can notice that the protein fluctuation based distance is significant, whereas the mRNA fluctuation distance is not, both for slow/slow and fast/fast genes.
Figure 10. Testing the capacity of mRNA and protein fluctuations to reveal gene-gene interactions. Upper row: the distance between distributions generated by activating and repressive circuit (red line) are compared to the distances between random samples (Nc = 1, 000) generated by the same repressive circuit (blue smoothed histogram). Middle and lower row: the Bravais-Pearson correlation coefficient ρ is computed from random samples containing increasing numbers of cells and the corresponding p-value (p is the upper tail probability and correlation is significant when p is small) is represented vs. sample size. The parameters of the simulations are those used in Figure 9.
We have also tested the significance of the correlation computed from bivariate mRNA or protein distributions. A simple gene reconstruction method is to consider that genes interact if the correlation coefficient is significantly different from zero. We have computed the Bravais-Pearson correlation coefficient from bivariate mRNA and protein samples generated with our PDP model, at steady state and for increasing numbers of cells Nc. For both models G1→G2 and G1⊣G2 a significant (upper tail probability p < 5%) protein-protein correlation is obtained for moderate cell populations (Nc > 100 for p < 5%, see Figure 10). In order to obtain significant mRNA/mRNA correlation one has to use very large numbers of cells (Nc > 1, 000 for p < 5%, see Figure 10). This is possible for single cell sequencing and flow cytometry (with the drawback of the lack of precision in estimating the mRNA copy numbers) but is very difficult for techniques such as MS2 tagging microscopy, or time-lapse microscopy.
6. Discussion
We have discussed three methods to compute time-dependent distributions of mRNA and protein copy numbers generated by gene networks. All the three methods are much faster than the Gillespie exact chemical master equation. The finite difference Liouville-master equation method is precise and fast for small models. Simple (non-adaptive) finite difference schemes, however, are demanding in terms of space and time resolution leading to computer memory limitations. In future implementations of the Liouville-master equation method we plan to use spectral methods for bypassing these limitations. The push-forward method is not as precise as the Liouville-master equation, but it is much more stable, even at low resolution. The differences between execution times of various methods are illustrated in the Table 1 indicating that the push-forward method is the fastest. The performance of the push-forward method results from the reduced cardinality of the discrete phase space (2 states for one gene, 4 states for 2 genes) which is an improvement with respect to the initial version in Innocentini et al. [47]. Further improvements, lifting the mean-field approximation for gene coupling will be presented elsewhere.
Piecewise-deterministic models are valuable tools for understanding stochastic gene expression in a wide spectrum of regimes, covering both slow and fast switching. The source of stochasticity in such models are the random transitions between discrete promoter states; a phenomenon usually associated with transcriptional bursting. In this paper we have only discussed dichotomous noise (ON/OFF promoters), however, as seen in section 2, our methods work also for promoters with more than two states.
As application of our numerical methods we tested the capacity of mRNA and protein copy numbers fluctuations to unravel differences between gene circuit architectures. We showed that protein fluctuations are sensitive to differences of architecture and that protein-protein correlation reveals gene-gene interactions for moderate cell population sizes (100 cells). In contrast, mRNA fluctuations are much less sensible to differences in circuit architecture and mRNA-mRNA correlation is small, even for interacting genes. This reinforces the already well established conclusion that proteome contains much more information than the transcriptome. In the past we used the spectrum of protein copy number fluctuations to extract information about promoter repression mechanisms [12]. The difference in behavior of the mRNA and protein fluctuations can be explained by the fact that the mRNA half life is usually much shorter than the protein half life. Gene-gene interactions are mediated by proteins that slowly modulate the gene switching times. Proteins follow these slow modulations, which results in significant protein-protein correlation. mRNA dynamics are permanently submitted to the faster (uncorrelated) gene switching, which explains the lower correlation of steady state mRNA fluctuations. This suggests that reconstruction of gene networks from mRNA intrinsic fluctuations is submitted to severe limitations. More general results concerning these limitations will be presented elsewhere.
Author Contributions
OR and GI conceived the research. OR implemented the Monte-Carlo method. GI implemented the push-forward method. AH implemented the Liouville-master method. All authors were involved in writing the paper. All the authors have read and approved the final manuscript.
Funding
OR thanks MI CNRS and LABEX Epigenmed for support. GI Thanks PNPD/CAPES for financial support.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2018.00046/full#supplementary-material
References
1. Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol. (1961) 3:318–56.
3. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. (1969) 22:437–67.
4. Thomas R, Kaufman M. Multistationarity, the basis of cell differentiation and memory. I. Structural conditions of multistationarity and other nontrivial behavior. Chaos (2001) 11:170–9. doi: 10.1063/1.1350439
5. McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proc Natl Acad Sci USA. (1997) 94:814–9.
6. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science (2002) 297:1183–6. doi: 10.1126/science.1070919
7. Thattai M, Van Oudenaarden A. Stochastic gene expression in fluctuating environments. Genetics (2004) 167:523–30. doi: 10.1534/genetics.167.1.523
8. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. (2006) 4:e309. doi: 10.1371/journal.pbio.0040309
9. Cai L, Friedman N, Xie XS. Stochastic protein expression in individual cells at the single molecule level. Nature (2006) 440:358–62. doi: 10.1038/nature04599
10. Tantale K, Mueller F, Kozulic-Pirher A, Lesne A, Victor JM, Robert MC, etal. A single-molecule view of transcription reveals convoys of RNA polymerases and multi-scale bursting. Nat Commun. (2016) 7:12248. doi: 10.1038/ncomms12248
11. Nicolas D, Phillips NE, Naef F. What shapes eukaryotic transcriptional bursting? Mol Biosyst. (2017) 13:1280–90. doi: 10.1039/c7mb00154a
12. Ferguson ML, Le Coq D, Jules M, Aymerich S, Radulescu O, Declerck N, etal. Reconciling molecular regulatory mechanisms with noise patterns of bacterial metabolic promoters in induced and repressed states. Proc Natl Acad Sci USA. (2012) 109:155–60. doi: 10.1073/pnas.1110541108
13. Dean M, Fojo T, Bates S. Tumour stem cells and drug resistance. Nat Rev Cancer (2005) 5:275–84. doi: 10.1038/nrc1590
14. Balaban NQ, Merrin J, Chait R, Kowalik L, Leibler S. Bacterial persistence as a phenotypic switch. Science (2004) 305:1622–5. doi: 10.1126/science.1099390
15. Holohan C, Van Schaeybroeck S, Longley DB, Johnston PG. Cancer drug resistance: an evolving paradigm. Nat Rev Cancer (2013) 13:714–26. doi: 10.1038/nrc3599
16. Hirschmann-Jax C, Foster AE, Wulf GG, Nuchtern JG, Jax TW, Gobel U, etal. A distinct “side population” of cells with high drug efflux capacity in human tumor cells. Proc Natl Acad Sci USA. (2004) 101:14228–33. doi: 10.1073/pnas.0400067101
17. Razooky BS, Pai A, Aull K, Rouzine IM, Weinberger LS. A hardwired HIV latency program. Cell (2015) 160:990–1001. doi: 10.1016/j.cell.2015.02.009
18. Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell (2008) 135:216–26. doi: 10.1016/j.cell.2008.09.050
19. Mutalik VK, Guimaraes JC, Cambray G, Lam C, Christoffersen MJ, Mai QA, etal. Precise and reliable gene expression via standard transcription and translation initiation elements. Nat Methods (2013) 10:354–60. doi: 10.1038/nmeth.2404
21. Rényi A. Betrachtung chemischer Reaktionen mit Hilfe der Theorie der stochastichen Prozesse. Magyar TudAkadAlkalmMatIntKözl (1954) 2:93–101.
22. Bartholomay AF. A Stochastic Approach to Chemical Reaction Kinetics. Ph.D. thesis, Harvard University (1957).
23. Peccoud J, Ycart B. Markovian modeling of gene-product synthesis. Theor Popul Biol. (1995) 48:222–34.
24. Hornos JEM, Schultz D, Innocentini GCP, Wang J, Walczak AM, Onuchic JN, etal. Self-regulating gene: an exact solution. Phys Rev E Stat Nonlin Soft Matter Phys. (2005) 72:051907. doi: 10.1103/PhysRevE.72.051907
25. Innocentini GCP, Hornos JEM. Modeling stochastic gene expression under repression. J Math Biol. (2007) 55:413–31. doi: 10.1007/s00285-007-0090-x
26. Ramos AF, Innocentini G, Hornos JEM. Exact time-dependent solutions for a self-regulating gene. Phys Rev E Stat Nonlin Soft Matter Phys. (2011) 83:062902. doi: 10.1103/PhysRevE.83.062902
27. Gillespie DT. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. (1976) 22:403–34. doi: 10.1016/0021-9991(76)90041-3
28. Radulescu O, Innocentini GC, Hornos JEM. Relating network rigidity, time scale hierarchies, and expression noise in gene networks. Phys Rev E Stat Nonlin Soft Matter Phys. (2012) 85:041919. doi: 10.1103/PhysRevE.85.041919
29. Crudu A, Debussche A, Radulescu O. Hybrid stochastic simplifications for multiscale gene networks. BMC Syst Biol. (2009) 3:89. doi: 10.1186/1752-0509-3-89
30. Crudu A, Debussche A, Muller A, Radulescu O. Convergence of stochastic gene networks to hybrid piecewise deterministic processes. Ann Appl Probab. (2012) 22:1822–59. doi: 10.1214/11-AAP814
31. Kurtz TG. Solutions of ordinary differential equations as limits of pure jump Markov process. J Appl Prob. (1970) 7:49–58.
32. Kurtz TG. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. J Appl Prob. (1971) 8:344–56.
33. Van Kampen N. Stochastic Processes in Physics and Chemistry, 3rd Edn. Amsterdam: North Holland (2007).
34. Haseltine EL, Rawlings JB. Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J Chem Phys. (2002) 117:6959–69. doi: 10.1063/1.1505860
35. Radulescu O, Muller A, Crudu A. Théorèmes limites pour des processus de Markov à sauts. Synthèse des resultats et applications en biologie moleculaire. Tech Sci Inform. (2007) 26:443–69. doi: 10.3166/tsi.26.443-469
36. Larson DR, Singer RH, Zenklusen D. A single molecule view of gene expression. Trends Cell Biol. (2009) 19:630–7. doi: 10.1016/j.tcb.2009.08.008
37. Gikhman II, Skorokhod AV. Introduction to the Theory of Random Processes. Philadelphia, PA: W.B.Saunders Co. (1969).
38. Gillespie DT. The chemical Langevin equation. J Chem Phys. (2000) 113:297–306. doi: 10.1063/1.481811
40. Gebauer R, Car R. Kinetic theory of quantum transport at the nanoscale. Phys Rev B (2004) 70:125324. doi: 10.1103/PhysRevB.70.125324
41. Breuer HP, Petruccione F. Stochastic dynamics of quantum jumps. Phys Rev E (1995) 52:428. doi: 10.1103/PhysRevE.52.428
43. Kac M. A stochastic model related to the telegrapher's equation. Rocky Mountain J Math. (1974) 4:497–509.
44. Stoer J, Bulirsch R. Introduction to Numerical Analysis, Vol. 12. New York, NY: Springer Science & Business Media (2013).
45. Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proc Natl Acad Sci USA. (2008) 105:17256–61. doi: 10.1073/pnas.0803850105
46. Innocentini GdCP, Forger M, Ramos AF, Radulescu O, Hornos JEM. Multimodality and flexibility of stochastic gene expression. Bull Math Biol. (2013) 75:2600–30. doi: 10.1007/s11538-013-9909-3
47. Innocentini GC, Forger M, Radulescu O, Antoneli F. Protein synthesis driven by dynamical stochastic transcription. Bull Math Biol. (2016) 78:110–31. doi: 10.1007/s11538-015-0131-3
48. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, etal. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7:S7. doi: 10.1186/1471-2105-7-S1-S7
49. Munsky B, Neuert G, Van Oudenaarden A. Using gene expression noise to understand gene regulation. Science (2012) 336:183–7. doi: 10.1126/science.1216379
50. Shapiro E, Biezuner T, Linnarsson S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet. (2013) 14:618–30. doi: 10.1038/nrg3542
Keywords: gene networks, stochastic gene expression, piecewise-deterministic processes, Liouville-master equation, push-forward method
Citation: Innocentini GCP, Hodgkinson A and Radulescu O (2018) Time Dependent Stochastic mRNA and Protein Synthesis in Piecewise-Deterministic Models of Gene Networks. Front. Phys. 6:46. doi: 10.3389/fphy.2018.00046
Received: 01 March 2018; Accepted: 30 April 2018;
Published: 08 June 2018.
Edited by:
Luis Diambra, National University of La Plata, ArgentinaReviewed by:
Malbor Asllani, University of Namur, BelgiumMatteo Osella, Università degli Studi di Torino, Italy
Copyright © 2018 Innocentini, Hodgkinson and Radulescu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ovidiu Radulescu, b3ZpZGl1LnJhZHVsZXNjdUB1bW9udHBlbGxpZXIuZnI=