- Department of Systems Biology, Harvard Medical School, Boston, MA, United States
The linear framework uses finite, directed graphs with labelled edges to model biomolecular systems. Graph vertices represent chemical species or molecular states, edges represent reactions or transitions and edge labels represent rates that also describe how the system is interacting with its environment. The present paper is a sequel to a recent review of the framework that focussed on how graph-theoretic methods give insight into steady states as rational algebraic functions of the edge labels. Here, we focus on the transient regime for systems that correspond to continuous-time Markov processes. In this case, the graph specifies the infinitesimal generator of the process. We show how the moments of the first-passage time distribution, and related quantities, such as splitting probabilities and conditional first-passage times, can also be expressed as rational algebraic functions of the labels. This capability is timely, as new experimental methods are finally giving access to the transient dynamic regime and revealing the computations and information processing that occur before a steady state is reached. We illustrate the concepts, methods and formulas through examples and show how the results may be used to illuminate previous findings in the literature.
1 Introduction
The linear framework is a graph-theoretic approach to analysing biomolecular systems (Gunawardena, 2012; Mirzaev and Gunawardena, 2013; Gunawardena, 2014). A recent review (Nam et al., 2022) described how the framework has been used to study systems at steady state, in contexts such as post-translational modification and gene regulation. The present paper is a sequel to this review, which describes how the graph-theoretic approach can be extended to the transient regime, prior to the steady state being reached, for systems that are Markov processes. These new results were introduced in the first author’s Ph.D. thesis (Nam, 2021) and full details with complete proofs are being published separately (Nam and Gunawardena, 2023). The purpose of the present paper is to provide an elementary introduction to this circle of ideas for a wider readership in cell and developmental biology. We hope this will be of interest to anyone who wants to explore the transient regime for biological systems that can be modelled by Markov processes.
Linear framework graphs (hereafter, “graphs”) are finite, simple, directed graphs with labelled edges. (A simple graph is one in which there is at most one edge between any two distinct vertices and there are no self-loops.) Graph vertices, usually denoted 1, 2, 3, …, represent chemical species or molecular states; edges, denoted i → j, represent reactions or transitions; and edge labels, denoted ℓ(i → j), represent rates which are positive and have dimensions of (time)−1. Importantly, the labels may include expressions that describe how the underlying system is interacting with its environment. For example, the graph in Figure 1A shows how ligand binding gives rise to concentration terms in the edge labels.
FIGURE 1. Linear framework graph and Laplacian matrix. (A) An example graph, G, representing the binding of two ligands, each to one site, on a biomolecule, with vertices indexed 1, …, 4 as shown. The labels on the edges 1 → 2 and 3 → 4 include the concentration, x, of the blue ligand that binds to the first site and the labels on the edges 1 → 3 and 2 → 4 include the concentration, y, of the purple ligand that binds to the second site. The parameters k1, k3, k5 and k7 are on-rates for binding, with dimensions of (concentration × time)−1; the other parameters are simple rates with dimensions of (time)−1. Graphics were generated using BioRender.com. (B) The Laplacian matrix,
A graph yields a linear dynamics, from which the linear framework gets its name. The dynamics is most simply described by imagining that the edges are chemical reactions with the edge labels as the rate constants for mass-action kinetics. Since each reaction has only a single substrate, the resulting dynamics is necessarily linear and can be expressed in matrix form as
Here,
Eq. 2 manifests itself in the column sums of the Laplacian being zero,
The framework is typically used in two contexts: for bulk biochemistry of reacting chemical species, where u(t) in Eq. 1 describes the deterministic time evolution of species concentrations; and for individual molecular systems that exhibit stochastic transitions, where u(t) describes the deterministic time evolution of the probabilities of the molecular states. In the latter case, since probabilities sum to 1, utot = 1. It is interesting that the same mathematics describes both contexts. Here, we will be working in the context of individual molecules and stochastic transitions. From now on, u(t) will be the vector of probabilities and we will assume that utot = 1.
The graph formulation allows nonlinear biochemistry, which often arises from ligand binding, to be disentangled into a linear part carried by the linear dynamics in Eq. 1 and a nonlinear part that comes through the edge labels (Nam et al., 2022). The terms appearing in the labels, such as ligand concentrations (Figure 1A), have to be dealt with separately. They may be specified by separate conservation laws or by other graphs (Nam et al., 2022). For the present paper, we will assume that any ligands that are interacting with a graph are present in “reservoirs” (Nam et al., 2022, §4), similar to thermodynamic reservoirs, so that their free concentrations do not change upon binding. Accordingly, edge labels are treated as constants over the timescale of the dynamics in Eq. 1. In this case, for the stochastic context described above, the graph specifies the infinitesimal generator for a finite-state, continuous-time, time-homogeneous Markov process, X(t), (hereafter, a “Markov process”), so that the edge labels are given by,
whenever the right-hand side is nonzero and therefore positive. (A zero infinitesimal rate does not yield an edge.) Conversely, any such Markov process with an infinitesimal generator is specified by a graph (Mirzaev and Gunawardena, 2013, Theorem 4). The Laplacian dynamics in Eq. 1, with utot = 1, becomes the master equation for the forward evolution of the vertex probabilities, u(t). The linearity of the linear framework is perhaps less surprising now, as master equations are, indeed, linear (van Kampen, 1992). We see that, within reservoir assumptions, the linear framework provides a graph-theoretic way to define and study the Markov processes that have been widely used to model biological systems.
Surprisingly, the graph rarely makes an appearance in the Markov process literature. This may be because the graph theory has so far primarily been used to study steady states of the Laplacian dynamics (Nam et al., 2022), which may not have been of much mathematical interest outside of applications in biology. Since Eq. 1 is linear, it can readily be solved in terms of the eigenvalues and eigenvectors of
When G is strongly connected (see below), the steady state, u∞(G) is unique. This particular eigenvector can be calculated from
The algebra that gives rise to manifestly positive polynomials is controlled by appropriate subgraphs of G, described in the classical Matrix-Tree theorem (MTT), which goes back to 19th century work on electrical circuits (Kirchhoff, 1847; Mirzaev and Gunawardena, 2013); the manifest positivity is exactly what is required for parametric dependence in biology. Steady-state probabilities thereby emerge as manifestly positive rational functions of the edge labels (Eq. 4). This representation has proved very useful in giving mathematical access to steady states (Nam et al., 2022).
An important feature of this rational expression for steady-state probabilities is that it holds for systems that do not necessarily reach a steady state of thermodynamic equilibrium. Briefly, graphs that can reach thermodynamic equilibrium must be reversible, so that, given any edge i → j, there is an edge j → i that represents the reverse process, and must satisfy the cycle condition: the product of the label ratios along any cycle of reversible edges is always 1 (Nam et al., 2022, §4). The cycle condition is equivalent to detailed balance or microscopic reversibility. In this case, a considerable simplification can be made in describing steady-state probabilities and the resulting expressions turn out to be equivalent to those of equilibrium statistical mechanics (Nam et al., 2022, §4). One great advantage of the linear framework is that it provides a restricted context in which non-equilibrium statistical mechanics can be exactly solved in rational algebraic terms. The functional significance of energy expenditure is a very interesting problem in cellular information processing (Estrada et al., 2016) but lies outside the scope of the present paper. We will mention some of the questions that arise in the Discussion.
A distinguishing feature of the linear framework is that the graph is treated, not just as a description or as a vehicle for doing Matrix-Tree calculations, but as a mathematical entity in its own right, in terms of which general theorems can be formulated. The graph provides a rigorous language in which salient biological features can be precisely expressed while others can be left largely unspecified, thereby allowing some general principles to emerge from behind the overwhelming molecular complexity that is ever present. Among the areas for which this approach has yielded insights are input-output responses (Wong et al., 2018; Yordanov and Stelling, 2018), post-translational modifications (Dasgupta et al., 2014; Nam et al., 2020), allostery (Biddle et al., 2021) and gene regulation (Estrada et al., 2016; Biddle et al., 2019).
Since the initial development of the linear framework, we had long thought that only steady states could be expressed as rational functions of the edge labels. However, as we will show here, important properties of the transient regime, such as first-passage times, can also be calculated as rational functions of the edge labels. The capability to analyse transient behaviour using graph-theoretic methods is particularly welcome because real-time and single-molecule experimental methods are finally giving access to the transient regime within living cells (Kleine Borgmann et al., 2013; Liao et al., 2015; Jones et al., 2017; Loffreda et al., 2017; Chen et al., 2018; Dufourt et al., 2018; Mir et al., 2018; Volkov et al., 2018; Nandan et al., 2022). Much of our understanding of biochemical behaviour has relied on steady-state assumptions, which are not always explicitly stated. The rich complexity of transient behaviours which are beginning to emerge suggests that the time is ripe to develop a more fundamental understanding of the kinds of biochemical computations and information processing that can be achieved transiently. For this, the mathematical methods described here may be of some value.
2 Results
2.1 Steady states and spanning trees
As preparation for discussing first-passage times, we briefly explain how steady-state probabilities are calculated in terms of the graph; see (Nam et al., 2022, §2) for more details. If we have a graph G, we noted in the Introduction that the steady state, u∞(G), satisfies
(The structure of
The classical Matrix-Tree theorem (MTT) yields a formula for a canonical basis vector,
We need some terminology to explain how ρ(G) is determined from G. A spanning forest, F, of G is a subgraph that contains all vertices in G (“spanning”), lacks cycles when edge directions are ignored (“forest”), and has at most one outgoing edge from each vertex. The vertices with no outgoing edges are called the roots of F. If F has only one root, it is called a spanning tree. A forest consists of separate trees, although the forest is upside down, with each tree ascending to its root. Given any non-empty subset of vertices, ∅ ≠ U ⊆{1, …, N}, let ΦU(G) denote the set of spanning forests of G that are rooted at U. Finally, given any subgraph H of G, let w(H) denote the product of all the edge labels in H: w(H) = ∏i→j∈Hℓ(i → j). As a matter of convention, if H has no edges, then w(H) = 1. Then, ρi(G) is obtained by summing w(F) over all spanning trees F of G that are rooted at i,
ρi(G) is a manifestly positive polynomial in the edge labels, with each w(F) being a monomial with coefficient +1. The steady-state probabilities, u∞(G), can be recovered from ρi(G) by using Eq. 4. Figure 2 illustrates this calculation for an example graph with five vertices and i = 5. Spanning trees are sufficient to calculate steady-state probabilities in Eq. 5 but spanning forests are also needed for the transient quantities considered below (Eqs. 6, 7).
FIGURE 2. Spanning trees and steady-state probabilities. (A) An example graph, G, on five vertices,
Eq. 5 is a consequence of the classical MTT. The MTT is one of a family of theorems that describe the relationship between the minors of
Since a strongly connected graph contains at least one directed path from each vertex to every other vertex, there is always at least one spanning tree rooted at each vertex. Therefore, the right-hand side of Eq. 5 is never empty and has at least one term for any choice of i. However, the number of rooted spanning trees may depend on the vertex: in Figure 2, there are 20 spanning trees rooted at vertex 5 but the reader can check that there is only one spanning tree rooted at vertex 3. The size of ρi(G) can vary markedly with i, depending on the structure of G.
It follows from Eq. 4 that u∞(G) is a manifestly positive rational function of the labels and is also always nonzero, irrespective of the values of the labels. It is well known in probability theory that the steady-state probabilities of a Markov process are always positive when the corresponding graph is strongly connected, and here we not only see why this is so but also how to calculate these probabilities in terms of the transition rates.
Manifest positivity is what we would want for a formula that yields a steady-state probability. It is a striking fact that many well-known mathematical formulas of molecular biology, such as those of Michaelis–Menten and King–Altman in enzyme kinetics, Monod–Wyman–Changeux and Koshland–Némethy–Filmer in protein allostery and Ackers–Johnson–Shea in gene regulation, all have the structure of manifestly positive rational functions. However, they are typically derived in entirely different ways. In fact, all these rational functions can be shown to arise from Eqs. 4, 5 applied to appropriate linear framework graphs (Gunawardena, 2012; Wong et al., 2018; Nam et al., 2022), thereby revealing a surprising mathematical unity underlying the complexity of molecular biology.
2.2 First-passage times and spanning forests
We turn now from the steady state to the transient regime and specifically to first-passage times (FPTs) (Iyer-Biswas and Zilman, 2016). Given a graph G, the FPT from one vertex, i, to a distinct target vertex, j ≠ i, is the random variable for the time it takes the underlying Markov process, X(t), to reach j for the first time when starting from i. Formally,
Of interest are the mean and higher moments of the FPT distribution. Recurrence times for the process returning to i after leaving i can be treated similarly, as can FPTs for reaching a subset of target states from a distinct subset of initial states, but we will leave these refinements aside so as not to complicate the discussion.
For the kinds of stochastic molecular systems considered here, FPTs have been used to quantify several properties: the completion time of an enzymatic turnover (Fisher and Kolomeisky, 1999; Kou et al., 2005; Shaevitz et al., 2005; Kolomeisky and Fisher, 2007; Chemla et al., 2008; Garai et al., 2009; Bel et al., 2010; Moffitt et al., 2010; Cao, 2011; Moffitt and Bustamante, 2014); the speed with which an enzyme can discriminate between correct and incorrect substrates (Banerjee et al., 2017; Cui and Mehta, 2018; Mallory et al., 2019); the statistical structure of transcriptional bursting (Lammers et al., 2020); and the time by which a regulated molecule crosses an abundance threshold (Co et al., 2017; Ghusinga et al., 2017; Gupta et al., 2018). We briefly discuss two examples by way of motivation before proceeding to the technical details.
The development of single-molecule techniques for visualising transcription in live cells (Fukaya et al., 2016; Dufourt et al., 2018) has revealed that transcription is often characterised by transient “bursts” of mRNA expression interspersed by periods of inactivity. Efforts to explain how such bursting arises have focussed on stochastic transitions between transcriptionally active and inactive states in a Markovian setting (Peccoud and Ycart, 1995; Lammers et al., 2020). In active states, successive mRNAs are produced in a burst, which is terminated when the system makes a transition to an inactive state. The FPT to reach an active state from an inactive one provides an estimate of the time between bursts, which can be measured experimentally. As noted by Lammers et al. (2020), comparing the distributions of such FPTs offers a sensitive means to discriminate between different gene regulatory models.
FPTs have also been used to quantify the time at which a regulated molecule reaches a specific abundance threshold (Co et al., 2017; Ghusinga et al., 2017; Gupta et al., 2018). An example of this type of system is bacterial lysis by phage λ. Upon infecting Escherichia coli, phage λ expresses a protein, holin S105, that accumulates in the inner cell membrane until a threshold concentration is reached, at which point the holin molecules abruptly initiate lysis by puncturing the membrane with large irregular holes (White et al., 2010). Various other cellular processes, such as bacterial sporulation (Piggot and Hilbert, 2004), cell cycle progression (Liu et al., 2015) and cell migration during development (Gupta et al., 2018), rely on similar thresholding mechanisms. The FPT analysis undertaken by Ghusinga et al. (2017) shows the impact of different regulatory strategies on the variance in the FPT to reach the threshold and gives insight into the regulatory mechanism of bacterial lysis.
Despite their broad usefulness in biology, FPTs have often been calculated by numerical simulations (Lammers et al., 2020) or by analytical methods that rely on the special structure of the model (Ghusinga et al., 2017). We describe here a systematic graph-theoretic scheme, similar to that in Eq. 5, by which the moments of the FPT distribution can be expressed as rational functions of the edge labels.
Since Θi,j(G) measures the time taken by X(t) to reach j from i for the first time, the distribution of Θi,j(G) does not depend on the outgoing edges from j or their labels. Therefore, one can remove from G the edges leaving j without affecting the distribution of Θi,j(G). For example, the distribution of Θi,5(G) is the same for the strongly connected graph in Figure 2A and for the graph in Figure 3A, which is formed by removing the edges leaving 5 from the graph in Figure 2A. In consequence, it is convenient when working with FPTs to deal with graphs that may not be strongly connected, for which some additional terminology is helpful.
FIGURE 3. Spanning forests and FPTs. (A) An example graph, G, obtained by taking the graph in Figure 2A and removing the outgoing edge from vertex 5. G has a single terminal SCC containing the single vertex 5. (B) The 24 doubly-rooted spanning forests of G in which 5 is a root (red font) and there is a path from 1 to the other root (also in red font), each with its corresponding product of edge labels. The sum of these 24 edge label products is equal to the numerator of
A graph G always has a unique decomposition into strongly connected components (SCCs), which can be thought of as the maximal strongly connected subgraphs; see Mirzaev and Gunawardena (2013) for the full details. The directed edges which leave these SCCs give rise to a partial order on the set of SCCs. Those SCCs which are maximal in the partial order are called terminal. For example, the graph in Figure 2A is strongly connected and therefore has only a single SCC, but if the edge 5 → 1 is removed, to yield the graph in Figure 3A, this graph has 3 SCCs in the partial order {1, 2, 3}⪯{4}⪯{5}. Let us consider the special case where G has a unique terminal SCC that contains just one vertex, say, q ∈ {1, …, N}, like the graph in Figure 3A. This is what happens upon removal of the edges leaving a vertex, q, in a strongly connected graph, as in Figure 2A: q forms a unique terminal SCC, {q}, with only one vertex. If the underlying Markov process X(t) starts from any other vertex, say i, then the probability that X(t) eventually reaches q is 1. There may, of course, be trajectories of the process along which q is never reached but these form a set of probability zero.
We need just a bit more notation. The quantities we want to calculate are the kth moments of the probability distribution of the FPT from i to q,
where ⟨ − ⟩ denotes the average over the underlying sample space of trajectories. Let
The numerator in Eq. 6 runs over all doubly-rooted spanning forests of G in which q is one root and there is a directed path of edges from i to the other root. Figure 3B demonstrates this calculation for the graph in Figure 3A. The denominator in Eq. 6 runs over all spanning trees of G rooted at q and is similar in that respect to the right-hand side of Eq. 5.
The combinatorics become more complicated for the higher moments of Θi,q(G). Choose k-tuples of non-terminal vertices,
and set j0 = i. Then, for the kth moment, we have (Nam and Gunawardena, 2023),
The product in the numerator of Eq. 7 again involves doubly-rooted spanning forests, in which q is one of the roots and the other root shifts along the k-tuple from j1 to jk, with ju−1 having a directed path to ju as u runs from 1 to k. Eq. 7 reduces to Eq. 6 when k = 1.
Note that a spanning forest, or the special case of a spanning tree, that has q as a root cannot include any outgoing edge from q. Hence, the spanning forests or trees with q = 5 as a root are the same for the strongly connected graph in Figure 2A as for the graph in Figure 3A, in which {q} has become the unique terminal SCC by removing the edges that leave q. Accordingly, both the numerator and denominator in Eqs. 6, 7 give the same result for q = 5 in either graph. This is the graph-theoretic consequence of the fact, mentioned above, that the probability distribution of Θi,5(G) is the same for the graphs in Figure 2A and Figure 3A.
Eq. 7 and, by specialisation, Eq. 6 can be derived, after some manipulations, from the All-Minors Matrix-Tree theorem, a more recent generalisation of the classical MTT (Nam and Gunawardena, 2023).
As a sanity check on Eq. 7, we note that if G has N vertices, then any spanning forest with r roots has N − r edges, as can be checked for the examples in Figure 2B and Figure 3B. It follows from Eq. 7 that
Let us see what Eq. 7 tells us for the graph G consisting of just two vertices, 1 and 2, with ℓ(1 → 2) = a and ℓ(2 → 1) = b. If we consider
In particular, the mean FPT is 1/a and the variance, which is
Eq. 7 gives a general and systematic method to calculate FPTs from the linear framework graph associated with a Markov process. It can be used to calculate exact formulas in simple graphs and to avoid estimating FPT moments by cumbersome numerical simulations of the Markov process. The combinatorics rapidly become formidable as the graph becomes larger or less symmetric, as is perhaps already evident in Figure 2B and Figure 3B. The broader value of Eq. 7 is that it reveals the mathematical structure of the FPT moments as manifestly positive rational functions of the edge labels. This can often be informative in its own right, as we will see in discussing enzyme kinetics below. We will say more about ways of dealing with the combinatorial complexity in the Discussion.
2.3 Splitting probabilities and conditional FPTs
In the previous section, we considered the FPT distribution from a given vertex i to a single target vertex. It is, however, often the case that there are several target vertices and one wants to know the probability of reaching a particular target vertex or the FPT to that vertex conditioned on the Markov process actually reaching it. (If target vertices lie in different SCCs that are not related in the partial order, then a trajectory that reaches one target can never reach any other target, so that the mean FPT to each target becomes infinite. Conditioning on reaching the target is therefore essential.) Let us suppose, therefore, that G is a graph with one or more terminal SCCs, each of which consists of a single vertex. Let
The denominator in Eq. 8 runs over all spanning forests of G rooted at
FIGURE 4. Splitting probabilities. (A) An example graph, G, on six vertices,
Let us turn now to the conditional FPT for reaching a particular target vertex,
If there is only one terminal vertex, so that
Evidently, the unconditional mean FPT to reach any terminal vertex in
Combining Eqs. 8, 9, we can show that this mean FPT can also be expressed in terms of the spanning forests of G, as
which specialises to Eq. 6 when there is only a single terminal vertex.
Splitting probabilities and conditional FPTs have not been as widely used as have the unconditional FPTs described in the previous section. This reflects the relatively simple models that have been formulated so far in the literature. However, as we have shown here, there is no greater difficulty in dealing with these more complex quantities, at least within the graph-theoretic approach that we have outlined here. All the quantities we have considered are manifestly positive rational functions of the edge labels. This mathematical accessibility should allow deeper analysis of transient stochastic properties.
2.4 Single-molecule enzyme kinetics
Single-molecule experimental methods have given unprecedented access to the stochastic kinetics of individual enzymes and have stimulated the development of theoretical models to account for the resulting data. This literature offers a convenient setting to illustrate the ideas introduced above.
A frequently used model in enzyme kinetics corresponds to a pipeline graph (Figure 5) (Fisher and Kolomeisky, 1999; Kou et al., 2005; Kolomeisky and Fisher, 2007; Chemla et al., 2008; Garai et al., 2009; Moffitt et al., 2010; Moffitt and Bustamante, 2014). Such a graph consists of vertices 1, …, N, representing different conformations of the enzyme, with nearest-neighbour transitions, i → i + 1 or i → i − 1. Substrate may bind at any forward transition, i → i + 1, so that ℓ(i → i + 1) incurs a concentration term that we will denote by x, and binding is assumed to be reversible, so that i + 1 → i. The final transition, N − 1 → N, is usually treated as an irreversible catalytic step, with the enzyme returning to its initial conformation, so that vertex N corresponds to vertex 1 in the next enzymatic cycle. A pipeline may be thought of as partitioned into reversible “blocks” that are separated by sequences of irreversible transitions. Figures 5A, C show pipeline graphs with 1 and 3 reversible blocks, respectively.
FIGURE 5. Pipeline graphs. (A) A pipeline graph on 8 vertices that consists of a single reversible block, with substrate binding with concentration x at the edges 2 → 3 and 4 → 5, followed by a single irreversible transition, 7 → 8. (B) The spanning forest F(2, 6, 8), in the notation described in the text, for the graph in panel A. The two roots, 2 and 8, are in red font. (C) A pipeline graph with three reversible blocks, in each of which the substrate binds once. As explained in the text, the mean FPT,
The mean FPT for reaching vertex N from vertex 1 is a measure of the enzyme’s completion time. Bustamante and colleagues have emphasised how the substrate dependence of
Consider first a pipeline graph, G, with a single reversible block consisting of the vertices 1, …, N − 1 and recall Eq. 6 for the mean FPT, where the terminal vertex is q = N. An example is shown in Figure 5A with the notation that we will use for the edge labels, ℓ(i → i + 1) = pi and ℓ(i + 1 → i) = qi. It is evident that there is only a single spanning tree, T ∈ Φ{N}(G), consisting of all the forward edges, so that w(T) = p1⋯pN−1. This gives the denominator of
where the “missing” label, between vertices k − 1 and k, corresponds to the gap between the tree rooted at j and the tree rooted at N in the forest. If we divide by the denominator, we see that each spanning forest F(j, k, N) contributes a rational function of the labels that we may write in the form,
The spanning forests in Φ{j,N}(G) therefore contribute the sum,
where,
Note that, in Eq. 12, the empty product for k = j + 1 is by convention taken to be 1. It follows from Eq. 6 that the enzyme completion time is given by,
With some notational translation, Eq. 13 can be seen to be the same as (Moffitt et al., 2010, Eq. S2). The quantity Δ(j, N) in Eq. 12 first appears in Derrida’s derivation of the velocity and diffusion constant of a Markov particle on a periodic pipeline (Derrida, 1983, Eq. 24); Δ(j, N) = Γ(j + 1, N − 1), where Γ is the quantity defined in Eq. S3 of Moffitt et al. (2010). The calculation above, using the general formula for the mean FPT in Eq. 6, is hopefully more transparent.
Suppose now that substrate binds at s forward transitions in the pipeline graph, with concentration x. We will refer to terms other than x in the edge labels as “kinetic parameters,” which thereby include both simple rates and on-rates. Since we can exclude the final catalytic transition from substrate binding, it follows that 1 ≤ s ≤ N − 2. Eq. 11 then shows that the enzyme completion time has the following structure as a rational algebraic function of x,
Here, the coefficients a0, …, as and b are all manifestly positive polynomials in the kinetic parameters. In particular, the forest F(N − 1, N, N) includes all the substrate-binding transitions, which confirms that as > 0. If the substrate-binding transitions are specified, these polynomials may be explicitly calculated using Eq. 11. Eq. 14 already provides some insight. In the limit of low substrate, the completion time diverges at an order, 1/xs, that depends on the number of substrate-binding transitions. In contrast, in the limit of high substrate, the completion time asymptotes to the positive value as/b. If substrate binds at only one transition in the pipeline, so that s = 1, then the completion time exhibits a reciprocal Michaelis–Menten form (Kou et al., 2005; Garai et al., 2009; Moffitt et al., 2010; Moffitt and Bustamante, 2014) (Discussion),
The higher moments of the FPT, as specified by Eq. 7, are more complicated to calculate but the doubly-rooted spanning forests that are needed for the numerator, which are contained in
In their study of the packaging motor for the φ29 bacteriophage, Bustamante and colleagues consider a more general pipeline graph, G, that consists of multiple reversible blocks separated by single irreversible transitions (Figure 5C) (Moffitt et al., 2010). The packaging motor is a pentameric ring of identical ATPase units that compacts the φ29 double-stranded DNA into the assembling viral capsid. It has been found to do this in a burst of four ATP-consuming steps per cycle. ATP hydrolysis during the catalytic step is typically irreversible under physiological conditions and a pipeline with 4 reversible blocks serves as a model for the motor (Moffitt et al., 2010, Figure 4A).
If the Markov process takes an irreversible transition in G, it cannot subsequently visit the preceding reversible blocks. Also, every irreversible transition must be taken to reach N. Hence, any trajectory that begins at 1 and reaches N must take each irreversible transition exactly once. It follows from this that the FPT from 1 to N is just the sum of the FPTs for each reversible block considered separately and these FPTs are all independent of each other. Suppose there are m reversible blocks which start at the vertices e0, e1, …, em−1, where 1 = e0 < e1 < e2 < ⋯ < em−1 < N. Let Gi be the subgraph consisting of the vertices from ei−1 to ei, which includes the ith reversible block and the immediately following irreversible transition. It follows that,
If substrate binds at the same number of transitions in each reversible block, then Eq. 7 shows that the
which is readily seen from the discussion above to be a quadratic rational function of x, and they also fit this curve to the experimental data (Moffitt et al., 2010, Figure 3B). A theorem due to Aldous and Shepp (1987), which is of independent interest, tells us that, for an arbitrary graph with N vertices, nmin < N.
An interesting question arises as to whether nmin itself is also manifestly positive, as might be expected of a coefficient of variation, given that this is true for both
3 Discussion
We have reviewed here how the graph-theoretic linear framework, as applied to continuous-time Markov processes, can be used to show that the moments of the FPT distribution (Eqs. 6, 7), splitting probabilities (Eq. 8) and conditional mean FPTs (Eq. 9) can be exactly expressed as manifestly positive rational algebraic functions of the edge labels or transition rates. This reveals that not only steady-state probabilities but also transient properties of Markov processes have this same algebraic structure, thereby substantially expanding the mathematical scope of the linear framework.
The formulas given here can be used to obtain closed-form solutions for simple graphs, as we showed for the pipeline graphs used in enzyme kinetics (Eq. 13). However, this is a little misleading because enumeration of spanning forests becomes rapidly intractable as the graph becomes larger or less symmetric. Moreover, as is evident by examining the algebraic terms in Figure 2B and Figure 3B, every label in the graph can appear in the formulas. There is both a combinatorial explosion and a global parametric dependence. These challenges have long been recognised when dealing with steady-state probabilities (Nam et al., 2022), before the transient regime became mathematically accessible, and several strategies have emerged for dealing with them.
First, when properties of interest are treated as functions of substrate concentration, a great deal can be said about the resulting rational algebraic structure, even when it is hard to calculate the coefficients explicitly in terms of the edge labels (Thomson and Gunawardena, 2009; Nam et al., 2022). As we saw with Eq. 14, the algebraic structure for the mean FPT,
Second, the question of when the Michaelis–Menten structure arises is closely related to whether or not the graph satisfies the cycle condition and can thereby reach a steady state of thermodynamic equilibrium. If it can, there is a necessary and sufficient condition for the emergence of the Michaelis–Menten structure; if it cannot, and the graph reaches a non-equilibrium steady state, then only partial sufficient conditions are known (Wong et al., 2018). Of course, the pipeline example just mentioned cannot reach thermodynamic equilibrium, as it contains irreversible transitions (Figure 5A). If the cycle condition is satisfied, the complexity problem is substantially reduced, insofar as calculating steady-state probabilities is concerned. It is possible to find an alternative basis element to ρ(G) in
Aside from the calculational complexity, the thermodynamic issues also have a deep impact on biological function. The role of energy expenditure in force generation or pattern formation has been widely studied (Kolomeisky and Fisher, 2007; Karsenti, 2008) but its significance for cellular information processing has been more elusive (Wong and Gunawardena, 2020). In the latter domain, unlike the two former ones, information processing can take place at thermodynamic equilibrium, for instance, through binding and unbinding. However, there is a limit to how well this can be done, as first pointed out by Hopfield (1974). We have introduced the concept of the Hopfield barrier, as the limit to how well a given information processing task can be undertaken by a mechanism that operates at thermodynamic equilibrium (Estrada et al., 2016). For example, the Hill function with Hill coefficient n is the universal Hopfield barrier for the sharpness of input-output responses with n binding sites for the input (Nam et al., 2022; Martinez-Corral et al., 2023). Another interesting question arises as to whether there are also Hopfield barriers in the transient regime. That is, if a graph satisfies the cycle condition and can reach a steady state of thermodynamic equilibrium, are there limits on the moments of the FPT distribution,
Third, the algebraic complexity of non-equilibrium steady states can be reorganised to make the complexity more tractable (Çetiner and Gunawardena, 2022). This breakthrough has enabled steady-state calculations to be undertaken that were previously out of reach. It is conceivable that similar kinds of reorganisation may also throw light on the calculation of transient quantities. Finally, a fourth potential approach to overcoming the complexity is to exploit the recursive technique for enumerating spanning forests that was developed by Chebotarev and Agaev (2002). While this technique looks promising, it has yet to be properly exploited.
The methods outlined here bring the FPTs of Markov processes into focus as manifestly positive rational algebraic functions of the transition rates. This gives mathematical access to them in a way that has been lacking in previous treatments, which have not exploited graph theory and the Matrix-Tree theorems. We hope this review will encourage more use of the linear framework in cell and developmental biology. We anticipate that, as we have found for steady states, this exploration will lead to further general principles and mathematical theorems that rise above the molecular complexity that confronts us in biology.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Author contributions
K-MN undertook most of the work described here in his Ph.D. thesis, which was supervised by JG. K-MN and JG wrote the paper together. All authors contributed to the article and approved the submitted version.
Funding
K-MN and JG were supported in part by NIH grant R01GM122928.
Acknowledgments
We thank Michael Blinov for the invitation to submit a paper to this research topic and for his encouragement and patience; two reviewers for their constructive suggestions; and members of the Gunawardena lab for their comments.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Aldous, D., and Shepp, L. (1987). The least variable phase-type distribution is Erlang. Commun. Stat. Stoch. Models 3, 467–473. doi:10.1080/15326348708807067
Banerjee, K., Kolomeisky, A. B., and Igoshin, O. A. (2017). Elucidating interplay of speed and accuracy in biological error correction. Proc. Natl. Acad. Sci. U. S. A. 114, 5183–5188. doi:10.1073/pnas.1614838114
Bel, G., Munsky, B., and Nemenman, I. (2010). The simplicity of completion time distributions for common complex biochemical processes. Phys. Biol. 7, 016003. doi:10.1088/1478-3975/7/1/016003
Biddle, J. W., Nguyen, M., and Gunawardena, J. (2019). Negative reciprocity, not ordered assembly, underlies the interaction of Sox2 and Oct4 on DNA. eLife 8, e41017. doi:10.7554/eLife.41017
Biddle, J. W., Martinez-Corral, R., Wong, F., and Gunawardena, J. (2021). Allosteric conformational ensembles have unlimited capacity for integrating information. eLife 10, e65498. doi:10.7554/eLife.65498
Cao, J. (2011). Michaelis–Menten equation and detailed balance in enzymatic networks. J. Phys. Chem. B 115, 5493–5498. doi:10.1021/jp110924w
Çetiner, U., and Gunawardena, J. (2022). Reformulating non-equilibrium steady states and generalized hopfield discrimination. Phys. Rev. E 106, 064128. doi:10.1103/PhysRevE.106.064128
Chebotarev, P., and Agaev, R. (2002). Forest matrices around the Laplacian matrix. Lin. Alg. Appl. 356, 253–274. doi:10.1016/s0024-3795(02)00388-9
Chemla, Y. R., Moffitt, J. R., and Bustamante, C. (2008). Exact solutions for kinetic models of macromolecular dynamics. J. Chem. Phys. B 112, 6025–6044. doi:10.1021/jp076153r
Chen, H., Levo, M., Barinov, L., Fujioka, M., Jaynes, J. B., and Gregor, T. (2018). Dynamic interplay between enhancer–promoter topology and gene activity. Nat. Genet. 50, 1296–1303. doi:10.1038/s41588-018-0175-z
Chung, F. R. K. (1997). Spectral graph theory. No. 92 in regional conference series in mathematics. Providence, RI, USA: American Mathematical Society.
Co, A. D., Lagomarsino, M. C., Caselle, M., and Osella, M. (2017). Stochastic timing in gene expression for simple regulatory strategies. Nucleic Acids Res. 45, 1069–1078. doi:10.1093/nar/gkw1235
Cui, W., and Mehta, P. (2018). Identifying feasible operating regimes for early T-cell recognition: the speed, energy, accuracy trade-off in kinetic proofreading and adaptive sorting. PLOS ONE 13, e0202331. doi:10.1371/journal.pone.0202331
Dasgupta, T., Croll, D. H., Owen, J. A., Vander Heiden, M. G., Locasale, J. W., Alon, U., et al. (2014). A fundamental trade-off in covalent switching and its circumvention by enzyme bifunctionality in glucose homeostasis. J. Biol. Chem. 289, 13010–13025. doi:10.1074/jbc.M113.546515
Derrida, B. (1983). Velocity and diffusion constant of a periodic one-dimensional hopping model. J. Stat. Phys. 31, 433–450. doi:10.1007/bf01019492
Dufourt, J., Trullo, A., Hunter, J., Fernandez, C., Lazaro, J., Dejean, M., et al. (2018). Temporal control of gene expression by the pioneer factor Zelda through transient interactions in hubs. Nat. Commun. 9, 5194. doi:10.1038/s41467-018-07613-z
Estrada, J., Wong, F., DePace, A., and Gunawardena, J. (2016). Information integration and energy expenditure in gene regulation. Cell. 166, 234–244. doi:10.1016/j.cell.2016.06.012
Fisher, M. E., and Kolomeisky, A. B. (1999). The force exerted by a molecular motor. Proc. Natl. Acad. Sci. U. S. A. 96, 6597–6602. doi:10.1073/pnas.96.12.6597
Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer control of transcriptional bursting. Cell. 166, 358–368. doi:10.1016/j.cell.2016.05.025
Garai, A., Chowdhury, D., Chowdhury, D., and Ramakrishnan, T. V. (2009). Stochastic kinetics of ribosomes: single motor properties and collective behavior. Phys. Rev. E 80, 011908. doi:10.1103/PhysRevE.80.011908
Ghusinga, K. R., Dennehy, J. J., and Singh, A. (2017). First-passage time approach to controlling noise in the timing of intracellular events. Proc. Natl. Acad. Sci. U. S. A. 114, 693–698. doi:10.1073/pnas.1609012114
Gunawardena, J. (2012). A linear framework for time-scale separation in nonlinear biochemical systems. PLOS ONE 7, e36321. doi:10.1371/journal.pone.0036321
Gunawardena, J. (2014). Time-scale separation: Michaelis and Menten’s old idea, still bearing fruit. FEBS J. 281, 473–488. doi:10.1111/febs.12532
Gupta, S., Varennes, J., Korswagen, H. C., and Mugler, A. (2018). Temporal precision of regulated gene expression. PLOS Comput. Biol. 14, e1006201. doi:10.1371/journal.pcbi.1006201
G. Strang (Editor) (2022). Introduction to linear algebra. 6 edn (Wellesley, MA, USA: Wellesley-Cambridge Press).
Hopfield, J. J. (1974). Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proc. Natl. Acad. Sci. U. S. A. 71, 4135–4139. doi:10.1073/pnas.71.10.4135
Iyer-Biswas, S., and Zilman, A. (2016). “First-passage processes in cellular biology,” in Advances in chemical physics. Editors S. A. Rice, and A. R. Dinner (John Wiley & Sons Inc.), 261–306.
Jones, D. L., Leroy, P., Unoson, C., Fange, D., Ćurić, V., Lawson, M. J., et al. (2017). Kinetics of dCas9 target search in Escherichia coli. Science 357, 1420–1424. doi:10.1126/science.aah7084
Karsenti, E. (2008). Self-organization in cell biology: a brief history. Nat. Rev. Mol. Cell. Biol. 9, 255–262. doi:10.1038/nrm2357
Kirchhoff, G. (1847). Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wind. Ann. Phys. Chem. 148, 497–508. doi:10.1002/andp.18471481202
Kleine Borgmann, L. A., Ries, J., Ewers, H., Ulbrich, M. H., and Graumann, P. L. (2013). The bacterial SMC complex displays two distinct modes of interaction with the chromosome. Cell. Rep. 3, 1483–1492. doi:10.1016/j.celrep.2013.04.005
Kolomeisky, A. B., and Fisher, M. E. (2007). Molecular motors: a theorist’s perspective. Annu. Rev. Phys. Chem. 58, 675–695. doi:10.1146/annurev.physchem.58.032806.104532
Kou, S. C., Cherayil, B. J., Min, W., English, B. P., and Xie, X. S. (2005). Single-molecule Michaelis–Menten equations. J. Phys. Chem. B 109, 19068–19081. doi:10.1021/jp051490q
Lammers, N. C., Kim, Y. J., Zhao, J., and Garcia, H. G. (2020). A matter of time: using dynamics and theory to uncover mechanisms of transcriptional bursting. Curr. Opin. Cell. Biol. 67, 147–157. doi:10.1016/j.ceb.2020.08.001
Liao, Y., Schroeder, J. W., Gao, B., Simmons, L. A., and Biteen, J. S. (2015). Single-molecule motions and interactions in live cells reveal target search dynamics in mismatch repair. Proc. Natl. Acad. Sci. U. S. A. 112, E6898–E6906. doi:10.1073/pnas.1507386112
Liu, X., Wang, X., Yang, X., Liu, S., Jiang, L., Qu, Y., et al. (2015). Reliable cell cycle commitment in budding yeast is ensured by signal integration. eLife 4, e03977. doi:10.7554/eLife.03977
Loffreda, A., Jacchetti, E., Antunes, S., Rainone, P., Daniele, T., Morisaki, T., et al. (2017). Live-cell p53 single-molecule binding is modulated by C-terminal acetylation and correlates with transcriptional activity. Nat. Commun. 8, 313. doi:10.1038/s41467-017-00398-7
Mallory, J. D., Kolomeisky, A. B., and Igoshin, O. A. (2019). Trade-offs between error, speed, noise, and energy dissipation in biological processes with proofreading. J. Phys. Chem. B 123, 4718–4725. doi:10.1021/acs.jpcb.9b03757
Martinez-Corral, R., Nam, K.-M., DePace, A. H., and Gunawardena, J. (2023). The Hill function is the universal Hopfield barrier for sharpness of input-output responses. In preparation
Mir, M., Stadler, M. R., Ortiz, S. A., Hannon, C. E., Harrison, M. M., Darzacq, X., et al. (2018). Dynamic multifactor hubs interact transiently with sites of active transcription in Drosophila embryos. eLife 7, e40497. doi:10.7554/eLife.40497
Mirzaev, I., and Gunawardena, J. (2013). Laplacian dynamics on general graphs. Bull. Math. Biol. 75, 2118–2149. doi:10.1007/s11538-013-9884-8
Moffitt, J. R., and Bustamante, C. (2014). Extracting signal from noise: kinetic mechanisms from a Michaelis–Menten-like expression for enzymatic fluctuations. FEBS J. 281, 498–517. doi:10.1111/febs.12545
Moffitt, J. R., Chemla, Y. R., and Bustamante, C. (2010). Mechanistic constraints from the substrate concentration dependence of enzymatic fluctuations. Proc. Natl. Acad. Sci. U. S. A. 107, 15739–15744. doi:10.1073/pnas.1006997107
Nam, K.-M., and Gunawardena, J. (2023). Algebraic formulas for first-passage times of Markov processes in the linear framework. In preparation.
Nam, K.-M., Gyori, B. M., Amethyst, S. V., Bates, D. J., and Gunawardena, J. (2020). Robustness and parameter geography in post-translational modification systems. PLOS Comput. Biol. 16, e1007573. doi:10.1371/journal.pcbi.1007573
Nam, K.-M., Martinez-Corral, R., and Gunawardena, J. (2022). The linear framework: using graph theory to reveal the algebra and thermodynamics of biomolecular systems. Interface Focus 12, 20220013. doi:10.1098/rsfs.2022.0013
Nam, K.-M. (2021). Algebraic approaches to molecular information processing. Ph.D. thesis. Harvard University.
Nandan, A., Das, A., Lott, R., and Koseska, A. (2022). Cells use molecular working memory to navigate in changing chemoattractant fields. eLife 11, e76825. doi:10.7554/eLife.76825
Peccoud, J., and Ycart, B. (1995). Markovian modeling of gene-product synthesis. Theor. Popul. Biol. 48, 222–234. doi:10.1006/tpbi.1995.1027
Piggot, P. J., and Hilbert, D. W. (2004). Sporulation of Bacillus subtilis. Curr. Opin. Microbiol. 7, 579–586. doi:10.1016/j.mib.2004.10.001
Shaevitz, J. W., Block, S. M., and Schnitzer, M. J. (2005). Statistical kinetics of macromolecular dynamics. Biophys. J. 89, P2277–P2285. doi:10.1529/biophysj.105.064295
Thomson, M., and Gunawardena, J. (2009). The rational parameterization theorem for multisite post-translational modification systems. J. Theor. Biol. 261, 626–636. doi:10.1016/j.jtbi.2009.09.003
van Kampen, N. G. (1992). Stochastic processes in physics and chemistry. Amsterdam, The Netherlands: Elsevier.
Volkov, I. L., Lindén, M., Rivera, J. A., Ieong, K.-W., Metelev, M., Elf, J., et al. (2018). tRNA tracking for direct measurements of protein synthesis kinetics in live cells. Nat. Chem. Biol. 14, 618–626. doi:10.1038/s41589-018-0063-y
White, R., Chiba, S., Pang, T., Dewey, J. S., Savva, C. G., Holzenburg, A., et al. (2010). Holin triggering in real time. Proc. Natl. Acad. Sci. U. S. A. 108, 798–803. doi:10.1073/pnas.1011921108
Wong, F., and Gunawardena, J. (2020). Gene regulation in and out of equilibrium. Annu. Rev. Biophys. 49, 199–226. doi:10.1146/annurev-biophys-121219-081542
Wong, F., Dutta, A., Chowdhury, D., and Gunawardena, J. (2018). Structural conditions on complex networks for the Michaelis-Menten input-output response. Proc. Natl. Acad. Sci. U. S. A. 115, 9738–9743. doi:10.1073/pnas.1808053115
Keywords: linear framework, graph theory, Matrix-Tree theorems, rational functions, Markov processes, first-passage times
Citation: Nam K-M and Gunawardena J (2023) The linear framework II: using graph theory to analyse the transient regime of Markov processes. Front. Cell Dev. Biol. 11:1233808. doi: 10.3389/fcell.2023.1233808
Received: 02 June 2023; Accepted: 02 October 2023;
Published: 03 November 2023.
Edited by:
Michael Blinov, UConn Health, United StatesReviewed by:
Silas Boye Nissen, Stanford University, United StatesLee Bardwell, University of California, Irvine, United States
Copyright © 2023 Nam and Gunawardena. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jeremy Gunawardena, jeremy@hms.harvard.edu
†Present Address: Kee-Myoung Nam, Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States