Skip to main content

ORIGINAL RESEARCH article

Front. Cell Dev. Biol., 26 May 2023
Sec. Signaling
This article is part of the Research Topic Insights into Signaling through Quantitative Imaging View all 4 articles

Analysis and design of single-cell experiments to harvest fluctuation information while rejecting measurement noise

Huy D. VoHuy D. Vo1Linda S. Forero-QuinteroLinda S. Forero-Quintero1Luis U. AguileraLuis U. Aguilera1Brian Munsky,
Brian Munsky1,2*
  • 1Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
  • 2School of Biomedical Engineering, Colorado State University, Fort Collins, CO, United States

Introduction: Despite continued technological improvements, measurement errors always reduce or distort the information that any real experiment can provide to quantify cellular dynamics. This problem is particularly serious for cell signaling studies to quantify heterogeneity in single-cell gene regulation, where important RNA and protein copy numbers are themselves subject to the inherently random fluctuations of biochemical reactions. Until now, it has not been clear how measurement noise should be managed in addition to other experiment design variables (e.g., sampling size, measurement times, or perturbation levels) to ensure that collected data will provide useful insights on signaling or gene expression mechanisms of interest.

Methods: We propose a computational framework that takes explicit consideration of measurement errors to analyze single-cell observations, and we derive Fisher Information Matrix (FIM)-based criteria to quantify the information value of distorted experiments.

Results and Discussion: We apply this framework to analyze multiple models in the context of simulated and experimental single-cell data for a reporter gene controlled by an HIV promoter. We show that the proposed approach quantitatively predicts how different types of measurement distortions affect the accuracy and precision of model identification, and we demonstrate that the effects of these distortions can be mitigated through explicit consideration during model inference. We conclude that this reformulation of the FIM could be used effectively to design single-cell experiments to optimally harvest fluctuation information while mitigating the effects of image distortion.

1 Introduction

Heterogeneity in signaling and gene expression at the single-cell level has wide-ranging biological and clinical consequences, from bacterial persistence (Lewis, 2010; Garcia-Bernardo and Dunlop, 2013; 2015; El Meouche et al., 2016) and viral infections (Singh et al., 2010; Weinberger and Weinberger, 2013; Rouzine et al., 2014; Cao et al., 2018) to tumor heterogeneity (Brock et al., 2009; Marusyk et al., 2012). Beside genetic and environmental factors, a significant degree of heterogeneity is caused by biochemical noise (McAdams and Arkin, 1997; Raj and van Oudenaarden, 2008; Balázsi et al., 2011; El-Samad and Weissman, 2011). Therefore, even genetically identical cells grown in the same experimental conditions may display variability in their response to environmental stimuli. This variability, often termed intrinsic noise when it originates within the pathway of interest or extrinsic noise when it originates outside the pathway of interest, obscures the underlying mechanisms when viewed through the lens of deterministic models and bulk measurements (Munsky et al., 2018). Yet, this so-called noise can be highly informative when examined through the lens of single-cell measurements coupled with the mathematical modeling framework of the Chemical Master Equation (CME) (Gillespie, 2007; Anderson and Kurtz, 2011; Stewart-Ornstein and El-Samad, 2012). Through this joint experimental and modeling approach, mechanisms of signaling and single-cell gene expression can be explained, predicted (Munsky et al., 2009; Neuert et al., 2013; Munsky et al., 2018), or even controlled (Milias-Argeitis et al., 2011; 2016; Benzinger and Khammash, 2018; Fox et al., 2022).

In this paper, we are concerned with how and when stochastic models based on the CME could be inferred with high confidence from single-cell experiments such as flow cytometry or optical microscopy (e.g., single-molecule fluorescence in situ hybridization, smFISH), from which data sets are produced in the form of fluorescence intensity histograms (Isaacs et al., 2003; Lim and van Oudenaarden, 2007; Zuleta et al., 2014; Lipinski-Kruszka et al., 2015) or molecular count histograms (Femino et al., 1998; Raj et al., 2006; Li and Neuert, 2019). These measurements, like any other experimental technique, can be corrupted by errors arising from imprecise detection or data processing methods. In flow cytometry, variations in the binding efficiency, binding specificity, or intensity of individual fluorescent reporters or probes, or the existence of background fluorescence, are unavoidable perturbations that may obscure the true copy number of RNA or protein (Munsky et al., 2009; Ruess et al., 2013; Tiberi et al., 2018). While smFISH data sets are generally considered to provide the “gold standard” for measuring transcriptional heterogeneity, the estimation of molecular copy numbers in single cells depends heavily on image segmentation and spot counting algorithms (Raj and Tyagi, 2010; Batish et al., 2011; Pitchiaya et al., 2014; Kesler et al., 2019) that involve several threshold parameters, which are set ad hoc and typically vary from one expert to another. In addition, the sensitivity of smFISH measurements can be heavily affected by the length of the target mRNAs, the number of probes (Shepherd et al., 2013), or the number of hybridization steps (Wheat et al., 2020).

The inescapable fact of measurement noise motivates intriguing questions on the design of single-cell experiments. For instance, under what conditions can detailed statistical measurement noise models and cheap single-cell measurements combine to replace accurate, yet more expensive experimental equipment? How approximate or coarse (and therefore fast) can image processing be while still faithfully retaining knowledge about parameters or mechanisms of interest? These questions are becoming more pressing as new advances in fluorescence tagging and microscopy technology are leading to more sophisticated experimental protocols to produce ever-increasing data on complex signaling and gene expression networks. Addressing these challenges is non-trivial, as it requires careful consideration of the potentially nonlinear combination of measurement noise and the biological question of interest.

The key contribution of this study is to provide new model-driven experiment analysis and design approaches (Figure 1) that include explicit consideration of probabilistic measurement errors in single-cell observations. The process begins with one or more hypotheses written in the form of stochastic gene regulation models (Figure 1A) with uncertain guesses for parameters or mechanisms. Predictions from these biophysical models are then coupled with empirically determined or physically estimated statistical models, known as Probabilistic Distortion Operators (PDOs), that explicitly estimate the effects of different types of measurement errors that could be temporal, discrete, non-symmetric, non-Gaussian, or even the result of their own stochastic process dynamics (Figure 1B). During model inference, the biophysical and measurement distortion models are simultaneously optimized to best explain the data (Figure 1C, right) and to quantify uncertainty in their respective parameters (Figure 1C, left). During experiment design (Figure 1D), sensitivity analysis is used to estimate which combinations of experimental conditions, sampling procedures, or measurement strategies can be expected to provide the most information to constrain the current set of hypotheses, and the procedure is iterated in subsequent rounds of experimentation and model refinement.

FIGURE 1
www.frontiersin.org

FIGURE 1. Proposed approach to analyze and design single-cell experiments under measurement distortion. Box (A): One or more mechanistic model hypotheses are proposed to describe gene regulation processes and are associated with prior parameter guesses. These models are combined with experiment design considerations (e.g., environmental, sampling, and measurement conditions) and the chemical master equation framework is used to predict statistics of single-cell gene expression. Box (B): Empirical data and physical models are used to define a probabilistic distortion operator C to estimate how measurement errors (e.g., labeling inefficiencies, resolution limitations, or image processing errors) affect observations. Box (C): Searches over model and PDO parameter space are conducted to identify model and parameter combinations that maximize the likelihood of the observed (i.e., distorted) data. Box (D) Sensitivity analysis and the Fisher Information Matrix are computed and used to analyze the information content for different mechanistic models, distortion effects, or experiment designs. By estimating parameter uncertainties in the context of different forms of measurement distortion, subsequent experiment designs can help to alleviate measurement noise distortion effects (return to Boxes (A–C)).

To address the specific question of model-driven single-cell experiment evaluation and design (Figure 1D), we adopt the framework of the Fisher Information Matrix (FIM). The FIM is the basis for a large set of tools for optimal experiment design in a myriad of science and engineering fields (Pronzato and Walter, 1985; Chaudhuri and Mykland, 1993; Emery and Nenarokomov, 1998; Ruess and Lygeros, 2013; Chao et al., 2016; Vahid et al., 2019), and it has been employed in the study of identifiability and robustness of deterministic ODEs models in system biology (Gutenkunst et al., 2007; Transtrum et al., 2010) as well as for designing optimal bulk measurement experiments (Faller et al., 2003; Gadkar et al., 2005; Kreutz and Timmer, 2009). As an early application of the FIM to stochastic modeling of gene expression, Komorowski et al. (2011) devised a numerical method to compute the FIM based on the Linear Noise Approximation (LNA) of the CME, which they used to demonstrate the different impacts of time-series, time snapshots, and bulk measurements to parameter uncertainties. There have been subsequent work on approximating the FIM with moment closure techniques (Ruess et al., 2013; Ruess and Lygeros, 2013), with demonstrable effectiveness on designing optimal optogenetic experiments (Ruess et al., 2015). Under an assumption of ideal measurements, Fox and Munsky (Fox and Munsky, 2019) recently extended the Finite State Projection (FSP) algorithm (Fox and Munsky, 2019) to compute a version of the FIM that allows for time-varying and non-linear models that result in discrete, asymmetric, and multi-modal single-cell expression distributions. By extending the FSP-based FIM (Fox and Munsky, 2019) to also account for realistic measurement errors, our new approach can help scientists to decide which combinations of experimental conditions and measurement assays are best suited to reduce parameter uncertainties and differentiate between competing hypotheses.

To verify our proposed approaches, we simulate data for a simple bursting gene expression model under many different types of measurement errors, and we show that the FIM correctly estimates the effects that measurement distortions have on parameter estimation (we explore more complicated models and distortions in the Supplementary Data Sheet S1). To demonstrate the practical use of our approaches, we apply them to analyze single-cell data for the bursting and deactivation of a reporter gene controlled by an HIV promoter construct upon application of triptolide (Trp). We show that the iterative use of FSP to fit distorted experimental data, followed by FIM analysis to design subsequent experiments can lead to the efficient identification of a well-constrained model to explain and predict gene expression.

2 Methods

2.1 Stochastic modeling of gene expression

The expression dynamics of genes or groups of genes in single cells are often modeled by stochastic reaction networks (McQuarrie, 1967; Gillespie, 2007; Anderson and Kurtz, 2011). For these, the time-varying molecular copy numbers in single cells are treated as a Markov jump process {X(t)}t≥0 whose sample paths x(t) = (x1(t), …, xN(t)) take place in discrete multi-dimensional space, where xi(t) is the integer count of species i at time t. Each jump in this process corresponds to the occurrence of one of the reaction events Rk (k = 1, … , M), which brings the cell from the state x (t) right before event time t to a new state x (t+) = x (t) + νk, where νk is the stoichiometry vector corresponding to the kth reaction. The probabilistic rate at which each reaction occurs is characterized by its propensity (or reaction intensity) function, αk (t, x, θ). The vector θ=θ1,,θd is a d-dimensional vector of model parameters. Intuitively, we interpret αk(t, x, θt as the probability for reaction k to fire during the waiting interval [t, t + Δt) for a sufficiently small waiting time Δt. The probability distributions pX (t, θ) of single-cell gene product copy numbers model what is often termed intrinsic noise in gene expression. These distributions are the solution of the chemical master equation (CME)

ddtpXt,θ=At,θpXt,θ,(1)

where A (t, θ) is the infinitesimal generator of the Markov process described above [see Supplementary Data Sheet S1, Section 1 for detailed definition of A (t, θ)]. Extrinsic noise can be modeled by assuming a probabilistic variation for the model parameters, then integrate (1) over that distribution (Ruess et al., 2015). However, we focus on intrinsic noise for the current investigation.

2.1.1 Computing the likelihood of single-cell data

Consider a data set D that consists of Nc independent single-cell measurements (t1,c1),,(tNc,cNc) where ci is the vector of molecule counts of cell i measured at time ti. The likelihood function of the biophysical parameters θ given the data set D is given by

Lθ|D=i=1NcpXti,ci,θ

where pX (t, x, θ) is the probability of observing molecular counts x at time t, obtained by solving the CME (1). Taking the logarithm of both sides, we have the log-likelihood function

logLθ|D=i=1NclogpXti,ci,θ,(2)

which is mathematically and numerically more convenient to work with.

2.1.2 Sensitivity analysis

Taking the partial derivative of both sides of the CME (1) with respect to parameter θ ( = 1, … , d), we get

ddtpXt,θθ=At,θθpXt,θ+At,θpXt,θθ.

when the state space of the Markov process is finite, we can collect the equations above for = 1, … , d along with the CME (1) to form a joint system involving the CME solution pX (t, θ) and its partial derivatives sX,(t, θ) ≡ ∂pX (t, θ)/∂θ, given by

ddtpXt,θsX,1t,θsX,dt,θ=At,θ00θ1At,θAt,θ0θdAt,θ0At,θpXt,θsX,1t,θsX,dt,θ.

This forward sensitivity system can be solved numerically with any standard ODE solver. When the state space is infinite, a truncation algorithm based on extending the Finite State Projection (Fox and Munsky, 2019) can be applied to approximately solve the forward sensitivity system (see Supplementary Data Sheet S1, Section 3.1 for more details).

Knowing the sensitivity of the distribution to parameter changes then allows us to compute the sensitivity of the log-likelihood function with respect to biophysical parameters θ1, … , θd using the formula

θlogLθ|D=i=1NcsX,t,ci,θpXt,ci,θ,

where sX,(t, x, θ) ≡ ∂pX (t, x, θ)/∂θ is the sensitivity for the specific molecular counts x at time t.

2.1.3 Modeling distortion of measurements

Let y(t) be the multivariate measurement made on a single cell at time t, such as the discrete number of spots in an smFISH experiment, or the total fluorescence intensity, such as from a flow cytometry experiment. Because of random measurement noise, y(t) is the realization of a random vector Y(t) that is the result of a random distortion of the true process X(t). The probability mass (density) vector (function) pY(t) of the discrete (continuous) observable measurement Y(t) is related to that of the true copy number distribution via a linear transformation of the form

pYt,θ=CXYt,θpXt,θ.(3)

Mathematically, CXY(t, θ) functions as a Markov kernel, and we shall call it the Probabilistic Distortion Operator (PDO) to emphasize the context in which it arises. Considered as a matrix whose rows are indexed by all possible observations y and whose columns are indexed by the CME states x, it is given entry-wise as

CXYt,θy,xPrYt=y|Xt=x,θ,(4)

where Pr stands for probability mass if Y is discrete or probability density if Y is continuous. Together, Eqs 1, 3 describe single-cell measurements as the time-varying outputs of a linear dynamical system on the space of probability distributions on the lattice of N-dimensional discrete copy number vectors. The output matrix of this dynamical system is the PDO CXY. If the observations c1,,cNc in the dataset D are assumed to be distorted according to the PDO C, then the log-likelihood function (2) is changed into

logLθ|D=i=1NclogpYti,ci,θ(5)

where pY (t, y, θ) are point-wise probabilities of the distribution pY (t, θ) defined in (3).

There are many different ways to specify the PDO, depending on the specifics of the measurement method one wishes to model. In this paper, we demonstrate various examples where the PDO is formulated as probabilistic models that use simple distributions as building blocks (see Results and Supplementary Data Sheet S1, Section 3.2), or as deterministic binning/aggregation (Results and Supplementary Data Sheet S1, Section 3.2). One could even use a secondary CME to model the uncertain chemical kinetics of the measurement process, such as the random time needed to achieve chemical fixation in smFISH experiments or the dropout and amplification of mRNA that occurs during single-cell RNAseq experiments (Supplementary Data Sheet S1, Section 3.3), or distribution convolution to describe cell segmentation noise (Supplementary Data Sheet S1, Section 3.4). Despite the variety of ways these measurement noise models can be derived, they all lead to the same mathematical object (i.e., the PDO), which allows computation of the FIM associated with that particular noisy single-cell observation approach, and the effects of all PDO can be analyzed using the same computational procedure as we describe next.

2.1.4 Computation of the Fisher Information matrix for distorted experimental measurements

In practice, when closed-form solutions to the CME do not exist, a forward sensitivity analysis using an extension of the finite state projection algorithm can be used to evaluate the probability distribution pX (t, θ) and its partial derivatives θjpX(t,θ) with respect to the kinetic parameters (Fox and Munsky (2019). Using Eq. 3, we can transform these into the distribution pY (t, θ) of Y(t). Furthermore, the sensitivity indices sYj(t,θ)=θjpY(t,θ), of the observable Y are computable by back-propagating the sensitivities of the noise-free measurement distributions through the PDO,

θjpYt,θ=CXYt,θθjpXt,θ+θjCXYt,θpXt,θ.(6)

Then, the Fisher Information matrix (FIM) FY(t)(θ) of the noisy measurements Y(t) at time t is computed by

FYtθi,j=ysYit,y,θsYjt,y,θpYt,y,θ.(7)

Details of the numerical approximation can be found in Supplementary Data Sheet S1 Section 3.1. In this formulation, we only need to solve the (usually expensive) sensitivity equations derived from the CME once, then apply relatively quick linear algebra operations to find the FIM corresponding to any new PDO for specific microscope, fixation protocol, or probe designs.

To convert the FIM between parameters defined in linear space to the same parameters in logarithmic space, we apply the transformation:

FYtlogθi,j=θiθjFYtθi,j.(8)

Computation of the distorted distributions and FIMs are performed in Python using the NumPy library (complete codes available at https://doi.org/10.5281/zenodo.7896003) for the first set of results (Figures 24; Supplementary Figures S1–S10) or using Matlab (complete codes available at https://doi.org/10.5281/zenodo.7864743) for the second half of results (Figures 58; Supplementary Data Sheet S1, Supplementary Figures S11, S12). Details of computer hardware and software are provided in Supplementary Data Sheet S1.

FIGURE 2
www.frontiersin.org

FIGURE 2. Probabilistic distortion operators. (A) The “Missing Spots” (MS) distortion model, where spots can randomly go missing. (B) The “Missing Spots with Variable Rate” (MSVR) distortion, where the probability of missing a spot increases with spot density. (C) The “Poisson Noise” (PN) model, where false positive spots are added to the counted number of spots. (D) A “Poisson Observation” (PO) model, where the detected spots follow a Poisson distribution with mean proportional to the true number. (E) The “Integrated Intensity” (II) model, where only a perturbed version of the total fluorescence is recorded per cell. In each heat map, the color at point (x̃,ỹ) is the conditional probability mass/density of the measurement y having value ỹ given that the true copy number x has value x̃.

2.1.5 Parameter estimation and uncertainty quantification

Models are fit to experimental data in Matlab [R2021b, Inc. (2022)], using an iterated combination of the builtin fminsearch algorithm [an implementation of the Neldar-Mead simplex method, Lagarias et al. (1998)] to get close to the MLE (i.e., to maximize the likelihood in Eq. 5) followed by a customized version of the Metropolis Hastings (MH) sampling routine, mhsample Chib and Greenberg (1995). Chemical master equation models are define using sparse tensors utilizing the Tensor Toolbox for Matlab Bader and Kolda (2008). All parameter searches were conducted in logarithmic space, and model priors were defined as lognormal with log-means and standard deviations as described in the main text. For the MH sampling proposal function, we used a (symmetric) multivariate Gaussian distribution centered at the current parameter set and with a covariance matrix proportional to the inverse of the Fisher Information matrix (calculated at the MLE parameter set) and scaled by a factor of 0.8, which achieves an approximately 20%–50% proposal acceptance rate for all combinations of PDOs and data sets. MH chains were run for 20,000 samples. Convergence was checked by computing the autocorrelation function and verifying that the effective sample size was at least 1,000 for every parameter in every MH chain. All data, model construction, FSP analysis, FIM calculation, parameter estimation, and visualization tasks were performed using a preliminary version of the Stochastic System Identification Toolkit (SSIT), and full codes are available at https://doi.org/10.5281/zenodo.7893296.

2.2 Single-cell labeling and imaging

2.2.1 Cell culture

The experiments presented here were performed on Hela Flp-in H9 cells (H-128). The generation of the H-128 cell line has previously been discussed (Tantale et al., 2016). Briefly, Tat expression regulates the MS2X128 cassette-tagged HIV-1 reporter gene in H-128 cells. The HIV-1 reporter consists of the 5′and 3′long terminal repeats, the polyA sites, the viral promoter, the SD1 and SA7 splice donors, and the Rev-responsive element. Additionally, the MS2 coating protein conjugated with a green fluorescent protein (MCP-GFP), which binds to MS2 stem loops when transcribed, is expressed persistently by H-128 cells. Cells were cultured in Dulbecco’s modified Eagle medium (DMEM, Thermo Fisher Scientific, 11,960–044) supplemented with 10% fetal bovine serum (FBS, Atlas Biologicals, F-0050-A), 10 U/mL penicillin/streptomycin (P/S, Thermo Fisher Scientific, 15140122), 1 mM L-glutamine (L-glut, Thermo Fisher Scientific, 25030081), and 150 μg/mL Hygromycin (Gold Biotechnology, H-270-1) in a humidified incubator at 37°C with 5% CO2.

2.2.2 smiFISH and microscopy

Single-molecule inexpensive fluorescence in situ hybridization (smiFISH) was performed following a protocol previously described (Tsanov et al., 2016; Haimovich and Gerst, 2018). This technique is known as inexpensive, because the primary probes consist of a region binding the transcript of interest plus a common sequence established as FLAP, which is bound by a complementary FLAP sequence conjugated with a fluorescent dye following a short PCR-cyle (the primary and FLAP-Y-Cy5 probes used in this study were purchased from IDT, see Table 1). To perform smiFISH, H-128 cells were plated on 18 mm cover glasses within a 12-well plate (105 cells/well), 24 h before the experiment. Some samples were exposed to 5 μM triptolide (Sigma-Aldrich, 645,900) for different incubation periods. Immediately after these drug treatment periods, samples were washed out twice with RNAse free 1XPBS, and fixed in 4% PFA at RT for 10 min, followed by 70% ethanol permeabilization at 4°C for least 1 h. After washing each sample with 150 μL of wash A buffer (Biosearch Technologies, SMF-WA1-60) for 5 min, each cover glass was set on a droplet (cells facing down) consisting of 45 μL hybridization buffer (Biosearch Technologies, SMF-HB1-10) and 1 μL of the duplex smiFISH probes (MS2-transcript-binding probe mix + FLAP-Y-binding region annealed to FLAP-Y-Cy5) in a humidified chamber at 37°C overnight. The following day, samples were placed in a fresh 12-well plate, and the cells (facing up) were incubated twice in wash A buffer at 37°C for 30 min, first alone, and then containing DAPI. Finally, cells were incubated with wash B buffer (Biosearch Technologies, SMF-WB1-20) at RT for 5 min, and then mounted on a 15 μL drop of Vectashield mounting medium (Vector Laboratories, H-1000-10), and sealed with transparent nail polish.

TABLE 1
www.frontiersin.org

TABLE 1. smiFISH probe sequences. Each primary probe has an added common FLAP-Y binding sequence (TTA​CAC​TCG​GAC​CTC​GTC​GAC​ATG​CAT​T).

Fluorescent images were acquired with an Olympus IX81 inverted spinning disk confocal (CSU22 head with quad dichroic and additional emission filter wheel to eliminate spectral crossover) microscope with 60x/1.42 NA oil immersion objective. Confocal z-stacks (0.5 μm step-size, 27 stacks in each channel) were collected. Each field of view was imaged using four high-power diode lasers with rapid (microsecond) switcher (405 nm for DAPI, 488 nm for MS2-MCP-GFP reporter, 561 nm for cytosol marker, and 647 nm for smiFISH MS2-Cy5, exposure time of 100 ms for all except smiFISH channel that was 300 ms) when samples had cytosol marker or three lasers for samples without the marker. The system has differential interference contrast (DIC) optics, built-in correction for spherical aberration for all objectives, and a wide-field Xenon light source. We used an EMCCD camera (iXon Ultra 888, Andor) integrated for image capture using Slide book software (generating 60× images with 160 nm/pixel). The imaging size was set to 624 × 928 pixels2. All raw images are available at https://doi.org/10.5281/zenodo.7868322.

2.2.3 smiFISH image processing

In this study, we employed Python to implement an image processing pipeline comprising three steps: cell segmentation, spot detection, and data management [as previously described in Safieddine et al. (2022)]. Nuclear segmentation on the DAPI channel (405 nm) was carried out using Cellpose (Stringer et al., 2021) with a 70-pixel diameter as an input parameter. Spot quantification for both the MS2-MCP-GFP reporter channel (488 nm) and the smiFISH MS2-Cy5 channel (647 nm) was performed independently, using BIG-FISH software (Imbert et al., 2022). The spot quantification procedure employed a voxel-XY of 160 nm, a voxel-Z of 500 nm, and spot radius dimensions of 160 nm and 350 nm in the XY and Z planes, respectively. This process included the following steps: i) the original image was filtered using a Laplacian of Gaussian filter to improve spot detection. ii) Local maxima in the filtered image were then identified. iii) To distinguish between genuine spots and background noise, we implemented both an automated thresholding strategy and a manual approach involving the use of multiple threshold values between 400 and 550. The intensity thresholds (550 for MS2-MCP-GFP and 400 for smiFISH MS2-Cy5) that resulted in the highest number of co-detected spots in both channels were chosen. Additionally, we quantified the average nuclear intensity for each color channel by computing the mean intensity of all pixels within the segmented nuclear region. Data management involved organizing all quantification data into a single dataset, including information on the specific image and cell in which the spots were detected. All image processing codes (Aguilera et al., 2023) are available from https://doi.org/10.5281/zenodo.7864264.

3 Results

Many models have been used to capture and predict observations of single-cell heterogeneity in gene expression (Munsky et al., 2009; Skupsky et al., 2010; Lou et al., 2012; Earnest et al., 2013; Neuert et al., 2013; Sepúlveda et al., 2016; Skinner et al., 2016). When selecting an experimental assay to parameterize such models, one is faced with several choices, each with its own characteristic measurement errors (Raj and van Oudenaarden, 2009). Here, we start by introducing several mathematical forms for probabilistic distortion operators (PDOs) that can quantify these measurement errors. We then use a model and simulated data to show how different measurement errors can affect model identification, and we show how this can be corrected through consideration of the PDO in the estimation process. Next, we show how models and PDOs can be used in the framework of Fisher Information in iterative design of single-cell experiments for efficient identification of predictive models. Finally, we illustrate the practical use of the PDO, model inference, and FIM based experiment design on the experimental investigation of bursting gene expression from a reporter gene controlled by an HIV-1 promoter.

3.1 Distorted single-cell measurements sample a probability distribution that is the image of their true molecular count distribution through a linear operator

Most parameters needed to define single-cell signaling or gene expression models cannot be measured directly or calculated from first principles. Instead, these must be statistically inferred from datasets collected using single-molecule, single-cell experimental methods such as smFISH (Raj et al., 2006; Gómez-Schiavon et al., 2017), flow cytometry (Lim and van Oudenaarden, 2007; Lipinski-Kruszka et al., 2015), or live-cell imaging (Suter et al., 2011; Fukaya et al., 2016; Forero-Quintero et al., 2021). In this work, we focus on the former two experimental approaches in which collected data consists of independent single-cell measurements taken at different times. Here, we consider measurement distortion effects corresponding to probe binding inefficiency and spot detection for smiFISH experiments, and in the Supplementary Material, we extend this to consider effects of reporter fluorescence intensity variability in flow cytometry experiments (Supplementary Data Sheet S1, Section 3.1), data binning (Supplementary Data Sheet S1, Section 3.2), effects of competition with non-specific probe targets (Supplementary Data Sheet S1, Section 3.3), and effects of segmentation errors (Supplementary Data Sheet S1, Section 3.4).

We first consider five formulations to define the measurement distortion matrix (cf. Eq. 4 in Methods, and illustrated in Figure 2), corresponding to scenarios in which experimental errors arise from either inefficient mRNA detection, additive false positives, or combinations. Specifically.

1. The first model supposes that ỹ is obtained from a “lossy” spot-counting process applied on images taken in an smFISH experiment. We model y|x = j with a binomial distribution B (pdetect, j), where each spot has a chance pmiss ≔1 − pdetect of being ignored by the counting algorithm, resulting in underestimation of the true mRNA copy number. In the context of optical microscopy, such a distortion might result from quantifying spots at a single plane, where pdetect might represent the fraction of the imaged section compared to the full volume of the cell, but similar error models have also been proposed in the context of single-cell RNA sequencing (Larsson et al., 2019; Breda et al., 2021). We call this distortion “Missing Spots.” Its PDO, CMS is illustrated in Figure 2A for pmiss ≔0.5 and can be defined:

CMSi,j=ji1pmissipmissji.

2. The second model is a simple variation of the first model in which pdetect(j) varies with the number of mRNA molecules j. For example, the specific formulation, pdetect ≔1.0/(1.0 + 0.01j) implies that spot detection rate degrades as the number of mRNA molecules in the cell increases, which may correspond to the effect of co-localization and/or image pixelation which could cause the under-counting of overlapping spots. We call this model “Missing Spots with Varying Rate,” and CMSVR is illustrated in Figure 2B and can be expressed:

CMSVRi,j=jipdetectji1pdetectjji.

3. The third model assumes that ỹ is the output of a spot-detection process contaminated by false positives, e.g., due to background fluorescence noise in the image that can appear to be spots. We model these false positives by additive Poisson noise, making ỹ=j+ẽ where ẽPoisson(λ). We call this model “Poison Noise,” and CPN, which is illustrated in Figure 2C for λ = 10 and can be expressed:

CPNi,j=λjiexpλji! for ij and otherwise zero

4. The fourth model is a simple extension of the third model in which the number of detected spots given a true number, j, is a Poisson distribution with a varying mean λ(j). Specifically, λ(j)max0,λ0+λ1j, where λ0 is Poisson noise, and λ1 is the Poisson detection rate. We call this model “Poisson Observation,” and CPO is illustrated in Figure 2D and expressed as:

CPOi,j=λjiexpλji!.

5. The fifth model concerns fluorescent intensity integration measurements (such as those used in flow cytometry). We use the model proposed in (Munsky et al., 2009), in which ỹ=κj+η̃j+ε̃BG, where η̃jN(0,jσprobe2) models fluorescent heterogeneity and ε̃BGN(μBG,σBG2) is the background noise. The PDO is a hybrid matrix with discrete columns (i.e., its domain are probability vectors over discrete CME states) and continuous rows (its range consists of continuous probability density functions over the range of fluorescence intensities). This PDO, which we label “Integrated Intensity” and is illustrated in Figure 2E for μBG = 200, σBG = 400, κ=σprobe2=25, can be expressed:

CIIy|j=12πσBG2+jσprobe2expyκj22σBG2+jσprobe2.

For intensity measurement with finite resolution, CII can be defined over discrete bins (e.g., (y0, y1], (y1, y2], …) by integrating as follows:

CIIi,j=yi1yi12πσBG2+jσprobe2expyκj22σBG2+jσprobe2dy.

We stress that these PDOs are provided as just a few of many possible distortions that could be modeled using the proposed framework. Other, more complex distortion operators are discussed in Supplementary Data Sheet S1, Section 3, including one where a secondary CME is employed to model the uncertain chemical kinetics of the measurement process (Supplementary Data Sheet S1, Section 3.3) and another to describe cell merging due to image segmentation errors (Supplementary Data Sheet S1, Section 3.4).

3.1.1 Measurement noise introduces bias and uncertainty into model identification

To illustrate the impact of measurement distortion on parameter estimation and our use of the PDO formalism to mitigate these effects, we begin with an analysis of the random telegraph model (Figure 3A), one of the simplest, yet most commonly utilized models of bursty gene expression (Peccoud and Ycart, 1995; Raj et al., 2006; Suter et al., 2011; Sanchez and Golding, 2013; Senecal et al., 2014; Larsson et al., 2019). In this model, a gene is either in the inactive or active state, with transition between states occurring randomly with average rates kON (to activate the gene) and kOFF (to deactivate the gene). When active, the gene can be transcribed with an average rate kr to produce mRNA molecules, each of which degrades with an average rate γ.

FIGURE 3
www.frontiersin.org

FIGURE 3. Estimating and correcting for how measurement distortion affects model identification. (A): Schematic of the random telegraph gene expression model. Parameter values: gene activation rate kON = 0.015 events/minute, gene deactivation rate kOFF = 0.05 events/min, transcription rate kr = 5 molecules/minute, degradation rate γ = 0.05 molecules/minute. (B): A submatrix of the PDO for missing spots with varying rates, restricted to the domain {0, … , 200}×{0, … , 100}. (C,D): Maximum likelihood fits to validate FIM-based uncertainty quantification for observed mRNA distributions under distortion model shown in (B). We simulated 1,000 datasets and perform maximum likelihood fits to these datasets using either a likelihood function that ignores measurement noise (red, labeled “MLE with uncorrected likelihood”), or one whose measurement noise is corrected by incorporating the PDO (dark green, labeled “MLE-PDO fits”). The estimated density contours (delineating 10, 50, and 90 -percentile regions) of the fits are superimposed in light shades. Also displayed are the three-sigma confidence ellipses computed by the FIM-PDO approach or from sample covariance matrix of the corrected MLE fits. Panel (C) shows the results in log10 (kOFF) − log10 (kON) plane, while panel (D) shows them in the log10 (kr) − log10(γ) plane. The intersection of the thick horizontal and vertical lines marks the location of the true data-generating parameters. See Table 2 for a quantitative comparison between the uncorrected MLE and MLE-PDO.

To demonstrate how measurement distortions affect parameter identification, and why explicit measurement error modeling is necessary, we use the bursting gene expression model to simulate mRNA expression data where each cell is distorted by the MSVR effect above (PDO reproduced in Figure 3B). Each data set consists of five batches of 1,000 independent single-cell measurements that are collected at five equally-spaced time points jΔt, j = 1, 2, 3, 4, 5 with Δt = 30 min. We considered two methods to fit the telegraph model to these data based on the Maximum Likelihood Estimator (MLE): one in which the likelihood function ignores measurement noise, and one where measurement noise modeled by the PDO is incorporated into the likelihood function (see Supplementary Data Sheet S1, Section 2 for their formulations). If one fails to account for measurement uncertainty, these fits produce strongly biased estimates for the RNA production and degradation rates as seen in Figures 3C, D (red). Because this bias is inherent to the measurement technique, it cannot be corrected simply by averaging over more experiments. On the other hand, using the distortion correction method, Figures 3C, D (dark green) shows that the inaccurate spot counting procedure can be corrected by explicitly accounting for measurement uncertainty in the modeling phase. Quantitatively (see Table 2), the MLE fits that incorporate noise modeling have lower bias (in terms of relative root-mean-squared errors) for all four parameters compared to the uncorrected fits.

TABLE 2
www.frontiersin.org

TABLE 2. Performance of the maximum likelihood estimator (MLE) for estimating bursting transcription kinetic parameters. The third and fourth columns compare the mean and standard deviation of fits with and without PDO correction (labeled Corrected and Uncorrected, respectively). The final two columns compare the relative root-mean-squared errors (RMSEs) of these fits. For a quantity of interest q and its n estimated values q̂i,i=1,,n, we define the relative RMSE as i=1n(q̂iq)2|q|.

Figures 3C, D also demonstrate that the Fisher Information Matrix computed for the noisy measurement (see Eq. 7) produces a close approximation to the covariance matrix of the MLE (compare magenta and black ellipses). This suggests that the FIM can provide a relatively inexpensive means to quantify the magnitude and direction of parameter uncertainty, a fact that will be helpful in designing experiments that can reduce this uncertainty, as we will explore next.

3.1.2 Fisher Information matrix analysis reveals how optimal experiment design can change in response to different measurement distortions

Having demonstrated the close proximity between our FIM computation and MLE uncertainty for simulated analyses of the bursting gene expression model, we next ask whether the sampling period Δt could be tuned to increase information but using the same number of measurements. Recall that our simulated experimental set-up is such that measurements could be placed at five uniformly-spaced time points tj = jΔt, j = 1, 2, 3, 4, 5, with the sampling period Δt in minutes, and that at each time point we collect an equal number n of single-cell measurements, chosen as n ≔1,000. We find the optimal sampling period Δt for each measurement distortion (MS, MSVR, PN, PO, and II), and compare the most informative design that can be achieved for each class. Here, we define “optimal” in terms of the determinant of the Fisher Information Matrix, the so-called D-optimality criterion, whose inverse estimates the volume of the parameter uncertainty ellipsoid for maximum likelihood estimation (Atkinson and Donev, 1992). Figures 4A, B shows the information volume to the five kinds of noisy measurement described above (Figure 2), in addition to the ideal noise-free smiFISH, at different sampling rates. We observe that every probabilistic distortion to the measurement decreases the information volume (but to different extents), and that each measurement method results in a different optimal sampling rate. In Figures 4C, D, we plot the three-sigma (i.e., 99.7%) confidence ellipsoids of the asymptotic distribution of MLEs projected on the log10 (kON) − log10 (kOFF) plane and the log10 (kr) − log10(γ) plane.

FIGURE 4
www.frontiersin.org

FIGURE 4. Optimizing experiment sampling rate under different measurement distortion effects. (A): Comparison of D-optimality criteria in single-cell experiments with different types of measurement noise and at different sampling intervals (Δt). In this settings, independent measurements are collected at five equally-spaced time points kΔt, k = 1, 2, 3, 4, 5 with 1,000 measurements placed at each time-point. The ⋆ symbol marks the optimal point of each curve and the bar charts in (B) visualizes the relative differences in D-optimal achievable by the measurement methods at their respective optimal sampling rates. (C,D): The three-sigma confidence ellipses projected onto the log10 (kON) − log10 (kOFF) and log10 (kr) − log10(γ) planes. These ellipses are computed by inverting the FIMs of different measurement noise conditions at their optimal sampling rates. All analyses use the model and parameters from Figure 3.

3.2 FIM and PDO analysis of experimental measurements for HIV-1 promoter bursting kinetics

To provide a concrete example for the use of the FIM and PDO in practice, we performed single-cell measurements to quantify the relative measurement distortion between different single-mRNA labeling strategies in HeLa (H-128) cells (see Methods). We expressed a transcription reporter gene with 128 repeats of the MS2 hairpin, and we simultaneously used both MCP-GFP (green labels in Figure 5A) and smiFISH MS2-Cy5 (magenta labels in Figure 5A) to target the MS2 repeats. In each cell, both approaches detect similar patterns for the number and spatial locations of mRNA within the nuclei (Figures 5B–G). For our particular choice of image processing algorithm (Stringer et al., 2021; Imbert et al., 2022; Safieddine et al., 2022) and intensity threshold for spot detection (see Methods) and a two-pixel (x,y,z) Euclidean distance threshold for co-localization detection, after analyzing 135 cells in steady-state conditions, we found that 46.9% of mRNA spots (15,258 out of 32,526 total) were detected in both channels (e.g., spots denoted by white triangles in Figure 5A). However, many spots (21.3%, 6,913 spots) are detected only using smiFISH (e.g., those denoted with magenta triangles) and 31.8% (10,355 spots) are only detected using the MS2-MCP labels (e.g., green triangles). Due to differences in label chemistry, background fluorescence, and image analysis errors, the quantified distribution of mRNA expression depends heavily upon which assay is utilized (compare Figures 5C–H). For example, we observed that analyses of cells that have fewer than 10 spots in the smiFISH channel frequently result in large numbers of spurious spots when the same cells are analyzed in the MCP-GFP channel (black markers on left limit of Figure 5C and high density at zero in Figures 5D, E). These differences in measurement quantification support the need for consideration of measurement distortions in subsequent analysis.

FIGURE 5
www.frontiersin.org

FIGURE 5. Effect of labeling strategy on quantification of single-cell mRNA expression. (A) Example image of a single-cell population expressing a reporter gene controlled by an HIV-1 promoter and containing 128XMS2 stem-loop cassette, in which the mRNA was simultaneously measured using MCP-GFP labeling (green) and with smiFISH probes against MS2-Cy5 (magenta). A higher resolution image of the indicated single nucleus is shown at the top; merged image on the right includes DAPI and MemBrite cell surface stain 543/560 to denote the cell nucleus and plasma membrane, respectively. Triangles denote example mRNA that are detected in both channels (white, 34.8%), only in the MCP-GFP channel (green, 37.5%) and only in smiFISH (MS2-Cy5) channel (magenta, 27.7%). (B) BIC to compare different combinations of PDO (columns) and measurement type (rows) given an assumed “true” measurement of smiFISH mRNA. In all cases, PDO parameters are chosen to maximize likelihood for t = 0 and 300 min data, and t = 18 min is predicted without chanign parameters. Blue shading denotes PDO selection is identical if based on BIC for (0,300) min data or prediction of 18 min data. (C) Scatter plot of the spot count using MCP-GFP versus using smiFISH for data collected at t = 0 min (black squares). Shading and contour lines denote the levels of the PDO (log10C) that has been determined empirically from the data. (D,E) Empirical probability mass (D, bin size = 20) and cumulative distributions (E, no binning) for number of detected smiFISH spots detection (red) and number of detected MCP-GFP spots (blue). Predicted MCP-GFP spot distributions using the smiFISH spot measurements and the estimated PDO are shown in green. (F–H) Same as (C–E) but where the distorted measurement is the total integrated intensity of the smiFISH channel. (I–K) Same as (C–E) but where the distorted measurement is the total integrated intensity of the GFP channel.

3.2.1 PDO measurement noise parameters can be calibrated using single-cell experiments with multiple measurement modalities

To demonstrate the parameterization and selection of a PDO for these data, we measured expression using smiFISH or MCP-GFP spot counts as well as with total integrated fluorescence intensity in the FISH or GFP channels. These measurements were collected for 135 cells at t = 0 and 62 cells at t = 300 min after transcription deactivation by 5 μM Trp. We then defined the detected smiFISH spots as the “true” measurements {x1,,xNc} and considered the MCP-GFP spot counts, FISH intensity, and GFP intensity values as three different groups of distorted observations {y1,,yNc}. We then assumed three possible PDO formulations for each measurement modality, including the simple “Poisson Observation” (PO) and “Integrated Intensity” (II) PDOs from above, as well as an extended “Spurious Gaussian” (SG) PDO that is formulated starting with the II model but then extended to allow for a random fraction (f) of cells with a true spot count of 10 or fewer cells to be miscounted as a random observation drawn from a Gaussian (iN(μ,σ2)) distribution. For all PDOs, we assumed that distorted quantities would be rounded to their nearest non-negative integer value (e.g., negative values would be rounded up to zero). For each PDO and its corresponding parameter set Λ, we calculate the corresponding PDO, C(Λ), and the log-likelihood to observe {yi} given {xi} is computed as

logLyi|C,xi=i=1NclogCΛi=yi,j=xi.

We then independently maximize this likelihood function for each combination of the three different distorted data sets (MCP-GFP spots, FISH intensity, GFP intensity) and for the three PDO formulations (PO, II, SG). For each data set, we finally select the PDO formulation that results in lowest Bayesian Information Criteria (BIC ≡ k log (Nc) − 2 log  L, where k is the number of parameters in the PDO) for the t = 0 and t = 300 min data (Figure 6B). We also verified in all cases that the PDO selection also maximized the likelihood of the predictions for held-out data at t = 18 min after Tpt treatment (Figure 6B). Upon fitting and selection based on either BIC or cross-validation, we found that MCP-GFP spot count data and GFP intensity distortions were best represented by the “Spurious Gaussian” PDO (CSG), while the FISH intensity measurements were best represented by the “Integrated Intensity” PDO (CII).

FIGURE 6
www.frontiersin.org

FIGURE 6. Validation of PDO on held out data. (A) Scatter plot of smiFISH spot counts and MCP-GFP spot counts for data collected at t = 18 min and contour of PDO determined from data at t = (0, 300) min. (B,C) PDF (bin size = 20) and CDF for smiFISH mRNA detection (red) or with MCP-GFP (blue) and MCP-GFP-based PDO-prediction of total mRNA (green). (D–F) Same format as (B–D) but for measurements of the total smiFISH fluorescence intensity per cell. (G–I) Same format as (B–D) but for measurements of the total GFP fluorescence intensity per cell.

Figures 5C, F show the contours of the corresponding PDOs for the MCP-GFP spot and FISH intensity data, respectively, and Figures 5D, E, G, H show the PDF and CDF for the “true” smiFISH mRNA count data at t = 0 (blue lines) compared to the distorted data (red lines). From the figures, we find the distortion model does a good job to calibrate between the total mRNA and MCP-GFP- or smiFISH-detected spot counts (compare blue and green lines in Figures 5D, E, G, H and the BIC values for (0,300) min in Figure 5B). We next verified that the PDO remains constant by showing that the same models and same parameters also accurately reproduce the difference in total mRNA and MCP-GFP or smiFISH measurements at a held out time of 18 min after application of 5 μM Trp (see Figure 6, and BIC values for 18 min in Figure 5B).

3.2.2 The PDO allows estimation of predictive bursting gene expression model parameters from distorted data

We next asked if using the estimated PDO while fitting the MCP-GFP spot data or the total FISH or GFP intensity data would enable the identification of appropriate model parameters to predict the smiFISH mRNA counts. Based on previous observations (Forero-Quintero et al., 2021), we proposed a 3-state bursting gene expression model (Figure 7A) where each of two alleles can occupy one of three states: S1 = OFF, where no transcription can occur; S2 = Poised, where the promoter is ready to begin transcription; and S3 = Active, where transcripts are produced in rapid bursts of mRNA expression. The model contains six parameters: kON and kOFF are the promoter transition rates between the OFF and Poised states; ω and kEx are the burst frequency and burst exit rates; and kr and γ are the transcription and mRNA degradation rates, respectively. Based on previous observation (Forero-Quintero et al., 2021) that triptolide (Trp) represses transcription after an average of 5–10 min needed for diffusion of Trp to the promoter and completion of nascent mRNA elongation and processing, we model the Trp response as a complete inactivation of transcription (ω → 0) that occurs at t = 5 min.

FIGURE 7
www.frontiersin.org

FIGURE 7. Identification of stochastic model for MS2X128 cassette-tagged HIV-1 reporter gene. (A) Schematic of the 3-state bursting gene expression model. (B) Results for model fitting, prediction, and uncertainty quantification for measurements based smiFISH spots (top row), MCP-GFP spots (row 2), total FISH intensities (row 3) and GFP intensities (row 4). Left two columns show the measured and model-fitted probability mass vectors (PMV) at 0 and 300 min after 5 μM Tpt. Third column shows the model-predicted and measured PMV for the corresponding (distorted) measurement modality at 18 min after 5 μM Tpt. Fourth column shows the model prediction without distortion and measured PMV for the smFISH mRNA count at 18 min after 5 μM Tpt. All histograms use a bin size of 20. Log-likelihood values for all model-data comparisons (and BIC values for fitting cases, k = 4 parameters, N = 197 cells) are computed without binning and are shown below the corresponding histograms. Right two columns show joint parameter uncertainty for model estimation using data for 0 and 300 min after 5 μM Tpt. In each case, the 90% CI for prior is shown in cyan; Metropolis Hastings samples (N = 20,000) are shown in dots; 90% CI for posterior is shown in dashed magenta; and FIM-based estimate of 90% CI is shown in black. Horizontal and vertical dashed black lines denote the “true” parameters and are defined as the MLE when using fit to the smFISH counts and using all time points. Determinant of inverse FIM and covariance of MH samples is shown below each pair of uncertainty plots (both use log base 10).

To specify a prior guess for the model parameters, we considered published values from (Forero-Quintero et al., 2021), where we used live-cell imaging of ON and Poised transcription sites to determine the burst frequency ω ≈ 0.2 min−1 and that kr and kEx are too fast to be estimated independently, but are related by a burst size of kr/kEx = β ≈ 7.1 mRNA/burst. Also, based on observations in (Forero-Quintero et al., 2021) that transcription sites remain in the OFF or Poised/Active states for long periods of 200 min or more, we assumed that kON and kOFF would be too slow to estimate except under much longer experiments. We therefore estimated kON = kOFF = 10–4 min−1, but we sought only to estimate the relative rate for kOFF. We estimated a typical mammalian mRNA half-life of 60 min yielding a rate γ = 5.8 × 10−3 min−1. With these baseline values as rough estimates, we assumed a log-normal prior distribution with a standard log deviation of one order of magnitude from the literature-based values for parameters {ω, β, γ} and two orders of magnitude for the more approximate value for kOFF}. These initial parameter guesses and prior uncertainties are summarized in Table 3 (column 3).

TABLE 3
www.frontiersin.org

TABLE 3. Parameter priors, MLE estimates, Uncertainties upon Initial Fit. Initial estimates after fitting to data at t =(0,300) min. MH results are from a chain of 20,000 samples. All parameter values and standard deviations values are shown in log10.

Having specified parameter priors, we then took single-cell data for 135 cells at steady state (t = 0) and 62 cells at t = 300 min after application of 5 μM Trp, and we applied the Metropolis-Hasting (MH) algorithm (20,000 samples) to estimate rates and uncertainties for the four free parameters. As above, we assumed that the total spot count analysis provided the “true” spot count, and we considered four estimation problems using either the “true” smiFISH mRNA counts, the MCP-GFP spot count data, the FISH intensity data, or the GFP intensity data, each using the empirically estimated distortion operators from before (e.g., Figures 5C, F, I for the MCP-GFP spots, smiFISH intensity, and GFP intensity data, respectively). Table 3 presents the MLE parameter values after this initial stage, and Figure 7B (left two columns) compare the resulting fits of the model to the data. Figure 7B (right columns) shows scatter plots of parameter uncertainties for the parameter identification using either the smiFISH data (top row) or with the three distortion measurements using the empirical distortion operator (bottom three rows). From Figure 7B; Table 3, one can see that all fits do a reasonable job to match their intended data, and when fit using the PDO parameters, model predictions based only on the smiFISH and MCP-GFP data match well to the total spot count analysis (Figure 7B, middle column).

3.2.3 The PDO-corrected FIM accurately estimates parameter uncertainty after experimental analysis of the bursting gene expression model

We next used the maximum likelihood models when fit to the 0 and 300 min data to compute the FIM for the corrected MCP-GFP and smiFISH measurements. Because the parameters cover multiple orders of magnitude, we transform the FIM to consider the parameters in log10 space (Eq. 8). To adjust the FIM to consider the case where there is a prior on the parameters, we add the inverse of the prior covariance (in logspace) to the calculated FIM: FIMtotal=FIM+Σprior1. Given that we only collected a single experimental data set for each time point, it is not possible to directly compare the FIM to the spread of MLE estimates like we could for simulated data. However, in Figure 7B (right columns) for the smiFISH mRNA count and PDO-corrected MCP-GFP spots/intensity or smiFISH intensity data, we can compare the uncertainty predicted by the FIM analysis to the posterior uncertainty of our parameters given our data. This comparison shows that in most cases the FIM does an excellent job to estimate the direction and magnitude of parameter uncertainties (compare overlapping black and magenta ellipses in Figure 7B and see Table 3 for direct comparison of estimated standard deviations). However, it is important to note an exception where the FIM prediction does not match the MH analysis for the largest distortion (GFP intensity measurements). In this case, the FIM predicted variance is much larger than for the other cases [note the change in scales in Figure 7B (bottom right)], but the posterior found by the MH analysis is clearly non-Gaussian. As we will see in the next section, the reason for this failure is likely that the (0,300) min experiment design with this distortion provides insufficient information to identify the model.

3.2.4 PDO-corrected FIM analysis accurately ranks designs for most informative transcription-repression experiment

To demonstrate the practical use of FIM for experiment design, we next asked what design for a Trp-based transcription repression experiment would be best to improve our model of mRNA expression identified in Figure 7. We restricted the set of possible experiments to the previous data set (135 cells at t1 = 0 min and 62 cells at t2 = 300 min) plus an additional set of 100 cells at a new time t3 after Trp application, where t3 could be any time in the allowable set t3 ∈ [0, 6, 12, 18, … , 1,200] min.

We drew 20 random parameter samples from the previous 20,000-sample MH chains that were estimated using data at t1 = 0 and t2 = 300 min for each data type. We computed the FIM for each parameter set and for every potential choice for t3. Figure 8A shows the determinant of the expected covariance of MLE parameters (Σ ≈ FIM−1, defined in log10 parameter space) versus t3 assuming direct observation of smiFISH mRNA (left) or distorted observations of MCP-GFP spots, FISH Intensity, or GFP intensity (right). As expected, distortion always increases expected uncertainty [compare Figure 8A (left) to the other columns]. We also find that the optimal time for the experiment can be highly dependent on the particular assay. In particular, larger distortions require earlier sampling times to prevent mRNA expression from falling below the noise introduced by the distortion.

FIGURE 8
www.frontiersin.org

FIGURE 8. Design of Subsequent Experiment for MS2X128 cassette-tagged HIV-1 reporter gene. (A) Expected volume of uncertainty (det (FIM−1)) versus time of third measurement assuming 100 cells and measurement of: (left to right) smiFISH mRNA, MCP-GFP spots, FISH intensity, or GFP intensity. Solid lines and shading denote mean ± SD for 20 parameter sets selected from MH chains after fitting initial data (magenta, t = (0,300) min) or final data (cyan, t = (0,18,300) min). Cyan and magenta vertical lines denote the optimal design for the third experiment time assuming the corresponding parameter values. (B) Expected volume of MLE uncertainty (det (FIM−1)) for different sets of experiment times and measurement modalities and averaged over 20 parameters sets sampled from the MH chains for initial fit (magenta) or final parameter estimates (cyan). (C) Volume of MLE uncertainty (det (ΣMH) estimated from MH analysis in the same experiment designs as (B). (D) Posterior variance versus FIM prediction of variance for each parameter (symbol key at bottom right), for each measurement modality (different columns) and for analyses based on different sets of data: t =(0,18) min (black), t =(0,300) min (magenta), or t =(0,18,300) min (cyan). All MH analyses contain 20,000 samples. Measurements include 135 cells at t = 0, 96 at t = 18, and 62 at t = 300. Parameter uncertainties defined in log base 10 for all panels.

Although a complete validation of these designs would require a host of experiments that are beyond the scope of the current investigation, we analyze a new data set of 96 cells taken at t3 = 18 min to compare experiment designs with time combinations of (0,18) min, (0,300) min, and (0,18,300) min. Figure 8B shows FIM predictions of uncertainty using parameters from the original fit to the (0,300) min data (magenta) compared to the uncertainty estimate using parameters fit to the final data set with all three time points (cyan). The original FIM using parameters fit to the (0,300) min data (Table 3) and the final FIM using parameters fit to the (0,18,300) min data (Table 4) are in agreement for all combinations of experiment designs (different groups of magenta and cyan bars) and for direct observations of smiFISH mRNA (left) or for any of the distorted data sets. Moreover, upon using the MH to estimate the posterior parameter distributions, Figure 8C shows that both the original and the final FIM correctly predict the trend of parameter uncertainties for each of the different experiment combinations. For example, the FIM correctly predicts the result that a set of (135,62) cells at (0,300) min is more informative than the larger set of (135,96) cells at (0,18) min if one uses the undistorted data (far left), but the opposite is true if one considers the distortion due to measurements of GFP intensity (far right). Finally, Figure 8D shows the variance in each parameter’s estimation for every measurement type as predicted by the FIM analysis (horizontal axis) and verified using 20,000 MH samples of the posterior (vertical axis). The results show a clear correlation between the predicted and measured uncertainty under either the initial (t = 0,300, magenta) or the final (t = 0,18,300, cyan) data sets. The analysis also correctly predicts which parameters are well-identified under which measurements. For example, the FIM analysis correctly predicts that the degradation rate γ (triangles) is well identified using just t = 0,300 min data for the undistorted data, the MCP-GFP spot data, or the FISH intensity data, but requires the t = 18 min data to be identified using the GFP intensity data. In other words, the FIM analysis now explains the failure to identify well-constrained parameters that we observed at the end of the previous section, and correctly suggests that this failure can be substantially ameliorated with an additional measurement of at 62 cells at 18 min.

TABLE 4
www.frontiersin.org

TABLE 4. Parameter priors, MLE estimates, Uncertainties upon Final Fit. Final estimates after fitting to data at t =(0,18,300) min. MH results are from a chain of 20,000 samples. All parameter values and standard deviations values are shown in log10.

4 Discussion

Parameter estimation is a major step in constructing quantitative models for all physical or biological processes, and many such models for gene regulation and cell signaling are now being inferred from quantitative single-cell imaging experiments. Such measurements are subject to errors, where different labels or different image processing can yield different measurement values (e.g., see Figures 5, 6). We have demonstrated that these experimental distortions can be mathematically described using a general class of probability transition kernels that we dub Probability Distortion Operators (PDOs, Figures 2, 5B, E), and that these PDOs lead to changes in the estimated parameters that are reflected not only in the magnitudes but also in the directions of their uncertainties (Figures 4, 7, 8). We have introduced a new computational framework to systematically account for these noise effects and provide a first step toward integrating PDOs into the interpretation of single-cell experiments (Figures 58). Our results indicate that an appropriate statistical analysis coupled with a careful tuning of experimental design variables can meaningfully compensate for measurement noise in the data. For example, our results indicate that, when used iteratively with small sets of experimental data (e.g., less than a couple hundred cells at only two points in time), FIM analysis can correctly predict which subsequent experiments are most likely to be informative, and which are unlikely to provide additional insight into model parameters (Figure 8). Insight provided by such an integration of models and experiments could allow for better allocation of experimental resources first by helping to eliminate estimation biases that are due to experimental noise and second by helping to identify specific experimental conditions that are less prone to be impacted by those measurement artifacts.

Although the FIM is a classical tool for optimal experiment design that has been used extensively in myriad areas of science and engineering (Pronzato and Walter, 1985; Chaudhuri and Mykland, 1993; Emery and Nenarokomov, 1998; Ruess and Lygeros, 2013; Chao et al., 2016; Vahid et al., 2019), it has not seen widespread adoption in the investigation of biological processes, in part because biological processes are heavily subject to heterogeneities that are not accounted for in traditional FIM analyses. However, there has been some progress to extend these tools to the context of gene expression modeling; for example (Komorowski et al., 2011), proposes a method to approximate the FIM for single-cell experiment data based on the Linear Noise Approximation (LNA). Alternatively, the FIM has been approximated by using moment closure techniques (Ruess et al., 2013; Ruess and Lygeros, 2013). These approaches work well in the case of high or moderate molecular copy numbers, but they break down when applied to systems with low molecular copy numbers (Fox and Munsky, 2019), and it is not clear how or if such approximations can be modified to consider measurement noise and data processing noise that are non-additive, asymmetric, or non-Gaussian as is the case for many biological distortions. To circumvent these issues, our alternative framework directly analyses the probability distributions of the noisy measurements. Explicitly modeling the conditional probability of the observation given the true cell state allows us to express the observation distribution as a linear transformation of the true process distribution that is computable using the finite state projection. This leads to a systematic way to develop composite experimental designs that combine measurements at different fidelity and throughput levels to maximize information given a budgetary constraint.

The current investigation provides a few examples and preliminary experimental data to illustrate the broad potential of the FIM and PDO formulations to improve the interpretation and design of single-cell experiments. However, these are limited at present, and there is much that remains to be done to elucidate the full capabilities for these new techniques. For example, not only is additional experimental testing needed to validate the use of FIM-based methods for a broader range of experiment designs and imaging conditions, but the current analyses also need further development i) to allow for more complex definitions of PDOs, ii) to expand the use of FIM and PDO analyses to situations where prior knowledge and models are unavailable or limited, and iii) to extend the FIM analyses for use in important tasks of model reduction or model selection. A few of these limitations and future directions are discussed as follows.

The current investigation uses high- and low-fidelity calibration experiments to parameterize three different PDOs for different measurement modalities, selects the PDO that minimizes the Bayesian Information Criteria in each case, and confirms that this selection also led to the best prediction of held out data. However, our search over possible PDOs was far from exhaustive, and it is almost certain that more accurate PDOs could be found. Adjustments are easily made so that PDOs can capture many different aspects of experimental error or so that they can be applied to many different types of models. For example, through experimental analysis of mRNA counts using different modalities (e.g., smiFISH and MCP-GFP labeling), we demonstrated how one could construct the PDO based on empirical measurements (Figure 8). Similarly, one could examine different microscopes, different cameras, different laser intensities, different image processing pipelines, or any of a number of permutations to compare and quantify differences between high-fidelity and noisy measurements of signaling or gene expression phenomena. In the current work, we have relied on specific parameterized statistical distributions (e.g., binomial or Poisson as examined in Figures 35 or data binning or categorization in Supplementary Data Sheet S1, Section 3.2) or mechanistic distortion models (e.g., integrated fluorescence intensity in Supplementary Data Sheet S1, Section 3.1, stochastic binding kinetics in Supplementary Data Sheet S1, Section 3.3, or segmentation errors in Supplementary Data Sheet S1, Section 3.4) to formulate the PDO. However, one could also envision parameter-free statistical methods based on recent probabilistic machine learning methods [e.g., normalizing flows (Kobyzev et al., 2021)] for modeling the conditional distribution p (y|x) of a noisy label y given a vector x of features. In either case, one would need only to calibrate the PDO once for each combination of labeling, microscopy, and imaging techniques and then one could to apply that PDO to many different biological processes, models, parameter sets or experiment designs. For simplicity, we have assumed that calibration data is available and that the PDO is constant in time. However, the formulations of the PDO and FIM (Eqs 37) are sufficiently general such that these requirements can be relaxed. With these relaxations, the FIM could be used either to guide experiment designs to aid the simultaneous identification of both a parameterized PDO and the gene regulation model itself, or provide clear guidance that such distortions lead to degeneracy in the FIM and indicate which non-identifiable parameter combinations (i.e., the null space of the distorted FIM) are most in need of model reduction.

An important limitation of any model-based experiment design approach is that to make predictions, one must have some prior knowledge about the system under investigation. In the experimental example above, we used insight gathered in a previous biological context [i.e., live-cell analyses of nascent transcription sites from Forero-Quintero et al. (2021)] to guess some model parameters (i.e., the transcription burst sizes and frequencies), and we used general knowledge of the cell line to guess other parameters (e.g., the half-life for mammalian mRNA). For many single-cell optical microscopy investigations, such information is available in advance, due to the fact that one must choose which genes or pathways to investigate before designing smFISH probes or modifying cells and promoters to express the MS2-MCP reporters, and this choice is typically based on experience or previous investigations in the literature (e.g., on analyses of related genes or pathways, in different environmental conditions, or with other exploratory experimental techniques). However, for earlier stage exploratory investigations, where such prior knowledge may not be available, one may need to collect some preliminary data before building models (e.g., collect data for a small handful of time points). In this case, computing the FIM after fitting to the first round of experiments can help to elucidate which parameter sets of the models are well-identified (i.e., vectors in parameter space corresponding to large eigenvalues of the FIM after the initial experiment), which can be improved with different experiment designs (i.e., vectors in parameter space that correspond to larger eigenvalues of the FIM for different experiments), and which cannot be identified for any experiment (i.e., vectors in parameter space that lie in the null space of the FIM no matter what experiment is considered).

Although we have only considered one model in the main text for the current investigation, the insight provided by the FIM could also be utilized to analyze multiple candidate model structures, an important task that has previously been explored under the assumption that smFISH yields exact measurements of mRNA content Neuert et al. (2013), Kalb et al.( 2021), Kilic et al. (2023). As discussed above, the FIM is useful to identify and prune highly-uncertain parameter combinations or remove unidentifiable model mechanisms. For example, using the distorted FISH intensity measurements at t = (0,300) min, the FIM analysis clearly shows that the burst frequency (log10ω) and burst size (log10β) are jointly uncertain along the negative diagonal (Figure 7B, rightmost column), meaning that the average total production rate and standard deviation (ωβ ≈ 1.04 ± 0.16 min−1) could be well constrained, while the individual parameters β = 5.0 ± 5.8 and ω = 1.1 ± 2.8 min−1 could not be independently identified. A similar observation holds using the MCP-GFP spot count data. In both cases, without the constraint of the prior, this uncertainty would extend to the limit where β is much less than one, at which point the proposed 3-state model with parameters [kON, kOFF, ω, β, γ] reduces to an equivalent 2-state model with parameters [kON, kOFF, krωβ, γ]. Indeed, Supplementary Data Sheet S1, Supplementary Figure S11 shows that this simpler model (Supplementary Data Sheet S1, Supplementary Figure S1A) provides nearly (but not quite) as good fits to the all data types when estimated from the t=(0,300) min data (Supplementary Data Sheet S1, Supplementary Figure S1B, left columns, compare fit likelihood and BIC values to Figure 7B), but these fits have much less parameter uncertainty (Supplementary Data Sheet S1, Supplementary Figure S1B right columns). Moreover, for the smiFISH mRNA counts, the MCP-GFP spot counts and the MCP-GFP intensity data, the simpler model led to better predictions for the held out data at t = 18 min (Supplementary Data Sheet S1, Supplementary Figure S1B, middle columns compare prediction likelihood values to Figure 7B). Moving forward, if one’s goal were to differentiate further between these or other competing hypotheses for model mechanisms, one could use FIM-based experiment design to suggest conditions that promise strong uncertainty reduction for several competing models at once. For example, Supplementary Data Sheet S1, Supplementary Figure S12 shows the variance reduction predicted for various possible experiment designs for the reduced model. Comparing FIM analyses of potential experiment designs to the results to the original model in Figure 8, we find that for all distortions, experiments that reduce uncertainty for one model should also be effective for the other. However, the purpose of the current study is only to introduce the FIM + PDO formulation, and a complete analysis of the use of FIM insight for model selection is left for future investigations.

Whether one starts with initial parameter and model structures guesses from previous experimentation or based on a preliminary round of experiments, subsequent model identification is most effective when pursued as an iterative endeavor, requiring evaluation of uncertainty and model-driven experiment design at each stage. For example, in our analysis of the HIV-1 reporter gene, initial data at 0 and 300 min after Tpt revealed that the practical identifiability of parameters depended heavily on which measurement was used. Assuming ideal measurements (i.e., using smiFISH), Figure 8B shows that the model with (n = 4) free parameters could be identified to an log10-uncertainty volume of det (FIM−1) = 1.88 × 10−11 if we include the prior or det (FIM−1) = 1.91 × 10−11 if we do not include the prior. However, using total FISH intensity, the model was much less certain at det (FIM−1) = 6.39 × 10−9 with the prior or det (FIM−1) = 1.43 × 10−6 without the prior. Since the determinant of the n-parameter FIM scales with the number of cells according to: det (αFIM) = αn det (FIM), it would take a factor of α=(1.43×106/1.91×1011)1/4=16.6 times as many cells using FISH-intensity to achieve the same accuracy as with the smiFISH mRNA data.1 Moreover, we also see that the optimal subsequent experiment design also depends heavily on which measurement modality is used. For smiFISH experiments that are assumed to be free from observation noise, the optimal next time point is very late (t = 696 min), whereas for the distorted observations, measurements should be taken much earlier (e.g., at t = 138 min for MCP-GFP spots, at t = 0 min for FISH intensity, and at t = 42 min for GFP intensity). Furthermore, in the worst case, choosing the next experiment based on an incorrect assumption for the PDO could lead to waste of experimental efforts—e.g., using the long time as suggested by the smiFISH analysis would be almost entirely worthless if used with one of the other measurements assays. The current study, which is meant only to introduce the FIM and its use for experiment design, has limited availability of experimental data (i.e., one replica with three time points) and only for an artificial HIV-1 reporter construct. A full examination for the use of FIM in iterative single-cell experiment design for endogenous gene regulatory pathways is ongoing and will be described in future publications.

For the model under consideration to fit the HIV-1 promoter with the 128XMS2 stem-loop cassette, computing the CME solution took an average of 1.3 s in Matlab on a 2019 MacBook Pro (2.6 GHz 6-Core Intel Core i7), computing the FIM took an average of 40s (39.94 s to solve the sensitivity to all parameters and 0.06 s to compute the FIM), and running the MH for 20,000 samples took 26,000 s, thus the FIM estimates uncertainty roughly 650 times faster than the MH. For use in experiment design, the cost savings provided by the FIM can be much higher. Because one can reuse pre-computed sensitivities, the computational cost is only 0.06 s for each new experiment design (e.g., to explore different numbers of cells, different PDOs, or different time point selections). In contrast, using a traditional approach of generating and then fitting simulated data (e.g., to generate MLE scatter plots as shown Figures 3C, D or Supplementary Data Sheet S1, Supplementary Figure S5), if one liberally assumed that an experiment could be evaluated using only Ns = 100 samples and that adequate fits could be achieved using only Nf = 100 parameter guesses (usually far more function evaluations are needed), then evaluating each new experiment design would require one CME solution (1.3 s) to generate data and 1.3 ⋅ NsNf = 13,000 s to fit those data (217,000 times longer than the FIM approach). Given a priori uncertainty in the model, in practice one would need to redo both the FIM analysis (40 s + 0.6 s per experiment) or the simulation-based analysis (13,000 s per experiment) for many different parameter sets or model structures (e.g., we show results from 20 parameter combinations in Figures 8A, Supplementary Data Sheet S1, Supplementary Figure S12A), and the savings provided by the FIM becomes even more important.

Finally, although the presented approach is versatile and can in principle be applied to any stochastic gene regulatory network, its practical use depends on the ability to compute a reasonable approximation to the solution of the chemical master equation (CME) as well as its partial derivatives with respect to model parameters. Fortunately, there are now many relevant stochastic gene expression models for which exact or approximate analytical expressions for the CME solution are available (Peccoud and Ycart, 1995; Singh and Bokes, 2012; Thomas and Grima, 2015; Cao and Grima, 2020; Ham et al., 2020; Vastola, 2021). Furthermore, the FSP and similar approaches have been used successfully to solve the CME for many non-linear and time-inhomogeneous regulatory models for which closed-form solutions do not exist (Skupsky et al., 2010; Lou et al., 2012; Earnest et al., 2013; Neuert et al., 2013; Sepúlveda et al., 2016; Skinner et al., 2016). For example, Supplementary Data Sheet S1, Section S4 analyzes a model of the nonlinear genetic toggle switch, and the FIM and PDO analysis is used to ask which species should be measured and for how many cells in order to best identify model parameters. As another example, Supplementary Data Sheet S1, Section S5 analyzes a spatial stochastic model with a four-state gene expression model under time varying MAPK activation signal and nucleus to cytoplasmic transport Munsky et al. (2018). Admittedly, given the complexity of gene regulatory networks in single cells, there will always be stochastic gene regulatory models whose direct CME solutions are beyond reach. Nevertheless, continued advancements in computational algorithms (Kazeev et al., 2014; Cao et al., 2016; Gómez-Schiavon et al., 2017; Lin and Buchler, 2019; Catanach et al., 2020; Gupta et al., 2021; Öcal et al., 2022) are enlarging the set of tractable CME models, which in turns can help accelerate the cycling between data acquisition, model identification, and optimal experiment design in single-cell studies.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Python code for the simulation and FIM analysis of all models is available at: https://doi.org/10.5281/zenodo.7896003 Raw microscopy images are posted at: https://doi.org/10.5281/zenodo.7868322 Python code for processing single-cell images (Aguilera et al. (2023)) is available at: https://doi.org/10.5281/zenodo.7864264 Matlab code for analyzing HIV1 Tryptolide response data using the three-state and reduced models is available at: https://doi.org/10.5281/zenodo.7893296.

Author contributions

Conceptualization: BM. Theory and computational modeling: HV and BM. Performed experiments/collected data: LQ. Image processing: LU. Writing: HV, LQ, LA, and BM. Resources, supervision, and funding acquisition: BM. All authors contributed to the article and approved the submitted version.

Funding

HV, LA, and BM were supported by National Institutes of Health (R35GM124747). LQ and BM were also supported by the NSF (1941870).

Acknowledgments

We thank members of Munsky group for valuable comments on the manuscript, especially Eric Ron for suggesting the investigation of cell segmentation noise.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2023.1133994/full#supplementary-material

Footnotes

1We note that total intensity is much easier to compute than spot detection and could in principle be measured at lower microscope resolution or even with flow cytometry. Depending upon available equipment, collecting 16.6 times as many cells could potentially be achieved at a lower overall experimental cost!

References

Aguilera, L. U., Cook, J., Ron, E., Forero-Quintero, L. S., and Munsky, B. (2023). Fluorescence in Situ Hybridization (FISH) - automated image processing. doi:10.5281/zenodo.7864264

CrossRef Full Text | Google Scholar

Anderson, D. F., and Kurtz, T. G. (2011). “Continuous time Markov chain models for chemical reaction networks,” in Design and analysis of biomolecular circuits: Engineering approaches to systems and synthetic biology. Editors H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore (New York, NY: Springer), 3–42. doi:10.1007/978-1-4419-6766-4_1

CrossRef Full Text | Google Scholar

Atkinson, A. C., and Donev, A. N. (1992). “Optimum experimental designs,” in Oxford statistical science series (Oxford, New York: Oxford University Press).

Google Scholar

Bader, B. W., and Kolda, T. G. (2008). Efficient matlab computations with sparse and factored tensors. SIAM J. Sci. Comput. 30, 205–231. doi:10.1137/060676489

CrossRef Full Text | Google Scholar

Balázsi, G., van Oudenaarden, A., and Collins, J. J. (2011). Cellular decision making and biological noise: From microbes to mammals. Cell. 144, 910–925. doi:10.1016/j.cell.2011.01.030

PubMed Abstract | CrossRef Full Text | Google Scholar

Batish, M., Raj, A., and Tyagi, S. (2011). “Single molecule imaging of RNA in situ,” in RNA detection and visualization: Methods and protocols. Methods in molecular biology. Editor J. E. Gerst (Totowa, NJ: Humana Press), 3–13. doi:10.1007/978-1-61779-005-8_1

PubMed Abstract | CrossRef Full Text | Google Scholar

Benzinger, D., and Khammash, M. (2018). Pulsatile inputs achieve tunable attenuation of gene expression variability and graded multi-gene regulation. Nat. Commun. 9, 3521. doi:10.1038/s41467-018-05882-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Breda, J., Zavolan, M., and van Nimwegen, E. (2021). Bayesian inference of gene expression states from single-cell RNA-seq data. Nat. Biotechnol. 39, 1008–1016. doi:10.1038/s41587-021-00875-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Brock, A., Chang, H., and Huang, S. (2009). Non-genetic heterogeneity — A mutation-independent driving force for the somatic evolution of tumours. Nat. Rev. Genet. 10, 336–342. doi:10.1038/nrg2556

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, Y., Lei, X., Ribeiro, R. M., Perelson, A. S., and Liang, J. (2018). Probabilistic control of HIV latency and transactivation by the Tat gene circuit. Proc. Natl. Acad. Sci. 115, 12453–12458. doi:10.1073/pnas.1811195115

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, Y., Terebus, A., and Liang, J. (2016). State space truncation with quantified errors for accurate solutions to discrete chemical master equation. Bull. Math. Biol. 78, 617–661. doi:10.1007/s11538-016-0149-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, Z., and Grima, R. (2020). Analytical distributions for detailed models of stochastic gene expression in eukaryotic cells. Proc. Natl. Acad. Sci. 117, 4682–4692. doi:10.1073/pnas.1910888117

PubMed Abstract | CrossRef Full Text | Google Scholar

Catanach, T. A., Vo, H. D., and Munsky, B. (2020). Bayesian inference of stochastic reaction networks using multifidelity sequential tempered Markov chain Monte Carlo. Int. J. Uncertain. Quantification 10, 515–542. doi:10.1615/Int.J.UncertaintyQuantification.2020033241

CrossRef Full Text | Google Scholar

Chao, J., Sally Ward, E., and Ober, R. J. (2016). Fisher information theory for parameter estimation in single molecule microscopy: Tutorial. J. Opt. Soc. Am. A 33, B36–B57. doi:10.1364/JOSAA.33.000B36

PubMed Abstract | CrossRef Full Text | Google Scholar

Chaudhuri, P., and Mykland, P. A. (1993). Nonlinear experiments: Optimal design and inference based on likelihood. J. Am. Stat. Assoc. 88, 538–546. doi:10.1080/01621459.1993.10476305

CrossRef Full Text | Google Scholar

Chib, S., and Greenberg, E. (1995). Understanding the metropolis-hastings algorithm. Am. statistician 49, 327–335. doi:10.1080/00031305.1995.10476177

CrossRef Full Text | Google Scholar

Earnest, T. M., Roberts, E., Assaf, M., Dahmen, K., and Luthey-Schulten, Z. (2013). DNA looping increases the range of bistability in a stochastic model of thelacgenetic switch. Phys. Biol. 10, 026002. doi:10.1088/1478-3975/10/2/026002

PubMed Abstract | CrossRef Full Text | Google Scholar

El Meouche, I., Siu, Y., and Dunlop, M. J. (2016). Stochastic expression of a multiple antibiotic resistance activator confers transient resistance in single cells. Sci. Rep. 6, 19538. doi:10.1038/srep19538

PubMed Abstract | CrossRef Full Text | Google Scholar

El-Samad, H., and Weissman, J. S. (2011). Genetics: Noise rules. Nature 480, 188–189. doi:10.1038/480188a

PubMed Abstract | CrossRef Full Text | Google Scholar

Emery, A. F., and Nenarokomov, A. V. (1998). Optimal experiment design. Meas. Sci. Technol. 9, 864–876. doi:10.1088/0957-0233/9/6/003

CrossRef Full Text | Google Scholar

Faller, D., Klingmüller, U., and Timmer, J. (2003). Simulation methods for optimal experimental design in systems biology. SIMULATION 79, 717–725. doi:10.1177/0037549703040937

CrossRef Full Text | Google Scholar

Femino, A. M., Fay, F. S., Fogarty, K., and Singer, R. H. (1998). Visualization of single RNA transcripts in situ. Science 280, 585–590. doi:10.1126/science.280.5363.585

PubMed Abstract | CrossRef Full Text | Google Scholar

Forero-Quintero, L. S., Raymond, W., Handa, T., Saxton, M. N., Morisaki, T., Kimura, H., et al. (2021). Live-cell imaging reveals the spatiotemporal organization of endogenous RNA polymerase II phosphorylation at a single gene. Nat. Commun. 12, 3158. doi:10.1038/s41467-021-23417-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Fox, Z. R., Fletcher, S., Fraisse, A., Aditya, C., Sosa-Carrillo, S., Petit, J., et al. (2022). Enabling reactive microscopy with MicroMator. Nat. Commun. 13, 2199. doi:10.1038/s41467-022-29888-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Fox, Z. R., and Munsky, B. (2019). The finite state projection based Fisher information matrix approach to estimate information and optimize single-cell experiments. PLOS Comput. Biol. 15, e1006365. doi:10.1371/journal.pcbi.1006365

PubMed Abstract | CrossRef Full Text | Google Scholar

Fukaya, T., Lim, B., and Levine, M. (2016). Enhancer control of transcriptional bursting. Cell. 166, 358–368. doi:10.1016/j.cell.2016.05.025

PubMed Abstract | CrossRef Full Text | Google Scholar

Gadkar, K. G., Gunawan, R., and Doyle, F. J. (2005). Iterative approach to model identification of biological networks. BMC Bioinforma. 6, 155. doi:10.1186/1471-2105-6-155

PubMed Abstract | CrossRef Full Text | Google Scholar

Garcia-Bernardo, J., and Dunlop, M. J. (2015). Noise and low-level dynamics can coordinate multicomponent bet hedging mechanisms. Biophysical J. 108, 184–193. doi:10.1016/j.bpj.2014.11.048

PubMed Abstract | CrossRef Full Text | Google Scholar

Garcia-Bernardo, J., and Dunlop, M. J. (2013). Tunable stochastic pulsing in the Escherichia coli multiple antibiotic resistance network from interlinked positive and negative feedback loops. PLOS Comput. Biol. 9, e1003229. doi:10.1371/journal.pcbi.1003229

PubMed Abstract | CrossRef Full Text | Google Scholar

Gillespie, D. T. (2007). Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55. doi:10.1146/annurev.physchem.58.032806.104637

PubMed Abstract | CrossRef Full Text | Google Scholar

Gómez-Schiavon, M., Chen, L.-F., West, A. E., and Buchler, N. E. (2017). BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNA FISH in single cells. Genome Biol. 18, 164. doi:10.1186/s13059-017-1297-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Gupta, A., Schwab, C., and Khammash, M. (2021). DeepCME: A deep learning framework for computing solution statistics of the chemical master equation. PLOS Comput. Biol. 17, e1009623. doi:10.1371/journal.pcbi.1009623

PubMed Abstract | CrossRef Full Text | Google Scholar

Gutenkunst, R. N., Waterfall, J. J., Casey, F. P., Brown, K. S., Myers, C. R., and Sethna, J. P. (2007). Universally sloppy parameter sensitivities in systems biology models. PLoS Comput. Biol. 3, 1871–1878. doi:10.1371/journal.pcbi.0030189

PubMed Abstract | CrossRef Full Text | Google Scholar

Haimovich, G., and Gerst, J. E. (2018). Single-molecule fluorescence in situ hybridization (smfish) for rna detection in adherent animal cells. Bio-protocol 8, e3070

PubMed Abstract | CrossRef Full Text | Google Scholar

Ham, L., Schnoerr, D., Brackston, R. D., and Stumpf, M. P. H. (2020). Exactly solvable models of stochastic gene expression. J. Chem. Phys. 152, 144106. doi:10.1063/1.5143540

PubMed Abstract | CrossRef Full Text | Google Scholar

Imbert, A., Ouyang, W., Safieddine, A., Coleno, E., Zimmer, C., Bertrand, E., et al. (2022). Fish-quant v2: A scalable and modular tool for smfish image analysis. RNA , rna– 28, 786–795. doi:10.1261/rna.079073.121

CrossRef Full Text | Google Scholar

Inc, T. M. (2022). Matlab version. 9.13.0 (r2022b).

Google Scholar

Isaacs, F. J., Hasty, J., Cantor, C. R., and Collins, J. J. (2003). Prediction and measurement of an autoregulatory genetic module. Proc. Natl. Acad. Sci. 100, 7714–7719. doi:10.1073/pnas.1332628100

PubMed Abstract | CrossRef Full Text | Google Scholar

Kalb, D., Vo, H. D., Adikari, S., Hong-Geller, E., Munsky, B., and Werner, J. (2021). Visualization and modeling of inhibition of il-1β and tnf-α mrna transcription at the single-cell level. Sci. Rep. 11, 13692. doi:10.1038/s41598-021-92846-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Kazeev, V., Khammash, M., Nip, M., and Schwab, C. (2014). Direct solution of the chemical master equation using quantized tensor trains. PLOS Comput. Biol. 10, e1003359. doi:10.1371/journal.pcbi.1003359

PubMed Abstract | CrossRef Full Text | Google Scholar

Kesler, B., Li, G., Thiemicke, A., Venkat, R., and Neuert, G. (2019). Automated cell boundary and 3D nuclear segmentation of cells in suspension. Sci. Rep. 9, 10237. doi:10.1038/s41598-019-46689-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Kilic, Z., Schweiger, M., Moyer, C., Shepherd, D., and Pressé, S. (2023). Gene expression model inference from snapshot rna data using bayesian non-parametrics. Nat. Comput. Sci. 3, 174–183. doi:10.1038/s43588-022-00392-0

CrossRef Full Text | Google Scholar

Kobyzev, I., Prince, S. J., and Brubaker, M. A. (2021). Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern Analysis Mach. Intell. 43, 3964–3979. doi:10.1109/TPAMI.2020.2992934

CrossRef Full Text | Google Scholar

Komorowski, M., Costa, M. J., Rand, D. A., and Stumpf, M. P. H. (2011). Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl. Acad. Sci. 108, 8645–8650. doi:10.1073/pnas.1015814108

PubMed Abstract | CrossRef Full Text | Google Scholar

Kreutz, C., and Timmer, J. (2009). Systems biology: Experimental design. FEBS J. 276, 923–942. doi:10.1111/j.1742-4658.2008.06843.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Lagarias, J. C., Reeds, J. A., Wright, M. H., and Wright, P. E. (1998). Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 9, 112–147. doi:10.1137/s1052623496303470

CrossRef Full Text | Google Scholar

Larsson, A. J. M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O. R., Reinius, B., et al. (2019). Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254. doi:10.1038/s41586-018-0836-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Lewis, K. (2010). Persister cells. Annu. Rev. Microbiol. 64, 357–372. doi:10.1146/annurev.micro.112408.134306

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, G., and Neuert, G. (2019). Multiplex RNA single molecule FISH of inducible mRNAs in single yeast cells. Sci. Data 6, 94. doi:10.1038/s41597-019-0106-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Lim, H. N., and van Oudenaarden, A. (2007). A multistep epigenetic switch enables the stable inheritance of DNA methylation states. Nat. Genet. 39, 269–275. doi:10.1038/ng1956

PubMed Abstract | CrossRef Full Text | Google Scholar

Lin, Y. T., and Buchler, N. E. (2019). Exact and efficient hybrid Monte Carlo algorithm for accelerated Bayesian inference of gene expression models from snapshots of single-cell transcripts. J. Chem. Phys. 151, 024106. doi:10.1063/1.5110503

PubMed Abstract | CrossRef Full Text | Google Scholar

Lipinski-Kruszka, J., Stewart-Ornstein, J., Chevalier, M. W., and El-Samad, H. (2015). Using dynamic noise propagation to infer causal regulatory relationships in biochemical networks. ACS Synth. Biol. 4, 258–264. doi:10.1021/sb5000059

PubMed Abstract | CrossRef Full Text | Google Scholar

Lou, C., Stanton, B., Chen, Y.-J., Munsky, B., and Voigt, C. A. (2012). Ribozyme-based insulator parts buffer synthetic circuits from genetic context. Nat. Biotechnol. 30, 1137–1142. doi:10.1038/nbt.2401

PubMed Abstract | CrossRef Full Text | Google Scholar

Marusyk, A., Almendro, V., and Polyak, K. (2012). Intra-tumour heterogeneity: A looking glass for cancer? Nat. Rev. Cancer 12, 323–334. doi:10.1038/nrc3261

PubMed Abstract | CrossRef Full Text | Google Scholar

McAdams, H. H., and Arkin, A. (1997). Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. 94, 814–819. doi:10.1073/pnas.94.3.814

PubMed Abstract | CrossRef Full Text | Google Scholar

McQuarrie, D. A. (1967). Stochastic approach to chemical kinetics. J. Appl. Probab. 4, 413–478. doi:10.1017/s002190020002547x

CrossRef Full Text | Google Scholar

Milias-Argeitis, A., Rullan, M., Aoki, S. K., Buchmann, P., and Khammash, M. (2016). Automated optogenetic feedback control for precise and robust regulation of gene expression and cell growth. Nat. Commun. 7, 12546. doi:10.1038/ncomms12546

PubMed Abstract | CrossRef Full Text | Google Scholar

Milias-Argeitis, A., Summers, S., Stewart-Ornstein, J., Zuleta, I., Pincus, D., El-Samad, H., et al. (2011). In silico feedback for in vivo regulation of a gene expression circuit. Nat. Biotechnol. 29, 1114–1116. doi:10.1038/nbt.2018

PubMed Abstract | CrossRef Full Text | Google Scholar

Munsky, B., Li, G., Fox, Z. R., Shepherd, D. P., and Neuert, G. (2018). Distribution shapes govern the discovery of predictive models for gene regulation. Proc. Natl. Acad. Sci. 115, 7533–7538. doi:10.1073/pnas.1804060115

PubMed Abstract | CrossRef Full Text | Google Scholar

Munsky, B., Trinh, B., and Khammash, M. (2009). Listening to the noise: Random fluctuations reveal gene network parameters. Mol. Syst. Biol. 5, 318. doi:10.1038/msb.2009.75

PubMed Abstract | CrossRef Full Text | Google Scholar

Neuert, G., Munsky, B., Tan, R. Z., Teytelman, L., Khammash, M., and Oudenaarden, A. v. (2013). Systematic identification of signal-activated stochastic gene regulation. Science 339, 584–587. doi:10.1126/science.1231456

PubMed Abstract | CrossRef Full Text | Google Scholar

Öcal, K., Gutmann, M. U., Sanguinetti, G., and Grima, R. (2022). Inference and uncertainty quantification of stochastic gene expression via synthetic models. bioRxiv. doi:10.1101/2022.01.25.477666

CrossRef Full Text | Google Scholar

Peccoud, J., and Ycart, B. (1995). Markovian modeling of gene-product synthesis. Theor. Popul. Biol. 48, 222–234. doi:10.1006/tpbi.1995.1027

CrossRef Full Text | Google Scholar

Pitchiaya, S., Heinicke, L. A., Custer, T. C., and Walter, N. G. (2014). Single molecule fluorescence approaches shed light on intracellular RNAs. Chem. Rev. 114, 3224–3265. doi:10.1021/cr400496q

PubMed Abstract | CrossRef Full Text | Google Scholar

Pronzato, L., and Walter, E. (1985). Robust experiment design via stochastic approximation. Math. Biosci. 75, 103–120. doi:10.1016/0025-5564(85)90068-9

CrossRef Full Text | Google Scholar

Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y., and Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLOS Biol. 4, e309. doi:10.1371/journal.pbio.0040309

PubMed Abstract | CrossRef Full Text | Google Scholar

Raj, A., and Tyagi, S. (2010). “Chapter 17 - detection of individual endogenous RNA transcripts in situ using multiple singly labeled probes,” in Methods in enzymology. Editor N. G. Walter (Academic Press), 365–386. 472 of Single Molecule Tools: Fluorescence Based Approaches, Part A. doi:10.1016/S0076-6879(10)72004-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Raj, A., and van Oudenaarden, A. (2008). Nature, nurture, or chance: Stochastic gene expression and its consequences. Cell. 135, 216–226. doi:10.1016/j.cell.2008.09.050

PubMed Abstract | CrossRef Full Text | Google Scholar

Raj, A., and van Oudenaarden, A. (2009). Single-molecule approaches to stochastic gene expression. Annu. Rev. Biophysics 38, 255–270. doi:10.1146/annurev.biophys.37.032807.125928

PubMed Abstract | CrossRef Full Text | Google Scholar

Rouzine, I. M., Razooky, B. S., and Weinberger, L. S. (2014). Stochastic variability in HIV affects viral eradication. Proc. Natl. Acad. Sci. 111, 13251–13252. doi:10.1073/pnas.1413362111

PubMed Abstract | CrossRef Full Text | Google Scholar

Ruess, J., and Lygeros, J. (2013). Identifying stochastic biochemical networks from single-cell population experiments: A comparison of approaches based on the Fisher information. In 52nd IEEE conference on decision and control. 2703–2708. doi:10.1109/CDC.2013.6760291

CrossRef Full Text | Google Scholar

Ruess, J., Milias-Argeitis, A., and Lygeros, J. (2013). Designing experiments to understand the variability in biochemical reaction networks. J. R. Soc. Interface 10, 20130588. doi:10.1098/rsif.2013.0588

PubMed Abstract | CrossRef Full Text | Google Scholar

Ruess, J., Parise, F., Milias-Argeitis, A., Khammash, M., and Lygeros, J. (2015). Iterative experiment design guides the characterization of a light-inducible gene expression circuit. Proc. Natl. Acad. Sci. 112, 8148–8153. doi:10.1073/pnas.1423947112

PubMed Abstract | CrossRef Full Text | Google Scholar

Safieddine, A., Coleno, E., Lionneton, F., Traboulsi, A.-M., Salloum, S., Lecellier, C.-H., et al. (2022). Ht-smfish: A cost-effective and flexible workflow for high-throughput single-molecule rna imaging. Nat. Protoc. 18, 157–187. doi:10.1038/s41596-022-00750-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Sanchez, A., and Golding, I. (2013). Genetic determinants and cellular constraints in noisy gene expression. Science 342, 1188–1193. doi:10.1126/science.1242975

PubMed Abstract | CrossRef Full Text | Google Scholar

Senecal, A., Munsky, B., Proux, F., Ly, N., Braye, F. E., Zimmer, C., et al. (2014). Transcription factors modulate c-fos transcriptional bursts. Cell. Rep. 8, 75–83. doi:10.1016/j.celrep.2014.05.053

PubMed Abstract | CrossRef Full Text | Google Scholar

Sepúlveda, L. A., Xu, H., Zhang, J., Wang, M., and Golding, I. (2016). Measurement of gene regulation in individual cells reveals rapid switching between promoter states. Science 351, 1218–1222. doi:10.1126/science.aad0635

PubMed Abstract | CrossRef Full Text | Google Scholar

Shepherd, D. P., Li, N., Micheva-Viteva, S. N., Munsky, B., Hong-Geller, E., and Werner, J. H. (2013). Counting small RNA in pathogenic bacteria. Anal. Chem. 85, 4938–4943. doi:10.1021/ac303792p

PubMed Abstract | CrossRef Full Text | Google Scholar

Singh, A., and Bokes, P. (2012). Consequences of mRNA transport on stochastic variability in protein levels. Biophysical J. 103, 1087–1096. doi:10.1016/j.bpj.2012.07.015

PubMed Abstract | CrossRef Full Text | Google Scholar

Singh, A., Razooky, B., Cox, C. D., Simpson, M. L., and Weinberger, L. S. (2010). Transcriptional bursting from the HIV-1 promoter is a significant source of stochastic noise in HIV-1 gene expression. Biophysical J. 98, L32–L34. doi:10.1016/j.bpj.2010.03.001

PubMed Abstract | CrossRef Full Text | Google Scholar

Skinner, S. O., Xu, H., Nagarkar-Jaiswal, S., Freire, P. R., Zwaka, T. P., and Golding, I. (2016). Single-cell analysis of transcription kinetics across the cell cycle. eLife 5, e12175. doi:10.7554/eLife.12175

PubMed Abstract | CrossRef Full Text | Google Scholar

Skupsky, R., Burnett, J. C., Foley, J. E., Schaffer, D. V., and Arkin, A. P. (2010). HIV promoter integration site primarily modulates transcriptional burst size rather than frequency. PLOS Comput. Biol. 6, e1000952. doi:10.1371/journal.pcbi.1000952

PubMed Abstract | CrossRef Full Text | Google Scholar

Stewart-Ornstein, J., and El-Samad, H. (2012). “Chapter 5 - stochastic modeling of cellular networks,” in Methods in cell biology. Computational methods in cell biology. Editors A. R. Asthagiri, and A. P. Arkin (Academic Press), 110, 111–137. doi:10.1016/B978-0-12-388403-9.00005-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Stringer, C., Wang, T., Michaelos, M., and Pachitariu, M. (2021). Cellpose: A generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106. doi:10.1038/s41592-020-01018-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Suter, D. M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., and Naef, F. (2011). Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474. doi:10.1126/science.1198817

PubMed Abstract | CrossRef Full Text | Google Scholar

Tantale, K., Mueller, F., Kozulic-Pirher, A., Lesne, A., Victor, J.-M., Robert, M.-C., et al. (2016). A single-molecule view of transcription reveals convoys of rna polymerases and multi-scale bursting. Nat. Commun. 7, 12248. doi:10.1038/ncomms12248

PubMed Abstract | CrossRef Full Text | Google Scholar

Thomas, P., and Grima, R. (2015). Approximate probability distributions of the master equation. Phys. Rev. E 92, 012120. doi:10.1103/PhysRevE.92.012120

PubMed Abstract | CrossRef Full Text | Google Scholar

Tiberi, S., Walsh, M., Cavallaro, M., Hebenstreit, D., and Finkenstädt, B. (2018). Bayesian inference on stochastic gene transcription from flow cytometry data. Bioinformatics 34, i647–i655. doi:10.1093/bioinformatics/bty568

PubMed Abstract | CrossRef Full Text | Google Scholar

Transtrum, M. K., MacHta, B. B., and Sethna, J. P. (2010). Why are nonlinear fits to data so challenging? Phys. Rev. Lett. 104, 060201. doi:10.1103/PhysRevLett.104.060201

PubMed Abstract | CrossRef Full Text | Google Scholar

Tsanov, N., Samacoits, A., Chouaib, R., Traboulsi, A.-M., Gostan, T., Weber, C., et al. (2016). Smifish and fish-quant–a flexible single rna detection approach with super-resolution capability. Nucleic acids Res. 44, e165. doi:10.1093/nar/gkw784

PubMed Abstract | CrossRef Full Text | Google Scholar

Vahid, M. R., Hanzon, B., and Ober, R. J. (2019). Effect of pixelation on the parameter estimation of single molecule trajectories. arXiv:1912.07182 [eess] ArXiv: 1912.07182.

Google Scholar

Vastola, J. J. (2021). Solving the chemical master equation for monomolecular reaction systems and beyond: A doi-peliti path integral view. J. Math. Biol. 83, 48. doi:10.1007/s00285-021-01670-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Weinberger, A. D., and Weinberger, L. S. (2013). Stochastic fate selection in HIV-infected patients. Cell. 155, 497–499. doi:10.1016/j.cell.2013.09.039

PubMed Abstract | CrossRef Full Text | Google Scholar

Wheat, J. C., Sella, Y., Willcockson, M., Skoultchi, A. I., Bergman, A., Singer, R. H., et al. (2020). Single-molecule imaging of transcription dynamics in somatic stem cells. Nature 583, 431–436. doi:10.1038/s41586-020-2432-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Zuleta, I. A., Aranda-Díaz, A., Li, H., and El-Samad, H. (2014). Dynamic characterization of growth and gene expression using high-throughput automated flow cytometry. Nat. Methods 11, 443–448. doi:10.1038/nmeth.2879

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: single-cell, fluorescence in situ hybridization (FISH), stochastic gene expression, experiment design, Fisher Information matrix, measurement noise, model inference

Citation: Vo HD, Forero-Quintero LS, Aguilera LU and Munsky B (2023) Analysis and design of single-cell experiments to harvest fluctuation information while rejecting measurement noise. Front. Cell Dev. Biol. 11:1133994. doi: 10.3389/fcell.2023.1133994

Received: 29 December 2022; Accepted: 10 May 2023;
Published: 26 May 2023.

Edited by:

Khuloud Jaqaman, University of Texas Southwestern Medical Center, United States

Reviewed by:

Mark Kittisopikul, Janelia Research Campus, United States
Xinxin Wang, University of Texas Southwestern Medical Center, United States

Copyright © 2023 Vo, Forero-Quintero, Aguilera and Munsky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Brian Munsky, bXVuc2t5QGNvbG9zdGF0ZS5lZHU=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.