Oscillatory Biomedical Signals: Frontiers in Mathematical Models and Statistical Analysis

Wu, Hau-Tieng; Lai, Tze Leung; Haddad, Gabriel G.; Muotri, Alysson

doi:10.3389/fams.2021.689991

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 15 July 2021

Sec. Statistics and Probability

Volume 7 - 2021 | https://doi.org/10.3389/fams.2021.689991

This article is part of the Research Topic 2021 Editor’s Pick: Applied Mathematics and Statistics.


Hau-Tieng Wu¹

Tze Leung Lai²*

Gabriel G. Haddad³

Alysson Muotri⁴

¹Department of Mathematics, Duke University, Durham, NC, United States
²Department of Statistics, Stanford University, Stanford, CA, United States
³Department of Pediatrics and Rady Children’s Hospital, University of California at San Diego, San Diego, CA, United States
⁴Department of Cellular & Molecular Medicine, Department of Pediatrics, University of California at San Diego, San Diego, CA, United States

Herein we describe new frontiers in mathematical modeling and statistical analysis of oscillatory biomedical signals, motivated by our recent studies of network formation in the human brain during the early stages of life and by studies forty years ago on cardiorespiratory patterns during sleep in infants and animal models. The frontiers involve new nonlinear-type time–frequency analysis of signals with multiple oscillatory components, and efficient particle filters for joint state and parameter estimation, together with uncertainty quantification, in hidden Markov models and empirical Bayes inference.

1 Introduction

The 2017 Nobel Prize in Physiology or Medicine was awarded to Jeffrey Hall and Michael Rosbash of Brandeis University and Michael Young of Rockefeller University “for their discoveries of molecular mechanisms controlling the circadian rhythm.” In 1984, they succeeded in isolating the “period gene” (i.e., the gene that controls the circadian rhythm). Hall and Rosbash then went on to discover that PER, the protein encoded by period, “accumulated during the night and [was] degraded during the day.” In 1994, Young answered a “tantalizing puzzle” concerning how PER produced in the cytoplasm could reach the cell nucleus, where the genetic material is located. He discovered a second gene, timeless, encoding the TIM protein; when bound to TIM, PER can enter the cell nucleus and block period gene activity. “Such a regulatory feedback mechanism explained how this oscillation of cellular protein levels emerged, but questions lingered,” such as what controlled the frequency of the oscillations. Young identified another gene, doubletime, encoding the DBT protein, which delayed the accumulation of the PER protein. The three laureates also identified additional proteins required for the activation of the period gene, as well as the mechanisms by which light can synchronize the circadian clock.

One of us (Muotri) was PI of a project on “spontaneous network formation” displaying “periodic and regular oscillatory events that were dependent on glutamatergic and GABAergic signaling” during early brain maturation, for which structural and transcriptional changes “follow fixed developmental programs defined by genetics”; see [1], who also found that “the oscillatory activity transitioned to more spatiotemporally irregular patterns which synchronous network activity resembled features similar to those observed in preterm human EEG.” This project is similar in spirit to the exemplary work of Hall, Rosbash, and Young, but the “experimental inaccessibility” of the human brain during the early stages of life pushes mathematical modeling and statistical analysis of the oscillatory signals and events to new frontiers, which we present in the next section. We describe the underlying biomedical background of this project in the next paragraph.

One of the major recent realizations, especially in the neurosciences, is that while we can obtain important information from animal studies, there are major differences between humans and animals. This is manifested in many ways, especially in major clinical trials based on animal findings that did not pan out. Therefore, if we intend to study the pathogenesis of diseases and to treat, prevent, or cure them across the age spectrum, we need to refocus our scientific approaches and strategies in order to be more efficient and effective. Since embryonic stem cells are often problematic to obtain for ethical reasons, the discovery that human somatic cells can be reprogrammed into induced pluripotent stem cells (iPSCs, taking these somatic cells back into their “history”) and differentiated into various relatively mature cell types has opened a major avenue for the scientific community, resulting in the 2012 Nobel Prize in Physiology or Medicine to John Gurdon of Cambridge and Shinya Yamanaka of Kyoto. If these iPSCs are exposed to the right growth factors, they assemble into early human brain tissue (brain organoids) by an amazing process of self-organization of 3-dimensional cellular elements that recapitulate the network, cellular, and membrane properties of neurons and glia. Many other types of organoids, such as kidney, intestinal, liver, and lung organoids, have recently been developed. These organoids have been particularly useful for studying either normal early human biology or developmental disorders such as neurodevelopmental diseases.

2 Methods

The statistical methods used by Trujillo et al. (2019, pp. 16–19) in their analysis of data on oscillatory signals and events consist of 1) multi-electrode array (MEA) recording and custom analysis; 2) network event analysis, which detects events at which spiking reaches at least 80% of its maximum value over the length of the recording and that occur at least 1 s away from any other network event; 3) oscillatory spectral power analysis, in which “oscillatory power” is defined as “peaks in the PSD (power spectral density estimated by Peter D. Welch’s method) above the aperiodic $1 / f$ power law decay”; and 4) resampled Pearson’s correlation coefficients between neonatal age and each of 12 EEG features. Because of “the inability to interrogate the electrophysiology of intact human brains” and the emergence of induced pluripotent stem cells (iPSCs) and organoids as “a scaled-down and three-dimensional model of the human brain, mimicking various developmental features at cellular and molecular levels,” [1, pp. 4, 7–9, 18] used oscillatory dynamics of the LFP (local field potential) and other mesoscopic brain signals, which manifest “a phenomenon known as cross-frequency phase-amplitude coupling (PAC) wherein the high-frequency content of LFP is entrained to the phase of slow oscillations.” Noting that “the pattern of alternating periods of quiescence and network-synchronized events resembles electrophysiological signatures in preterm human EEG,” [1] analyzed “a publicly available dataset of 101 serial EEG recordings from 39 preterm infants ranging from 24 to 38 weeks post-menstrual age,” containing 23 precomputed features (including spectral power in canonical oscillatory bands and the duration and timing of “spontaneous activity transients,” or SATs) for each EEG record.
To compare the features between cortical organoids and preterm infants, [1] trained a regularized regression model (ElasticNet), with cross-validation for hyperparameter selection, on the preterm infants’ EEG recordings and applied the model to the organoid dataset to “obtain the predicted developmental time.” The results were mixed, and [1] concluded that “given the potential roles of synchronized and oscillatory network dynamics in coordinating information flow between developed brain regions, these results highlight the potential for cortical organoids to advance our understanding of functional physiology” and to model “cellular interactions and neural circuit dysfunctions related to neurodevelopmental and neuropsychiatric pathologies” that “affect millions of people but otherwise lack an existing animal model.” These statistical methods are “custom” (or traditional) methods, as acknowledged by [1]. In the next two subsections we describe innovative and powerful methods, first for time–frequency analysis of oscillatory biomedical signals with time-varying features, and then for a new hidden Markov model (HMM) that incorporates the key features of the cortical organoid model and provides uncertainty quantification for empirical Bayes inference based on the model and observed data.
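The network-event rule in item 2) can be sketched as follows. This is a minimal illustration under our own assumptions, not the analysis code of [1]: the input is taken to be a population spiking-rate trace sampled at `fs` Hz, candidate events are local maxima, and the hypothetical helper `detect_network_events` resolves conflicts greedily by keeping the larger peak.

```python
def detect_network_events(rate, fs, frac=0.8, min_sep_s=1.0):
    """Return network-event times (s) from a population spiking-rate trace.

    rate      : list of floats sampled at fs Hz (assumed input format)
    frac      : an event must reach at least frac * max(rate)
    min_sep_s : accepted events must be at least this many seconds apart
    """
    if not rate:
        return []
    threshold = frac * max(rate)
    min_sep = int(round(min_sep_s * fs))
    n = len(rate)
    # Candidate events: local maxima that clear the threshold.
    peaks = [
        i for i in range(n)
        if rate[i] >= threshold
        and (i == 0 or rate[i] >= rate[i - 1])
        and (i == n - 1 or rate[i] >= rate[i + 1])
    ]
    # Greedy selection, largest peaks first, enforcing the minimum separation.
    accepted = []
    for i in sorted(peaks, key=lambda j: rate[j], reverse=True):
        if all(abs(i - j) >= min_sep for j in accepted):
            accepted.append(i)
    return [i / fs for i in sorted(accepted)]


# Toy trace at 10 Hz: bursts at 2.0 s, 2.5 s (too close), 5.0 s, 7.0 s (too small).
rate = [0.0] * 100
rate[20], rate[25], rate[50], rate[70] = 10.0, 9.0, 8.5, 5.0
print(detect_network_events(rate, fs=10.0))  # [2.0, 5.0]
```

The greedy largest-first pass is one simple way to honor the 1 s exclusion; other tie-breaking rules would also be consistent with the description.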

2.1 Time–Frequency Analysis of Signals With Multiple Oscillatory Components

The first author (Wu) has been working on time–frequency analysis (TFA) and its applications to high-frequency biomedical signals for the last ten years. Examples include the electrocardiogram, electroencephalogram, local field potential, photoplethysmogram (PPG), actinogram, peripheral venous pressure (PVP), arterial blood pressure, phonocardiogram, and airflow respiratory signal, to name several. Usually, these signals are composed of multiple components, each of which reflects the dynamics of a physiological system. The analysis is challenged by physiological variability that appears in the form of time-varying frequency and amplitude, or even a time-varying oscillatory pattern referred to as the “wave-shape function.” Furthermore, depending on the signal, a “waxing and waning” effect is sometimes inevitable for its components; see [2, Figure 1] for an illustration. Take the widely used PPG signal as an example; [3] gives an introduction to PPG and its applications “beyond the calculation of arterial oxygen saturation and heart rate.” In addition to the well-known cardiac component reflecting hemodynamic information, a PPG may contain the respiratory dynamics as another component. The frequency of the cardiac (respectively, respiratory) component is affected by heart rate variability (respectively, breathing rate variability). Cicone and Wu [4] provide an algorithm to “extract both heart and respiratory rates” from the PPG signal and thereby to analyze their interactions. Such information can be used in conjunction with other biomedical signals reflecting hemodynamics. In particular, PVP is ubiquitous in the hospital environment and a rich source of hemodynamic information [5]. However, it typically has a low signal-to-noise ratio (SNR), and its oscillatory pattern is sensitive to the physiological status, so it is much less used than PPG. Wu et al. [6] have developed new signal processing tools to facilitate its use.
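To make time-varying frequency, amplitude, and wave-shape concrete, the following sketch (standard library only) synthesizes a two-component signal of the form $A_c(t) s_c(φ_c(t)) + A_r(t) s_r(φ_r(t))$: a cardiac-like component with a non-sinusoidal 1-periodic wave-shape plus a respiratory-like cosine. All rates, modulations, wave-shapes, and helper names are hypothetical illustrative choices, not values estimated from PPG recordings.

```python
import math


def cardiac_shape(phase):
    """Hypothetical 1-periodic, non-sinusoidal wave-shape: fundamental plus two harmonics."""
    p = 2 * math.pi * phase
    return math.cos(p) + 0.5 * math.cos(2 * p + 0.3) + 0.2 * math.cos(3 * p + 0.6)


def resp_shape(phase):
    """Respiratory wave-shape, taken here to be a plain cosine."""
    return math.cos(2 * math.pi * phase)


def synth_two_component(fs=50.0, duration=20.0):
    """Return samples of A_c s_c(phi_c) + A_r s_r(phi_r) and the true (f_c, f_r) per sample."""
    dt = 1.0 / fs
    phi_c = phi_r = 0.0  # phases measured in cycles
    signal, rates = [], []
    for i in range(int(fs * duration)):
        t = i * dt
        f_c = 1.2 + 0.1 * math.sin(2 * math.pi * 0.05 * t)   # time-varying heart rate (Hz)
        f_r = 0.25 + 0.02 * math.sin(2 * math.pi * 0.03 * t)  # time-varying breathing rate (Hz)
        a_c = 1.0 + 0.2 * math.sin(2 * math.pi * 0.1 * t)     # amplitude modulation
        a_r = 0.3
        signal.append(a_c * cardiac_shape(phi_c) + a_r * resp_shape(phi_r))
        rates.append((f_c, f_r))
        # Phase is the running integral of instantaneous frequency (forward Euler step).
        phi_c += f_c * dt
        phi_r += f_r * dt
    return signal, rates


signal, rates = synth_two_component()
print(len(signal), min(r[0] for r in rates), max(r[0] for r in rates))
```

Because each phase accumulates a drifting instantaneous frequency, no single Fourier frequency describes either component over the whole record, which is the situation the TFA tools below are designed for.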

Wu [2], who describes several recent advances in TFA for high-frequency biomedical signals, illustrates in Figure 2 (applied to PPG, fetal ECG, and fetal heart rate variability) how TFA can be combined with statistical analysis; the lack of such a combination in previous work “presents an opportunity for much future research.” Several challenges are common to different biomedical signal processing problems. The first is how to estimate the dynamics of the signal (e.g., how to quantify the time-varying frequency, amplitude, or wave-shape). The second is to assess signal quality and detect artifacts, distinguishing between physiological and nonphysiological ones. The third is to identify oscillatory components, and the fourth is to decompose the signal into its constituent components. To address these challenges, several TFA tools have been proposed. In addition to traditional linear-type time–frequency analysis tools such as the short-time Fourier transform (STFT) and continuous wavelet transform (CWT), and bilinear time–frequency analysis tools [7], several nonlinear-type tools have been developed and applied, including the reassignment method, empirical mode decomposition (EMD), Blaschke decomposition (BKD), adaptive locally iterative filtering (ALIF), sparse time–frequency representation (STFR), synchrosqueezing transform (SST), scattering transform (ST), concentration of frequency and time (ConcFT), de-shape, and dynamic diffusion maps [8–14]. The statistical properties of these methods remain relatively unexplored, and we are currently investigating them; new methods to handle emerging scientific problems may be developed along the way.
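As a baseline for the first challenge (estimating time-varying frequency), the sketch below computes a Hann-windowed STFT on a frequency grid and takes the per-window argmax as a crude instantaneous-frequency estimate (a “ridge”). It uses only the linear-type STFT, not the nonlinear-type tools listed above; the function name, window length, hop size, and test chirp are our own assumptions.

```python
import cmath
import math


def stft_ridge(x, fs, win_len, hop, freqs):
    """Per-window argmax of the Hann-windowed STFT magnitude over a frequency grid.

    Returns (times, ridge): window-centre times (s) and the grid frequency (Hz)
    with the largest |STFT| in each window.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1)) for n in range(win_len)]
    times, ridge = [], []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        best_f, best_mag = freqs[0], -1.0
        for f in freqs:
            # Windowed Fourier coefficient at frequency f, evaluated directly (no FFT).
            coef = sum(s * cmath.exp(-2j * math.pi * f * n / fs) for n, s in enumerate(seg))
            if abs(coef) > best_mag:
                best_f, best_mag = f, abs(coef)
        times.append((start + (win_len - 1) / 2) / fs)
        ridge.append(best_f)
    return times, ridge


# Linear chirp whose instantaneous frequency 1 + 0.05 t rises from 1 Hz to 2 Hz over 20 s.
fs = 20.0
x = [math.cos(2 * math.pi * (t + 0.025 * t * t)) for t in (n / fs for n in range(400))]
freqs = [0.5 + 0.05 * k for k in range(51)]  # 0.5 to 3.0 Hz
times, ridge = stft_ridge(x, fs, win_len=80, hop=20, freqs=freqs)
for t, f in zip(times, ridge):
    print(f"t = {t:5.2f} s  ridge = {f:.2f} Hz  true = {1 + 0.05 * t:.2f} Hz")
```

Nonlinear-type tools such as the SST sharpen this representation by reassigning STFT coefficients according to a local frequency estimate; the plain STFT is used here only because its logic fits in a few lines.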

2.2 Efficient Particle Filters for Joint State and Parameter Estimation in HMM

During the past three years, the second author (Lai) has been developing a new Markov chain Monte Carlo (MCMC) procedure called “MCMC with sequential substitutions” (MCMC-SS) for joint state and parameter estimation in hidden Markov models. The basic idea is to approximate an intractable distribution of interest (the target distribution) by the empirical distribution of N representative atoms, chosen sequentially by an MCMC procedure, so that the empirical distribution approximates the target distribution after a large number of iterations, as explained below.

Lai’s work in this area began with the landmark paper of Gordon, Salmond, and Smith [15] on the development of sequential Monte Carlo (SMC), also called particle filters, for the estimation of latent states in a hidden Markov model (HMM). Liu’s monograph [16] contains a collection of techniques that have been developed since then, with examples of applications in computational biology and engineering, and Chan and Lai [17] provide a general theory of particle filters. Let $\mathbf{X} = \{X_{t}, t ⩾ 1\}$ be a Markov chain and let $Y_{1}, Y_{2}, \dots$ be conditionally independent given $\mathbf{X}$, such that $X_{t} \sim p_{t} (\cdot | X_{t - 1})$ and $Y_{t} \sim g_{t} (\cdot | X_{t})$, in which $p_{t}$ and $g_{t}$ are density functions with respect to measures $ν_{X}$ and $ν_{Y}$. The density function $p_{T}$ of $X_{0 : T} = (X_{0}, \dots, X_{T})$ conditional on $Y_{1 : T} = (Y_{1}, \dots, Y_{T})$ is given by

p_{T} (x_{0 : T} | Y_{1 : T}) \propto \prod_{t = 1}^{T} [p_{t} (x_{t} | x_{t - 1}) g_{t} (Y_{t} | x_{t})] .

This conditional distribution is often difficult to sample from, and its normalizing constant is also difficult to compute for high-dimensional or complicated state spaces; particle filters circumvent this difficulty by sequential Monte Carlo, which involves importance sampling and resampling. The particle filter computes $E [ψ (X_{0 : T}) | Y_{1}, \dots, Y_{T}]$ by the recursive Monte Carlo scheme summarized in Algorithm 1. Let $X_{0 : t - 1}^{m}$ denote the sample path of the mth particle (trajectory), $1 ⩽ m ⩽ M$. The scheme uses importance sampling from a proposal density $q_{t}$ and updates not only the particles $X_{0 : t - 1}^{m}$ but also the associated weights $w_{t - 1}^{m}$ and the ancestor $A_{t - 1}^{m}$ of $X_{0 : t}^{m}$. It is initialized with $A_{0}^{m} = m$ and $w_{0}^{m} = 1$. The SMC estimate of $ψ_{T} := E [ψ (X_{0 : T}) | Y_{1 : T}]$ is given by

{\tilde{ψ}}_{T} = (\sum_{m = 1}^{M} w_{T}^{m} ψ (X_{0 : T}^{m})) / (\sum_{m = 1}^{M} w_{T}^{m}) .

By using martingale theory, Chan and Lai [17] provide a comprehensive theory of the SMC estimate ${\tilde{ψ}}_{T}$ , which includes asymptotic normality and consistent standard error estimation as follows:

Theorem 1. Under certain integrability conditions,

\sqrt{M} (\tilde{ψ}_{T} - ψ_{T}) \Rightarrow N (0, σ^{2}) .

Moreover, letting $\bar{w}_{t} = M^{- 1} \sum_{i = 1}^{M} w_{t}^{i}$, $σ^{2}$ can be consistently estimated by the following equation:

{\tilde{σ}}^{2} = \frac{1}{M} \sum_{m = 1}^{M} {(\sum_{i : A_{T - 1}^{i} = m} \frac{w_{T}^{i}}{{\bar{w}}_{T}} [ψ (X_{0 : T}^{i}) - {\tilde{ψ}}_{T}])}^{2} .


Algorithm 1. SMC with M particles
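The following is a minimal sketch, under our own assumptions, of one common instance of the SMC scheme in Algorithm 1: the bootstrap filter (proposal $q_t = p_t$, multinomial resampling after every step except the last) applied to a hypothetical linear-Gaussian AR(1) model, with ψ depending on the path only through $X_T$. It computes $\tilde{ψ}_T$ and the standard error $\tilde{σ}/\sqrt{M}$ from Theorem 1, reading $A_{T-1}^i$ as the first-generation ancestor of particle i (consistent with $A_0^m = m$), and compares the result with the exact Kalman-filter mean.

```python
import math
import random

# Hypothetical model: X_t = AR_COEF X_{t-1} + N(0, Q_VAR), Y_t = X_t + N(0, R_VAR), X_0 ~ N(0, 1).
AR_COEF, Q_VAR, R_VAR = 0.9, 1.0, 0.25


def simulate(T, rng):
    """Draw X_0, then return observations Y_1..Y_T."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        x = AR_COEF * x + rng.gauss(0.0, math.sqrt(Q_VAR))
        ys.append(x + rng.gauss(0.0, math.sqrt(R_VAR)))
    return ys


def bootstrap_pf(ys, M, rng, psi=lambda x: x):
    """Return (psi-tilde_T, standard error) for E[psi(X_T) | Y_1:T]."""
    x = [rng.gauss(0.0, 1.0) for _ in range(M)]  # X_0^m
    anc = list(range(M))                          # A_0^m = m
    for t, y in enumerate(ys, start=1):
        x = [AR_COEF * xm + rng.gauss(0.0, math.sqrt(Q_VAR)) for xm in x]
        # Bootstrap proposal: w_t = g_t(Y_t | x_t); its normalizing constant cancels below.
        w = [math.exp(-0.5 * (y - xm) ** 2 / R_VAR) for xm in x]
        if t < len(ys):
            idx = rng.choices(range(M), weights=w, k=M)  # resampling indices B_t^i
            x, anc = [x[i] for i in idx], [anc[i] for i in idx]
    vals = [psi(xm) for xm in x]
    est = sum(wi * v for wi, v in zip(w, vals)) / sum(w)
    w_bar = sum(w) / M
    groups = [0.0] * M  # inner sums over {i : A_{T-1}^i = m}
    for i in range(M):
        groups[anc[i]] += (w[i] / w_bar) * (vals[i] - est)
    sigma2 = sum(g * g for g in groups) / M  # sigma-tilde^2
    return est, math.sqrt(sigma2 / M)


def kalman_mean(ys):
    """Exact filtering mean E[X_T | Y_1:T] for the same model, as a reference."""
    m, P = 0.0, 1.0
    for y in ys:
        m, P = AR_COEF * m, AR_COEF ** 2 * P + Q_VAR  # predict
        K = P / (P + R_VAR)
        m, P = m + K * (y - m), (1.0 - K) * P        # update
    return m


rng = random.Random(0)
ys = simulate(20, rng)
est, se = bootstrap_pf(ys, 2000, rng)
print(f"SMC {est:.3f} (SE {se:.3f}), Kalman {kalman_mean(ys):.3f}")
```

Grouping by first-generation ancestor is what lets the variance estimate account for the correlation that resampling induces among descendants of the same initial particle.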

Chan and Lai [17, Lemmas 1 and 4] use the following representation of $\tilde{ψ}_{T} - ψ_{T}$ to derive Theorem 1. Let $w_{t} (x_{0 : t}) = p_{t} (x_{t} | x_{t - 1}) g_{t} (Y_{t} | x_{t}) / q_{t} (x_{t} | x_{0 : t - 1})$, in which the $Y_{t}$ can be treated as constants since the particle filter targets the conditional distribution of $X_{0 : t}$ given the observations $Y_{1}, \dots, Y_{T}$. Let $H_{t}^{m} = (\bar{w}_{1} \cdots \bar{w}_{t}) / \prod_{j = 1}^{t} w_{j}^{m}$ and $η_{t} = E_{q} [\prod_{i = 1}^{t} w_{i} (X_{0 : i})]$, where $E_{q}$ denotes expectation under which $X_{t} | X_{0 : t - 1}$ has the conditional density function $q_{t} (\cdot | X_{0 : t - 1})$ for $1 ⩽ t ⩽ T$. Letting $Ψ_{0} = ψ_{T}$ and $Ψ_{t} (X_{0 : t}) = E_{q} \{ψ (X_{0 : T}) \prod_{i = 1}^{T} w_{i} (X_{0 : i}) | X_{0 : t}\}$ for $1 ⩽ t ⩽ T$, define

\begin{array}{l} ϵ_{2 t - 1}^{m} = \sum_{i : A_{t - 1}^{i} = m} \{Ψ_{t} (X_{0 : t}^{i}) - Ψ_{t - 1} (X_{0 : t - 1}^{i})\} H_{t - 1}^{i}, \\ ϵ_{2 t}^{m} = \sum_{i : A_{t - 1}^{i} = m} (\#_{t}^{i} - M W_{t}^{i}) \{Ψ_{t} (X_{0 : t}^{i}) H_{t}^{B_{t}^{i}} - Ψ_{0}\}, \end{array}

in which $W_{t}^{i} = w_{t}^{i} / \sum_{j = 1}^{M} w_{t}^{j}$ and $\#_{t}^{i}$ is the number of copies of $X_{0 : t}^{i}$ generated by bootstrap resampling from $\{X_{0 : t}^{1}, \dots, X_{0 : t}^{M}\}$ in Algorithm 1 (where $B_{t}^{i}$ is also defined). Then $(\#_{t}^{1}, \dots, \#_{t}^{M}) \sim \mathrm{Multinomial} (M; W_{t}^{1}, \dots, W_{t}^{M})$ and

\tilde{ψ}_{T} - ψ_{T} = \frac{η_{T}}{\bar{w}_{1} \cdots \bar{w}_{T}} \, M^{- 1} \sum_{m = 1}^{M} (ϵ_{1}^{m} + \cdots + ϵ_{2 T - 1}^{m});

see Eqs. (3.3) and (3.36) of the study by the authors of reference [15], which show that $\{ϵ_{t}^{m}, 1 ⩽ t ⩽ 2 T - 1\}$ is a martingale difference sequence and that $(\bar{w}_{1} \cdots \bar{w}_{T})^{- 1} η_{T} = 1 + o_{p} (1)$ under the integrability assumptions $η_{T} < \infty$ and $E_{q} [\prod_{i = 1}^{T} w_{i}^{2} (X_{0 : i})] < \infty$.
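The multinomial law of the offspring counts stated above implies $E[\#_t^i] = M W_t^i$ conditionally on the particles and weights at time t, which is what centers the resampling terms. A quick Monte Carlo check of this fact, with hypothetical weights and helper names, is sketched below.

```python
import random


def offspring_counts(weights, rng):
    """Copies #^i of each particle produced by one multinomial (bootstrap) resampling step."""
    M = len(weights)
    counts = [0] * M
    for i in rng.choices(range(M), weights=weights, k=M):
        counts[i] += 1
    return counts


def mean_centered_counts(weights, reps, seed=0):
    """Monte Carlo average of #^i - M W^i over independent resampling steps (should be near 0)."""
    rng = random.Random(seed)
    M, total = len(weights), sum(weights)
    W = [w / total for w in weights]
    acc = [0.0] * M
    for _ in range(reps):
        for i, c in enumerate(offspring_counts(weights, rng)):
            acc[i] += c - M * W[i]
    return [a / reps for a in acc]


print(mean_centered_counts([1.0, 2.0, 3.0, 4.0], reps=20000))
```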


Algorithm 2. PMCMC at the kth iteration, initialized with $θ^{0} \sim f (\cdot)$

The assumption of a single fully specified HMM in particle filtering is often too restrictive in applications, since the model parameters are usually unknown and also need to be estimated sequentially from the observed data. A standard method to estimate unknown parameters is to assume a prior distribution for the unknown parameter vector and to use Markov chain Monte Carlo (MCMC) to estimate the posterior distribution. The authors of reference [15] carried out this method for time-homogeneous Markov chains $X_{t} \sim p_{θ} (\cdot | X_{t - 1})$ for $t ⩾ 1$ and $X_{0} \sim p_{θ} (\cdot)$, with latent states $X_{t}$ and observations $Y_{t} \sim g_{θ} (\cdot | X_{t})$, in which θ is an unknown parameter with a prior density function $π (\cdot)$ with respect to some measure $ν_{θ}$ on the parameter space $Θ$. The posterior density of $(θ, X_{0 : T})$ given $Y_{1 : T}$ is proportional to

p_{T} (ϑ, x_{0 : T}) = π (ϑ) p_{ϑ} (x_{0}) \prod_{t = 1}^{T} {p_{ϑ} (x_{t} | x_{t - 1}) g_{ϑ} (Y_{t} | x_{t})} .

PMCMC uses SMC involving M particles (each of which consists of a sampled parameter and state trajectory) at every iteration k to construct an approximation $\tilde{p}_{T}$ to $p_{T}$ in a Metropolis–Hastings (MH) MCMC scheme that uses a proposal density $f (\cdot | θ_{k - 1})$ with respect to the measure $ν_{θ}$ to sample $θ_{k}$ at the kth iteration, as summarized in Algorithm 2. Chopin et al. (2013, Section 1.2) point out the difficulties in the asymptotic analysis of PMCMC as $k \to \infty$. In particular, although the authors of reference [15] have shown that, under some strong assumptions, PMCMC converges to a measure in total variation norm as $k \to \infty$ for a fixed value of M, the limiting measure is not the target posterior distribution of $(θ, X_{0 : t})$. On the other hand, allowing M to approach $\infty$ with k would lead to an analytically intractable scheme involving state spaces whose dimensions change with k. The authors of reference [16] propose the SMC² scheme, which heuristically targets the posterior distribution of $(θ, X_{0 : t})$ given $Y_{1 : t}$ $(1 ⩽ t ⩽ T)$ as follows. It involves N θ-particles, which we will call “atoms,” and attaches to each atom θ a particle filter that propagates and resamples M particles (state trajectories $X_{0 : t}^{m}$) generated by SMC (as in Algorithm 1 with the given θ). It carries out MH iterations to determine whether a candidate atom is accepted (as in Step (c) of Algorithm 2). When the N atoms $θ_{t}^{1}, \dots, θ_{t}^{N}$ and their corresponding importance weights generated in this way at time t satisfy the degeneracy criterion in the study by the author of reference [17], SMC² carries out bootstrap resampling of the weighted parameter-particle set to replace it with an unweighted set; however, no convergence theory as $k \to \infty$ is provided.
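
One standard form of PMCMC is particle marginal Metropolis–Hastings (PMMH). The sketch below implements it under our own assumptions: a scalar unknown AR coefficient θ = a with a Uniform(−1, 1) prior, known noise variances, a symmetric Gaussian random-walk proposal, and a bootstrap particle filter supplying an estimate of $\log p_θ(Y_{1:T})$. The helper names and tuning constants are hypothetical.

```python
import math
import random

Q_VAR, R_VAR = 1.0, 0.25  # hypothetical known noise variances


def pf_loglik(a, ys, M, rng):
    """Bootstrap-PF estimate of log p_a(Y_1:T) for X_t = a X_{t-1} + N(0, Q), Y_t = X_t + N(0, R)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(M)]  # X_0 ~ N(0, 1)
    loglik = 0.0
    for y in ys:
        x = [a * xm + rng.gauss(0.0, math.sqrt(Q_VAR)) for xm in x]
        logw = [-0.5 * (y - xm) ** 2 / R_VAR - 0.5 * math.log(2 * math.pi * R_VAR) for xm in x]
        top = max(logw)  # log-sum-exp shift for numerical stability
        w = [math.exp(lw - top) for lw in logw]
        loglik += top + math.log(sum(w) / M)  # log of the average incremental weight
        x = rng.choices(x, weights=w, k=M)
    return loglik


def pmmh(ys, K, M, step=0.1, seed=0):
    """Particle marginal MH for theta = a; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    theta = 0.0
    ll = pf_loglik(theta, ys, M, rng)  # reused, not recomputed, until a proposal is accepted
    chain, accepted = [], 0
    for _ in range(K):
        prop = theta + step * rng.gauss(0.0, 1.0)  # symmetric random-walk proposal
        if -1.0 < prop < 1.0:  # outside the prior support the prior is 0: reject
            ll_prop = pf_loglik(prop, ys, M, rng)
            # Flat prior and symmetric proposal cancel, leaving the ratio of likelihood estimates.
            if math.log(1.0 - rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
                accepted += 1
        chain.append(theta)
    return chain, accepted / K


if __name__ == "__main__":
    rng = random.Random(1)
    x, ys = 0.0, []
    for _ in range(50):
        x = 0.7 * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, math.sqrt(R_VAR)))
    chain, acc = pmmh(ys, K=200, M=100)
    print(f"acceptance {acc:.2f}, mean of a after 50 iterations {sum(chain[50:]) / 150:.2f}")
```

Each iteration runs a fresh particle filter only at the proposed θ; the stored estimate for the current θ is kept, as in the usual pseudo-marginal construction.
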
Although MCMC methods with MH iterations are widely used computational tools in Bayesian inference on $θ \in Θ$ with a prior density function with respect to some measure $ν_{θ}$, they lack convergence-rate guarantees, in terms of the number of iterations, that could be used to automate the termination of the iterations. On the other hand, if the target density p, which is the posterior density of θ given $Y_{1 : t}$, were known and easy to sample from, then the standard Monte Carlo approximation of $μ := E_{p} (ψ (θ))$ could be carried out by generating i.i.d. $θ_{1}, \dots, θ_{N}$ from $p (\cdot) \, d m$ and using the sample average $\tilde{μ} = N^{- 1} \sum_{n = 1}^{N} ψ (θ_{n})$ to estimate μ. Under the assumption $E_{p} (ψ^{2} (θ)) < \infty$, the estimated standard error is $\tilde{σ}_{N} / \sqrt{N}$, and $\tilde{μ} \pm N^{- 1 / 2} \tilde{σ}_{N} ζ_{1 - α / 2}$ is an approximate $(1 - α)$-level confidence interval for μ, where $\tilde{σ}_{N}^{2} = (N - 1)^{- 1} \sum_{n = 1}^{N} (ψ (θ_{n}) - \tilde{μ})^{2}$ and $ζ_{q}$ is the qth quantile of the standard normal distribution. This follows from the classical central limit theorem and is very useful for determining N to ensure that $\tilde{μ}$ lies within some prescribed tolerance ϵ of μ, namely $N^{- 1 / 2} \tilde{σ}_{N} ζ_{1 - α / 2} ⩽ ϵ$. It has inspired Lai to develop, with his current Ph.D. students Huanzhong Xu and Michael Hongyu Zhu and former Ph.D. student Hock Peng Chan, the following novel MCMC algorithm, which is asymptotically equivalent to the oracle procedure that assumes a known target density p and which they call MCMC with sequential state substitutions (MCMC-SS). As in MH, let f be a given function that is proportional to the target density. Let $\{q (\cdot | γ) : γ \in Γ\}$ be a family of positive proposal densities with respect to some measure m, where $Γ$ is a convex subset of $ℝ^{d}$. MCMC-SS initializes by choosing $γ_{0}$ in the interior $Γ^{o}$ of Γ and generating νB i.i.d. random variables $θ_{1, 0}^{1}, \dots, θ_{1, 0}^{ν}; \dots; θ_{B, 0}^{1}, \dots, θ_{B, 0}^{ν}$ from the proposal distribution $q (\cdot | γ_{0}) \, d m$, thereby forming the B disjoint sets $Θ_{b, 0} = \{θ_{b, 0}^{1}, \dots, θ_{b, 0}^{ν}\}$. At stage k, it uses the sequential substitution procedure $SS (Θ_{b, k}, w_{k}^{b})$ in Algorithm 3 to update the atom set in the bth block and to assign the weight $w_{i, k}^{b}$ to the ith atom in $Θ_{b, k}$, $b = 1, \dots, B$. MCMC-SS estimates $μ = E_{p} ψ (θ)$ by

\hat{ψ} = \frac{1}{B (K - κ)} \sum_{b = 1}^{B} \sum_{k = κ + 1}^{K} \hat{ψ}_{b, k}, \quad \text{with} \quad \hat{ψ}_{b, k} = \frac{\sum_{i = 1}^{ν} w_{i, k}^{b} ψ (θ_{i, k}^{b})}{\sum_{i = 1}^{ν} w_{i, k}^{b}},

in which κ represents an initial burn-in period that is asymptotically negligible because $κ = o (K)$. In many applications, the parameter γ of the family of proposal densities is a function $γ : P \to Γ$, where $P$ is the space of probability measures on $Θ$. Within this framework, we now describe the choice of $γ_{b, k - 1}$ in Algorithm 3. For $k ⩽ κ$, let $γ_{b, k - 1} = ν^{- 1} \sum_{θ \in Θ_{b, k - 1}} γ (θ)$, which is the mean of the empirical measure of the atoms in the bth block at the end of stage $k - 1$. For $k > κ$, on the other hand, we pool across blocks by letting $γ_{k - 1} = B^{- 1} \sum_{b = 1}^{B} γ_{b, k - 1}$, which we use as the modified $γ_{b, k - 1}$ for all blocks. Therefore, after the burn-in period, we can carry out the updates $SS (Θ_{b, k})$ in the order $b = 1, \dots, B$, so that if the candidate atom in $SS (Θ_{b, k})$ is not used for block b, it can serve as the candidate atom for block $b + 1$ ($⩽ B$), which then does not need to generate another random variable from $q (\cdot | γ_{k - 1})$, an obvious advantage for high-dimensional and complicated states. Lai, Xu, Zhu, and Chan have developed a comprehensive asymptotic theory of MCMC-SS showing its asymptotic optimality with respect to computational and statistical criteria, and have also derived consistent estimators of the standard errors of the Monte Carlo state/parameter estimates; see reference [18], whose main results, including Algorithm 3, are summarized below. Reference [18] also describes an implementation of Algorithm 3 that can be vectorized and parallelized, and illustrates its applications to latent variable analysis with uncertainty quantification in image reconstruction and brain network development.
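The oracle benchmark described above (i.i.d. draws from a known target p, the normal-theory interval $\tilde{μ} \pm N^{-1/2} \tilde{σ}_N ζ_{1-α/2}$, and the choice of N from the tolerance condition, i.e., $N ⩾ (\tilde{σ}_N ζ_{1-α/2} / ϵ)^2$) can be sketched with the Python standard library. The sampler, ψ, tolerance, and helper names are hypothetical; in practice $\tilde{σ}_N$ would come from a pilot run.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def oracle_mc(sample, psi, N, alpha=0.05, seed=0):
    """I.i.d. Monte Carlo estimate of mu = E_p psi(theta) with a (1 - alpha) normal CI.

    `sample(rng)` is assumed to draw one theta from the known target density p.
    """
    rng = random.Random(seed)
    vals = [psi(sample(rng)) for _ in range(N)]
    mu_tilde = fmean(vals)
    sigma_tilde = stdev(vals)  # (N - 1) denominator, as in the text
    z = NormalDist().inv_cdf(1 - alpha / 2)  # zeta_{1 - alpha/2}
    half = z * sigma_tilde / math.sqrt(N)
    return mu_tilde, (mu_tilde - half, mu_tilde + half), sigma_tilde


def required_n(sigma_tilde, eps, alpha=0.05):
    """Smallest N with zeta_{1 - alpha/2} * sigma_tilde / sqrt(N) <= eps."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma_tilde / eps) ** 2)


# Pilot run with target N(1, 2^2) and psi(theta) = theta, then size the main run.
mu, ci, s = oracle_mc(lambda rng: rng.gauss(1.0, 2.0), lambda th: th, N=1000)
print(mu, ci, required_n(s, eps=0.05))
print(required_n(2.0, eps=0.05))  # 6147
```

MCMC-SS aims to match this benchmark asymptotically without being able to sample from p directly.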


Algorithm 3. Updating procedure $SS (Θ_{b, k}, w_{k}^{b})$ for MCMC-SS

Theorem 2. Suppose $E_{p} ψ^{2} (θ) < \infty$ and there exist $β > α > 0$ and $V : Θ^{ν} \to [1, \infty)$ such that for $γ : P \to Γ$,

\begin{array}{l} \int_{Θ^{ν}} V (θ) q (θ^{1} | γ_{0}) \cdots q (θ^{ν} | γ_{0}) \, d m^{ν} (θ) < \infty \quad \text{with } θ = (θ^{1}, \dots, θ^{ν}), \text{ and} \\ α V (θ) ⩽ λ (\tilde{θ} | γ (θ)) ⩽ β V (θ) \quad \text{for all } θ \in Θ^{ν} \text{ and } \tilde{θ} \in Θ, \end{array}

where $λ (\tilde{θ} | γ) = f (\tilde{θ} | γ) / p (\tilde{θ})$ .

(i) Let $G_{b, k}$ be the joint distribution of $(θ_{1, k}^{b}, \dots, θ_{ν, k}^{b})$ and let $Q^{ν}$ be the probability measure on $Θ^{ν}$ whose density is that of ν independent components, each of which has density $q (\cdot | γ_{f})$ with respect to m, where $γ_{f} = \arg\min_{γ \in Γ} I (q_{γ} ∥ f)$ and $I (q ∥ f) = E_{f} \{\log (f (θ) / q (θ))\}$ is the Kullback–Leibler divergence (or relative entropy) of q from the target density f in Algorithm 3. Then there exist positive constants a and c such that $\| G_{b, k} - Q^{ν} \|_{V} ⩽ c e^{- a k}$ for $1 ⩽ k ⩽ K$, where $\| \cdot \|_{V}$ denotes the weighted total variation norm associated with the weight function V. Hence, after $k \gg \log B$ iterations, $\sum_{b ⩽ B} \| G_{b, k} - Q^{ν} \|_{V} \to 0$.

(ii) Let $N = B (K - κ)$ be the total number of atoms used to define the MCMC-SS estimate $\hat{ψ}$ of $μ = E_{p} (ψ (θ))$. Then, as $K \to \infty$ and $B \to \infty$ such that $B = O (K)$,

\sqrt{N ν} (\hat{ψ} - μ) \Rightarrow N (0, σ^{2}),

where $σ^{2} = \mathrm{Var}_{p} (ψ (θ))$ can be consistently estimated by

{\hat{σ}}^{2} = \frac{1}{B (K - κ)} \sum_{b = 1}^{B} \sum_{k = κ + 1}^{K} \frac{1}{ν - 1} \sum_{θ \in Θ_{b, k}} {(ψ (θ) - {\hat{ψ}}_{b, k})}^{2} .
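Given the atoms and weights produced by the substitution step, $\hat{ψ}$, $\hat{σ}^2$, and the standard error $\hat{σ} / \sqrt{Nν}$ implied by Theorem 2(ii) are simple averages. The sketch below does not implement Algorithm 3; it assumes its output has been stored in a hypothetical nested list of (atom, weight) pairs and evaluates the displayed formulas, together with the post-burn-in pooling $γ_{k-1} = B^{-1} \sum_b γ_{b,k-1}$ for vector-valued γ.

```python
import math


def mcmc_ss_estimates(stages, psi, burn_in):
    """Evaluate psi-hat, sigma-hat^2, and the standard error from stored atoms.

    stages[k - 1][b - 1] is a list of (theta, weight) pairs for Theta_{b,k};
    the first `burn_in` (= kappa) stages are discarded.
    """
    block_means, within = [], []
    for stage in stages[burn_in:]:
        for block in stage:
            vals = [psi(theta) for theta, _ in block]
            ws = [w for _, w in block]
            m = sum(w * v for w, v in zip(ws, vals)) / sum(ws)  # psi-hat_{b,k}
            block_means.append(m)
            within.append(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))
    n_terms = len(block_means)  # N = B (K - kappa)
    nu = len(stages[burn_in][0])
    psi_hat = sum(block_means) / n_terms
    sigma2_hat = sum(within) / n_terms
    # sqrt(N nu)(psi-hat - mu) => N(0, sigma^2), so SE = sigma-hat / sqrt(N nu).
    return psi_hat, sigma2_hat, math.sqrt(sigma2_hat / (n_terms * nu))


def pooled_gamma(block_gammas):
    """Pooling gamma_{k-1} = B^{-1} sum_b gamma_{b,k-1}, for gamma given as a list of floats."""
    B = len(block_gammas)
    return [sum(g[j] for g in block_gammas) / B for j in range(len(block_gammas[0]))]


stages = [
    [[(100.0, 1.0), (100.0, 1.0)], [(100.0, 1.0), (100.0, 1.0)]],  # stage 1: burn-in
    [[(1.0, 1.0), (3.0, 1.0)], [(2.0, 3.0), (4.0, 1.0)]],          # stage 2: kept
]
print(mcmc_ss_estimates(stages, psi=lambda th: th, burn_in=1))  # (2.25, 2.25, 0.75)
print(pooled_gamma([[1.0, 2.0], [3.0, 4.0]]))                   # [2.0, 3.0]
```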

As shown in reference [21], with probability approaching 1 as k becomes large, the candidate atom $\tilde{θ}$ in Algorithm 3 substitutes for some existing atom in $Θ_{b, k - 1}$. Hence, similar to the case of a known target density p from which $\tilde{θ}$ is sampled, the newly sampled atom features in the weighted average $\hat{ψ}_{b, k}$. We need the weighted average, with “importance sampling weights” $w_{i, k}^{b}$, because for large k the conditional distribution of $Θ_{b, k}$ given $Θ_{b, k - 1}$ behaves like the ν-fold product measure $Q^{ν}$ on $Θ^{ν}$. Importance sampling (likelihood ratio) weights $w_{i, k}^{b}$ are therefore needed to convert Q to P, which also suggests the asymptotic optimality of $\hat{ψ}$, the overall average of the $B (K - κ)$ estimates $\hat{ψ}_{b, k}$, similar to $\tilde{μ}$ described above for the case of known p. Each random variable generated in the MCMC-SS scheme asymptotically contributes weight $(N ν)^{- 1}$ to (a) the estimate $\hat{ψ}$ of μ and (b) the asymptotic variance of $\hat{ψ}$. Theorem 2 shows that there is in fact considerable flexibility in the choice of the factors K (the number of iterations) and B (the number of blocks) in $N = B (K - κ)$, which determines the scaling factor in the central limit theorem, although the theorem highlights the case $B = O (K)$ to emphasize that K should not be chosen too small relative to B. Reference [21] gives an application to uncertainty quantification in the following image reconstruction problem.
Reference [22] proposes to use MCMC methods “whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure.” A wide range of applications within this framework consider Bayesian inference on a latent random field $\{u (x) : x \in D\} \subset ℝ^{d}$ generated by some stochastic partial differential equation (SPDE), in which D is a connected subset of $ℝ^{d'}$, based on data generated by some nonlinear function of the random field. It is shown that, after discretization and truncation to fit into this framework, the Radon–Nikodym derivative of the target measure P with respect to the reference measure Q has the following form:

(d P / d Q) (u) \propto \exp (- l (u)),

for some real-valued function l, which [19] call “potential” in their substantive applications. The advantage of using a zero-mean Gaussian random field reference measure Q is that it is specified by the covariance operator $C$, whose eigenvalues $λ_{i}^{2}$ and orthonormal eigenfunctions $ϕ_{i}$ yield the Karhunen–Loève expansion $u (x) = \sum_{i = 1}^{\infty} ξ_{i} ϕ_{i} (x)$, with independent $ξ_{i} \sim N (0, λ_{i}^{2})$ and $\sum_{i = 1}^{\infty} λ_{i}^{2} < \infty$. Reference [22] uses a random truncation τ with a sieve prior to convert the infinite-dimensional expansion into a finite sum $u (x) = \sum_{i = 1}^{τ} ξ_{i} ϕ_{i} (x)$. In addition, a discrete approximation of the random field $u (x)$ is used, with x taken over a mesh of width δ in each coordinate. MCMC-SS uses a parametric family of Gaussian proposal measures $Q (γ)$ instead of the single Gaussian reference measure used by [22]. Putting $1 / L (θ) = \exp (- l (u (x)))$, we can also incorporate the random truncation τ, and possibly other random effects ρ, into the state $θ = (τ, ζ_{1}, \dots, ζ_{τ}, ρ)$, where $ζ_{j} = G (u (x_{j}))$, $j = 1, \dots, τ$, and $G$ is an operator associated with the SPDE and the discretization scheme, for which $x_{j}$ belongs to a discrete subset of D. With this definition of θ, MCMC-SS uses the updating procedure described in Algorithm 3. Section 4.2 of [22] argues that simply applying MCMC to a discretized random field leads to a reference measure that is singular with respect to the target measure. However, the MCMC procedure used in [22] is the random walk Metropolis algorithm, which involves the acceptance probability $a (u, v) = \min \{1, (d η^{*} / d η) (u, v)\}$, where η is the measure defined by the transition kernel $q (u, v)$ of the MCMC algorithm (i.e., $v | u \sim q (u, \cdot)$) and $η^{*}$ is the measure obtained by reversing the roles of u and v in the definition of η.
Theorem 6.3 of [22] shows that after discretization, $η^{*}$ is singular with respect to η, and therefore “all proposal moves are rejected with probability 1” for the random walk Metropolis algorithm, which proposes $v^{(k)} = u^{(k)} + β ξ^{(k)}$ with $ξ^{(k)} \sim N (0, C)$, and chooses $u^{(k + 1)} = v^{(k)}$ with probability $a (u^{(k)}, v^{(k)})$, setting $u^{(k + 1)} = u^{(k)}$ if $v^{(k)}$ is rejected. To get around this difficulty, [22] introduces a preconditioned Crank–Nicolson (pCN) adjustment, which proposes $v^{(k)} = \sqrt{1 - β^{2}} u^{(k)} + β ξ^{(k)}$. Here, $β^{2} = 8 δ / (2 + δ)^{2}$ and C is the covariance matrix (after truncation and discretization) of the covariance operator $C$ for the Gaussian proposal measure. Because MCMC-SS does not involve η and $η^{*}$, it does not require the pCN adjustment; see reference [20] for details and further discussion. Using bounds on a weighted total variation norm of the difference between the target distribution and the empirical measure defined by the sample paths of the MCMC procedure, reference [21] has developed an asymptotic theory, as both K and N approach $\infty$, of the MCMC-SS estimates of functionals of the target distribution. This theory establishes the asymptotic normality of the MCMC-SS estimates, provides consistent estimators of their standard errors, and establishes their asymptotic optimality by deriving certain oracle properties. Implementation via sequential Monte Carlo schemes called “particle filters” and parallelization is also given. In his Ph.D. thesis, Zhu, a coauthor of [21], describes a numerically stable implementation of MCMC-SS that can be vectorized and parallelized, using Julia v0.62 [23] and the ArrayFire GPU library [24]. He also develops scalable implementations for high-dimensional states/parameters using differentiation through mixture distributions for stochastic gradient descent; see [25].
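
The pCN proposal can be illustrated on truncated Karhunen–Loève coefficients. The sketch below assumes a diagonal covariance (independent $ξ_i \sim N(0, λ_i^2)$), a user-supplied potential $l$, and treats β as a free step-size parameter rather than deriving it from δ. It uses the standard pCN acceptance probability $\min\{1, \exp(l(u) - l(v))\}$ from the pCN literature, stated here as background; the function name and toy potential are hypothetical. With $l \equiv 0$ every proposal is accepted, because the pCN proposal itself preserves the Gaussian reference measure.

```python
import math
import random


def pcn_sampler(potential, lams, beta, K, seed=0):
    """Preconditioned Crank-Nicolson MH on truncated Karhunen-Loeve coefficients.

    Reference measure: independent xi_i ~ N(0, lams[i]**2) (diagonal C).
    Target: density proportional to exp(-potential(xi)) with respect to that reference.
    Returns the chain and the acceptance rate.
    """
    rng = random.Random(seed)
    u = [lam * rng.gauss(0.0, 1.0) for lam in lams]  # start from a reference draw
    l_u = potential(u)
    root = math.sqrt(1.0 - beta ** 2)
    chain, accepted = [], 0
    for _ in range(K):
        # v = sqrt(1 - beta^2) u + beta xi with xi ~ N(0, C): leaves N(0, C) invariant.
        v = [root * ui + beta * lam * rng.gauss(0.0, 1.0) for ui, lam in zip(u, lams)]
        l_v = potential(v)
        # Accept with probability min{1, exp(l(u) - l(v))}; no prior ratio is needed.
        if math.log(1.0 - rng.random()) <= l_u - l_v:
            u, l_u = v, l_v
            accepted += 1
        chain.append(u)
    return chain, accepted / K


# Toy potential: one noisy observation y = 1 of the first coefficient, noise variance 0.25.
lams = [1.0, 0.5, 0.25]
chain, acc = pcn_sampler(lambda u: 2.0 * (1.0 - u[0]) ** 2, lams, beta=0.5, K=20000)
post = [c[0] for c in chain[2000:]]
print(f"acceptance {acc:.2f}, mean of xi_1 {sum(post) / len(post):.2f} (exact posterior mean 0.8)")
```
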
In the context of the cortical organoids described in the first paragraph of Section 2, the target distribution is the posterior distribution of a precomputed feature of the organoid, viewed as a scaled-down model of the preterm human brain, conditional on the observations, namely the 101 serial EEG recordings from 39 preterm infants. The uncertainty quantification [21] of this posterior distribution provides a principled and systematic approach to comparing the feature between cortical organoids and the observations from the preterm infants, in contrast to the lack of uncertainty quantification for the approach and results of [1, pp. 8–9 and Fig. 4A, B, C, and D on p. 31] mentioned in the first paragraph of Section 2. Moreover, the methods of time–frequency analysis in the preceding subsection can be used to compute the predictive distribution of the feature of the cortical organoids given the observations, which is the same as the target distribution. The predictive distribution typically also involves an unspecified hyperparameter vector $θ$, as in the manifold learning approach of [14]. This corresponds to a Bayesian approach with prior densities belonging to a family of proposal densities $q(θ \mid γ)$, in which $γ \in Γ$ indexes the family and $Γ$ is a convex subset of $\mathbb{R}^{d}$. Reference [21] has shown that MCMC-SS eventually samples from the member $q(\cdot \mid γ_{p})$ of this family that has the smallest Kullback–Leibler divergence from the target density $p(\cdot)$, and therefore from the target density itself if it belongs to $\{q(\cdot \mid γ) : γ \in Γ\}$.
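The role of $γ_{p}$ can be illustrated in a simple one-dimensional case. The Python sketch below assumes, for illustration only, the Gaussian family $q(\cdot \mid γ) = N(m, s^{2})$ with γ = (m, s), a hypothetical two-component Gaussian mixture as the target p, and the divergence $KL(p \,\|\, q(\cdot \mid γ))$. Since $KL(p \,\|\, q) = E_{p}[\log p] - E_{p}[\log q]$, minimizing over γ amounts to maximizing $E_{p}[\log q(\cdot \mid γ)]$, which for the Gaussian family is achieved by matching the mean and variance of p; the code checks this against a brute-force grid search. It is not the adaptation scheme of MCMC-SS in [21].

```python
import numpy as np

def gaussian_logpdf(theta, m, s):
    """Log-density of N(m, s^2) evaluated at theta."""
    return -0.5 * np.log(2.0 * np.pi * s**2) - 0.5 * ((theta - m) / s) ** 2

def kl_projection_gaussian(draws):
    """Estimate gamma_p = (m, s) minimizing KL(p || N(m, s^2)) from draws of p.

    KL(p || q_gamma) = E_p[log p] - E_p[log q_gamma]; only the second term depends
    on gamma, and over the Gaussian family it is maximized by matching moments.
    """
    return draws.mean(), draws.std()

rng = np.random.default_rng(1)
n = 20_000
# Hypothetical target p: mixture 0.3 N(-2, 0.5^2) + 0.7 N(1, 1), mean 0.1, variance 2.665
from_first = rng.uniform(size=n) < 0.3
draws = np.where(from_first, rng.normal(-2.0, 0.5, n), rng.normal(1.0, 1.0, n))
m_hat, s_hat = kl_projection_gaussian(draws)

# Brute-force check: Monte Carlo estimate of E_p[log q(. | gamma)] over a grid of gamma
m_grid = np.linspace(-1.0, 1.0, 41)
s_grid = np.linspace(0.5, 3.0, 51)
scores = np.array([[gaussian_logpdf(draws, m, s).mean() for s in s_grid] for m in m_grid])
i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
m_best, s_best = m_grid[i_best], s_grid[j_best]
```

Because this mixture does not belong to the Gaussian family, the KL-closest member $q(\cdot \mid γ_{p})$ differs from p itself, which is why the conclusion quoted above requires the target density to belong to the family.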

3 Discussion and Concluding Remarks

Haddad and Lai actually initiated similar research forty years ago when they worked on cardiorespiratory patterns during sleep in a SIDS (Sudden Infant Death Syndrome) project at Columbia University’s Pediatrics Department; see reference [26], which describes the study population consisting of 12 infants “with one or more episodes of aborted SIDS” (four of whom had siblings who died of SIDS) and 19 normal infants, all born full-term except for one aborted SIDS infant born at 37 weeks of gestation. After describing the study design and methods of statistical analysis, the authors of reference [26] presented results on total tidal volume (Vt), respiratory cycle time (Ttot), and the increase in Vt/Ttot resulting from a 2% increase in CO₂ concentration in the sleeping chamber, comparing aborted SIDS infants with normal infants in both REM (rapid eye movement) and quiet sleep. Because stresses such as loaded breathing could not be induced in the infants, animal models involving sheep, puppies, and dogs were used, as in references [27, 28]; see also [29]. In particular, the authors of reference [27] “studied diaphragmatic muscle function during inspiratory flow resistive loaded breathing” in 6 unanesthetized sheep over periods of 6–8 months. Data were collected before (baseline) and after application of the loads, which were sustained for up to 90 min. Loads were classified as mild (<50 cm H₂O·L⁻¹·s), moderate (50–150 cm H₂O·L⁻¹·s), or severe (>150 cm H₂O·L⁻¹·s). 
They found that “1) the diaphragm is capable of generating large pressure for prolonged periods with no evidence of fatigue, 2) with very high inspiratory resistive loads mechanical failure of the diaphragm can occur, 3) diaphragmatic fatigue is associated with acute hypercapnia and therefore failure of the entire respiratory pump, and 4) a decrease in integrated EMG (iEMG) and a concomitant shift in the EMG power spectral density toward lower frequencies precede the mechanical failure of the diaphragm.” Thus, analogous to the use of the power spectral density of the EEG signal in the first paragraph of Section 2, [27] uses a shift of the power spectrum of the EMG toward lower frequencies to identify the onset of diaphragmatic muscle fatigue in adult sheep. The frontier methods of time–frequency analysis in Section 2.1 are therefore also relevant to the problem of diaphragmatic muscle fatigue and to the rhythmic variations in cardiorespiratory signals studied by Haddad and Lai forty years ago. Pointing out that in the 1980s “investigators from various disciplines focused their efforts on finding out whether SIDS is related to hypoxia or anoxia (acute or chronic) before death and whether this relation is responsible for events leading to death,” Haddad [30] reviewed “studies in the recent past” from various fields: epidemiology, physiology of infant death and SIDS, pathology of the airway, and animal studies. Although “most of the evidence accumulated so far, including that obtained in the past two years, is circumstantial,” he concluded that “SIDS was little understood for many years until, over the past few years, its basic underlying genetic defect was better characterized (from recent animals and human studies), and light could finally be seen at the end of the tunnel,” again linking genetics and feedback mechanisms to see this light, as in the exemplary work of Hall, Rosbash, and Young on the circadian rhythm. 
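A common way to quantify a decrease in iEMG together with a shift of the EMG power spectral density toward lower frequencies is to track, window by window, the integral of the rectified signal and the median frequency of the power spectrum. The Python sketch below computes both from a raw periodogram. The synthetic band-limited “baseline” and “fatigued” signals and the helper names are hypothetical, and [27] may have used a different spectral estimator or fatigue index.

```python
import numpy as np

def emg_fatigue_indices(emg, fs):
    """Return (iEMG, median frequency in Hz) for one EMG window sampled at fs Hz."""
    emg = emg - emg.mean()
    iemg = np.sum(np.abs(emg)) / fs                 # Riemann sum of the rectified EMG
    power = np.abs(np.fft.rfft(emg)) ** 2           # one-sided periodogram (unnormalized)
    freqs = np.fft.rfftfreq(emg.size, d=1.0 / fs)
    cum = np.cumsum(power)
    # median frequency: smallest frequency below which half of the total power lies
    median_freq = freqs[np.searchsorted(cum, 0.5 * cum[-1])]
    return iemg, median_freq

def band_limited_noise(n, fs, f_lo, f_hi, rng):
    """Hypothetical synthetic EMG: white noise with power only in [f_lo, f_hi] Hz."""
    spec = np.fft.rfft(rng.normal(size=n))
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(f < f_lo) | (f > f_hi)] = 0.0
    return np.fft.irfft(spec, n)

fs, n = 1000.0, 10_000                               # 10 s windows at 1 kHz
rng = np.random.default_rng(2)
baseline = band_limited_noise(n, fs, 50.0, 150.0, rng)
fatigued = 0.6 * band_limited_noise(n, fs, 30.0, 90.0, rng)
iemg_b, mf_b = emg_fatigue_indices(baseline, fs)
iemg_f, mf_f = emg_fatigue_indices(fatigued, fs)
```

A fixed-window periodogram assumes the spectrum is stable within each window; the time–frequency methods of Section 2.1 relax this by tracking the spectral content continuously.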
The frontier approach described in Section 2.2 is capable of combining various clues and insights from different areas and studies via an empirical Bayes model; see Sections 3.6.3, 5.4, 6.2.3, and 7.4 of reference [31] on postmarketing monitoring of medical product safety.
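The empirical Bayes model of Section 2.2 is not reproduced here. As a generic and hypothetical illustration of how an empirical Bayes model pools evidence across studies, the Python sketch below uses the normal-normal model $y_{k} \sim N(θ_{k}, s_{k}^{2})$ with known $s_{k}$ and $θ_{k} \sim N(μ, A)$, estimates the hyperparameters μ and A from the data with the DerSimonian–Laird moment estimator, and shrinks each study estimate toward the pooled mean by the factor $s_{k}^{2}/(s_{k}^{2} + A)$.

```python
import numpy as np

def normal_eb_shrinkage(estimates, std_errors):
    """Empirical Bayes pooling of study-level estimates.

    Model: y_k ~ N(theta_k, s_k^2) with known s_k, theta_k ~ N(mu, A).
    Returns (mu_hat, A_hat, posterior means of theta_k).
    """
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2
    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)            # inverse-variance pooled mean
    q = np.sum(w * (y - mu_fixed) ** 2)             # Cochran's Q statistic
    k = y.size
    # DerSimonian-Laird moment estimate of the between-study variance A (truncated at 0)
    a_hat = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (v + a_hat)
    mu_hat = np.sum(w_star * y) / np.sum(w_star)
    b = v / (v + a_hat)                             # shrinkage factor toward mu_hat
    return mu_hat, a_hat, b * mu_hat + (1.0 - b) * y

mu_hat, a_hat, post = normal_eb_shrinkage([0.0, 1.0, 2.0, 3.0, 4.0], [1.0] * 5)
# mu_hat ≈ 2.0, a_hat = 1.5, post ≈ [0.8, 1.4, 2.0, 2.6, 3.2]
```

Studies with larger standard errors are pulled more strongly toward the pooled mean, which is the sense in which the model borrows strength across studies.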

A related direction of our ongoing research is to combine several biomedical signals, which together form a multivariate time series, to provide a more holistic view of a human subject. For example, in an intensive care unit, PPG can be combined with EEG, EMG, respiratory, and other signals to evaluate a patient’s health status. In [32, 33], Wu and his collaborators applied a combination of ST and EEG channels to study sleep dynamics, and an “interpretable machine learning algorithm” to assess the consistency of sleep-stage scoring rules across multiple sleep centers. How to utilize the information available from multiple centers is a sensor fusion problem. We are currently combining recent advances in sensor fusion with those in TFA to develop an integrated statistical analysis of the multivariate time series of multiple biomedical signals.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding author.

Author Contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

Funding

TL’s research was supported by the National Science Foundation under DMS-1811818. GH’s research was supported by the National Institutes of Health under 1R01HL146530 and 1R21NS111270. AM’s research was supported by the National Institutes of Health under DP2-OD006495.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank TL’s research assistant Chenru Liu at Stanford University for her valuable contributions to the preparation, timely submission, and revision of the paper. We also thank the reviewers whose innovative comments and suggestions have resulted in substantial improvement of the presentation.

References

1. Trujillo, CA, Gao, R, Negraes, PD, Gu, J, Buchanan, J, Preissl, S, et al. Complex Oscillatory Waves Emerging from Cortical Organoids Model Early Human Brain Network Development. Cell Stem Cell (2019). 25(4):558–69. doi:10.1016/j.stem.2019.08.002

2. Wu, H-T. Current State of Nonlinear-type Time-Frequency Analysis and Applications to High-Frequency Biomedical Signals. Curr Opin Syst Biol (2020). 23:8–21. doi:10.1016/j.coisb.2020.07.013