ORIGINAL RESEARCH article

Front. Phys., 03 November 2022
Sec. Nuclear Physics​
This article is part of the Research Topic Uncertainty Quantification in Nuclear Physics.

Bayesian probability updates using sampling/importance resampling: Applications in nuclear theory

  • Department of Physics, Chalmers University of Technology, Göteborg, Sweden

We review an established Bayesian sampling method called sampling/importance resampling and highlight situations in nuclear theory where it can be particularly useful. To this end, we analyse a toy problem and demonstrate realistic applications of importance resampling: inferring the posterior distribution for the parameters of the ΔNNLO interaction model based on chiral effective field theory, and estimating the posterior probability distributions of target observables. The limitations of the method are also showcased in extreme situations where importance resampling breaks down.

1 Introduction

Bayesian inference is an appealing approach for dealing with theoretical uncertainties and has been applied in different nuclear physics studies [1–16]. In the practice of Bayesian analyses, a sampling procedure is usually inevitable for approximating the posterior probability distribution of model parameters and for performing predictive computations. Various Markov chain Monte Carlo (MCMC) methods [17–21] are often used for this purpose, even for complicated models with high-dimensional parameter spaces. However, MCMC sampling typically requires many likelihood evaluations, which is often a costly operation in nuclear theory, and there is a need to explore other sampling techniques. In this paper, we review an established method called sampling/importance resampling (S/IR) [22–24] and demonstrate its use in realistic nuclear physics applications where we also perform comparisons with MCMC sampling.

In recent years, there has been an increasing demand for precision nuclear theory. This implies a challenge not just to achieve accurate theoretical predictions but also to quantify the accompanying uncertainties. The use of ab initio many-body methods and nuclear interaction models based on chiral effective field theory (χEFT) has shown a potential to describe finite nuclei and nuclear matter based on extant experimental data (e.g. nucleon-nucleon scattering, few-body sector) with controlled approximations [25–29]. The interaction model is parametrized in terms of low-energy constants (LECs), the number of which grows order by order according to the rules of a corresponding power counting [30–32]. Very importantly, the systematic expansion allows one to quantify the truncation error and to incorporate this knowledge in the analysis [4–6, 10–14]. Indeed, Bayesian inference is an excellent framework to incorporate different sources of uncertainty and to propagate error bars to the model predictions. We start from Bayes' theorem,

\mathrm{pr}(\theta \mid D) \propto \mathcal{L}(\theta)\, \mathrm{pr}(\theta), \qquad (1)

where pr(θ|D) is the posterior probability density function (PDF) for the vector θ of LECs (conditional on the data D), L(θ) ≡ pr(D|θ) is the likelihood, and pr(θ) is the prior. For any model prediction one then needs to evaluate the expectation value of a function of interest y(θ) (the target observables) according to the posterior. This involves integrals such as

\int d\theta\, y(\theta)\, \mathrm{pr}(\theta \mid D), \qquad (2)

which cannot be solved analytically in realistic cases. Fortunately, integrals such as Eq. 2 can be approximately evaluated using a finite set of samples {θ_i}_{i=1}^N from pr(θ|D). MCMC sampling methods are the main computational tool for providing such samples, even for high-dimensional parameter volumes [16]. However, the use of MCMC in nuclear theory typically requires massive computations to record sufficiently many samples from the Markov chain. There are certainly situations where MCMC sampling is not ideal, or even becomes infeasible:

1) When the posterior is conditioned on some calibration data for which our model evaluations are very costly. We might then be able to afford only a limited number of full likelihood evaluations, and the MCMC sampling becomes less likely to converge.

2) Bayesian posterior updates in which calibration data is added in several different stages. This typically requires that the MCMC sampling must be carried out repeatedly from scratch.

3) Model checking where we want to explore the sensitivity to prior assignments. This is a second example of posterior updating.

4) The prediction of target observables for which our model evaluations become very costly and the handling of a large number of MCMC samples becomes infeasible.

These are situations where one might want to use the S/IR method [23, 24], which can exploit the previous results of model evaluations to allow posterior probability updates at a much smaller computational cost compared to the full MCMC method. In the following sections we first review the S/IR method and then present both toy and realistic applications in which its performance is compared with full MCMC sampling. Finally, we illustrate limitations of the method by considering cases where S/IR fails and we highlight the importance of the so-called effective number of samples. More difficult scenarios, in which the method fails without a clear warning, are left for the concluding remarks.

2 Sampling/importance resampling

The basic idea of S/IR is to utilize the inherent duality between samples and the density (probability distribution) from which they were generated [23]. This duality offers an opportunity to indirectly recreate a density (that might be hard to compute) from samples that are easy to obtain. Here we give a brief review of the method and illustrate with a toy problem.

Let us consider a target density h(θ). In our applications this target will be the posterior PDF pr(θ|D) from Eq. 1. Instead of attempting to directly collect samples from h(θ), as would be the goal in MCMC approaches, the S/IR method uses a detour. We first obtain samples from a simple (even analytic) density g(θ). We then resample from this finite set using a resampling algorithm to approximately recreate samples from the target density h(θ). There are (at least) two different resampling methods. In this paper we focus on only one of them, called the weighted bootstrap (more details on resampling methods can be found in Refs. [22, 23]).

Assuming we are interested in the target density h(θ) = f(θ)/∫f(θ) dθ, the procedure of resampling via the weighted bootstrap can be summarized as follows (a minimal code sketch is given after the list):

1) Generate a set {θ_i}_{i=1}^n of samples from a sampling density g(θ).

2) Calculate ω_i = f(θ_i)/g(θ_i) for the n samples and define the importance weights q_i = ω_i / Σ_{j=1}^n ω_j.

3) Draw N new samples {θ_i^*}_{i=1}^N from the discrete distribution {θ_i}_{i=1}^n with probability mass q_i on θ_i.

4) The set of samples {θ_i^*}_{i=1}^N will then be approximately distributed according to the target density h(θ).
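The four steps translate almost directly into code. The following minimal sketch (our own illustration in Python/NumPy, not code accompanying this work) assumes that f and g are callables returning the unnormalized target and sampling densities evaluated at an array of points:

```python
import numpy as np

def weighted_bootstrap(samples, f, g, n_new, rng=None):
    """Resample points drawn from g such that the output is approximately
    distributed according to h = f / (integral of f).

    samples : (n, d) array of draws from the sampling density g
    f, g    : callables mapping an (n, d) array to n density values
    n_new   : number N of resampled points to return
    """
    rng = np.random.default_rng() if rng is None else rng
    omega = f(samples) / g(samples)     # step 2: omega_i = f(theta_i) / g(theta_i)
    q = omega / omega.sum()             # step 2: importance weights q_i
    idx = rng.choice(len(samples), size=n_new, replace=True, p=q)   # step 3
    return samples[idx], q              # step 4: approximate draws from h
```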

Intuitively, the distribution of θ* should be a good approximation of h(θ) when n is large enough. Here we justify this claim via the cumulative distribution function of θ* (for the one-dimensional case),

\mathrm{pr}(\theta^* \le a) = \sum_{i=1}^{n} q_i H_a(\theta_i) = \frac{\frac{1}{n}\sum_{i=1}^{n} \omega_i H_a(\theta_i)}{\frac{1}{n}\sum_{i=1}^{n} \omega_i} \xrightarrow{\,n \to \infty\,} \frac{E_g\!\left[\frac{f(\theta)}{g(\theta)} H_a(\theta)\right]}{E_g\!\left[\frac{f(\theta)}{g(\theta)}\right]} = \frac{\int_{-\infty}^{a} f(\theta)\, d\theta}{\int f(\theta)\, d\theta} = \int_{-\infty}^{a} h(\theta)\, d\theta, \qquad (3)

with E_g[X(θ)] = ∫ X(θ) g(θ) dθ the expectation value of X(θ) with respect to g(θ), and H_a the Heaviside step function such that

H_a(\theta) = \begin{cases} 1 & \text{if } \theta \le a, \\ 0 & \text{if } \theta > a. \end{cases} \qquad (4)

The above resampling method can be applied to generate samples from the posterior PDF h(θ) = pr(θ|D) in a Bayesian analysis. It remains to choose a sampling distribution, g(θ), which in principle could be any continuous density. However, recall that h(θ) can be expressed in terms of an unnormalized distribution f(θ), and using Bayes' theorem (Eq. 1) we can set f(θ) = L(θ) pr(θ). Thus, choosing the prior pr(θ) as the sampling distribution g(θ), we find that the importance weights are expressed in terms of the likelihood, q_i = L(θ_i) / Σ_{j=1}^n L(θ_j). Assuming that it is simple to collect samples from the prior, the costly operation will be the evaluation of L(θ_i). Here we make the side remark that an effective and computationally cost-saving approximation can be made if we manage to perform a pre-screening to identify (and ignore) samples that will give a very small importance weight. We also note that the above choice of g(θ) = pr(θ) is purely for simplicity and one can perform importance resampling with any g(θ).
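In a numerical implementation of this special case it is advisable to work with log-likelihood values and normalize with the log-sum-exp trick, since the likelihood itself can easily underflow for many data points. A minimal sketch (our own, assuming the log-likelihoods have already been evaluated for the prior samples):

```python
import numpy as np
from scipy.special import logsumexp

def importance_weights(log_likelihoods):
    """q_i = L(theta_i) / sum_j L(theta_j), computed stably in log space."""
    logL = np.asarray(log_likelihoods, dtype=float)
    return np.exp(logL - logsumexp(logL))
```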

In Figure 1 we follow the above procedure and give a simple example of S/IR to illustrate how to get samples from a posterior distribution. We consider a two-dimensional parametric model with θ = (θ1, θ2). Given data D obtained under the model we have:

\mathrm{pr}(\theta_1, \theta_2 \mid D) = \frac{\mathcal{L}(\theta_1, \theta_2)\, \mathrm{pr}(\theta_1, \theta_2)}{\int \mathcal{L}(\theta_1, \theta_2)\, \mathrm{pr}(\theta_1, \theta_2)\, d\theta_1 d\theta_2}. \qquad (5)

For simplicity and illustration, the joint prior distribution for θ1, θ2 is set to be uniform over the unit square as shown in Figure 1A. In this example we also assume that the data D follows a multivariate Student-t distribution such that the likelihood function is

\mathcal{L}(\theta_1, \theta_2) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\, \nu^{p/2}\, \pi^{p/2}\, |\Sigma|^{1/2}} \left[ 1 + \frac{1}{\nu} (\theta - \mu)^T \Sigma^{-1} (\theta - \mu) \right]^{-(\nu + p)/2}, \qquad (6)

where the dimension p = 2, the degrees of freedom ν = 2, the mean vector μ = (0.2, 0.5) and the scale matrix Σ = [[0.02, 0.005], [0.005, 0.02]].


FIGURE 1. Illustration of S/IR procedures. (A) Samples {θ_i}_{i=1}^n from the uniform prior in a unit square (n = 2000 samples are shown). (B) Histogram of rescaled importance weights q̃_i = q_i / max({q_i}), where q_i = L(θ_i) / Σ_{j=1}^n L(θ_j) with L(θ) as in Eq. 6. The number of effective samples is n_eff = 214.6. Note that the samples are drawn from a unit square and that the tail of the target distribution is not covered. (C) Samples {θ_i^*}_{i=1}^N of the posterior (blue dots with 10% opacity) resampled from the prior samples with probability mass q_i. The contour lines for the 68% and 90% credible regions of the posterior samples (blue dashed) are shown and compared with those of the exact bivariate target distribution (green solid). Summary histograms of the marginal distributions for θ1 and θ2 are shown in the top and right subplots.

The importance weights q_i are then computed for n = 2000 samples drawn from the prior (these prior samples are shown in Figure 1A). The resulting histogram of importance weights is shown in Figure 1B. Here the weights have been rescaled as q̃_i = q_i / max({q_i}) such that the sample with the largest probability mass corresponds to 1 in the histogram. We also define the effective number of samples, n_eff, as the sum of rescaled importance weights, n_eff = Σ_{i=1}^n q̃_i. Finally, in Figure 1C we show N = 20,000 new samples {θ_i^*}_{i=1}^N that are drawn from the prior samples {θ_i}_{i=1}^n according to the probability mass q_i for each θ_i. The blue and green contour lines represent (68% and 90%) credible regions for the resampled distribution and for the Student-t distribution, respectively. This result demonstrates that the samples generated by the S/IR method give a very good approximation of the target posterior distribution.
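The complete Figure 1 example fits in a few lines of code. The sketch below is our own reconstruction of the setup described above (uniform prior on the unit square, bivariate Student-t likelihood with the parameters of Eq. 6); the exact value of n_eff depends on the random seed but should be of the same order as the value quoted in the caption.

```python
import numpy as np
from scipy.stats import multivariate_t

rng = np.random.default_rng(1)

# Prior samples: uniform on the unit square (Figure 1A)
n = 2000
prior_samples = rng.uniform(0.0, 1.0, size=(n, 2))

# Likelihood of Eq. 6: bivariate Student-t with p = 2, nu = 2
likelihood = multivariate_t(loc=[0.2, 0.5],
                            shape=[[0.02, 0.005], [0.005, 0.02]],
                            df=2)

L = likelihood.pdf(prior_samples)   # f(theta) up to a normalization constant
q = L / L.sum()                     # importance weights (the prior g is constant)
n_eff = np.sum(q / q.max())         # effective number of samples (Figure 1B)

# Weighted bootstrap: N = 20,000 approximate posterior samples (Figure 1C)
N = 20_000
posterior_samples = prior_samples[rng.choice(n, size=N, replace=True, p=q)]
print(f"n_eff = {n_eff:.1f}")
```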

3 Nuclear physics applications

Now that we have reviewed the basic idea of the S/IR method, we move on to present realistic applications of the resampling technique in nuclear structure calculations. Here we study Bayesian inference involving the ΔNNLO chiral interaction [33] with explicit inclusion of the delta-isobar degree of freedom at next-to-next-to-leading order. In Weinberg's power counting the ΔNNLO interaction model is parametrized by 17 LECs: four pion-nucleon LECs (c1,2,3,4) that are inferred from pion-nucleon scattering data, and 13 additional LECs that should be inferred from extant experimental data on low-energy nucleon-nucleon scattering and bound-state nuclear observables.

For this application we treat only a subset of the parameters as active and keep the other LECs fixed at values taken from the ΔNNLO_GO(450) interaction [34]. Specifically, we consider deuteron observables and use seven active model parameters: c1,2,3,4, C̃3S1, C3S1, CE1. Our Gaussian likelihood contains three data with independent errors: the deuteron ground-state energy E, its point-proton radius Rp, and the one-body quadrupole moment Q, with experimental and theoretical targets from Refs. [35–37]. Note that the target point-proton radii were transformed from experimental charge radii using the same relation as in Ref. [38]. For the target Q we use the theoretical result obtained with the CD-Bonn model [37]. With these simplified conditions, we perform S/IR as well as MCMC sampling to study 1) the posterior PDF for the LECs and 2) posterior predictive distributions (PPDs) for selected few-body observables. This application therefore allows a straightforward comparison of the two different sampling methods in a realistic setting. We note that the inclusion of all 17 LECs as active parameters would have required more careful tuning of the MCMC sampling algorithm and corresponding convergence studies.

The computation of observables, e.g., for likelihood evaluations, is usually the major time-consuming bottleneck in Bayesian analyses using MCMC methods. In this application, the statistical analysis is enabled by the use of emulators which mimic the outputs of many-body solvers but are faster by orders of magnitude. The emulators employed here for the ground-state observables of the deuteron, and later for few-body observables, are based on eigenvector continuation [39–41]. These emulators allow us to reduce the computation time from seconds to milliseconds while keeping the relative error (compared with a full no-core shell model calculation) within 0.001%. Unfortunately, emulators are not yet available for all nuclear observables. The MCMC sampling of posterior PDFs, or the evaluation of expectation integrals such as Eq. 2, will typically not work for models with observables that require heavy calculations.

The experimental target values and error assignments for the calibration observables used to condition the posterior PDF are listed in the upper half of Table 1. In this study we assume a normally-distributed likelihood, and consider different sources of error when calibrating the model predictions with experimental data. The errors are assumed to be independent. They include experimental, ɛexp, model (χEFT truncation) discrepancy, ɛmodel, many-body method, ɛmethod, and emulator, ɛem, errors. The χEFT truncation errors are estimated based on order-by-order calculations as in Ref. [33]. More details on the determination of the error scales can be found in Ref. [42].
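With independent error sources, the variances add in quadrature and the normally-distributed likelihood for each observable takes the familiar Gaussian form. The following sketch is our own illustration of this assumption (the target values z and the individual error scales correspond to Table 1; in the actual analysis the model prediction comes from an emulator):

```python
import numpy as np

def log_likelihood(y_pred, z, eps_exp, eps_model, eps_method, eps_em):
    """Gaussian log-likelihood with independent error sources added in quadrature.

    y_pred, z : arrays of model predictions and target values for the observables
    eps_*     : arrays of error assignments per observable (see Table 1)
    """
    var = eps_exp**2 + eps_model**2 + eps_method**2 + eps_em**2
    return np.sum(-0.5 * (y_pred - z) ** 2 / var - 0.5 * np.log(2.0 * np.pi * var))
```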


TABLE 1. Target values, z, and error assignments, ɛ, for observables used in the model calibration and for predictions. Energies in [MeV], point-proton radii in [fm], and the deuteron quadrupole moment in [e2fm2].

Furthermore, we take advantage of previous studies and incorporate information about c1,2,3,4 from a Roy-Steiner analysis of pion-nucleon scattering data [43], and we identify a non-implausible domain for C̃3S1, C3S1, CE1 from the history-matching approach of Ref. [42]. With this prior knowledge we set up the prior distribution of the seven LECs as the product of a multivariate Gaussian for c1,2,3,4 and a uniform distribution for C̃3S1, C3S1, CE1 (see Footnote 1). We note that the use of history matching is very beneficial for both S/IR and MCMC sampling. For S/IR it allows us to select a sampling distribution that promises a large overlap with the target distribution and it identifies prior samples that are likely to have large weights in the resampling step. For MCMC, the non-implausible samples from history matching serve as good starting points for the walkers and thereby give faster convergence.

3.1 Posterior sampling

Once we have the prior and the likelihood function we are able to draw samples from the posterior PDF and to analyze the ab initio description of few-nucleon systems with the present interaction model. The joint posterior of the LECs is shown in Figure 2, where we compare bivariate, marginal distributions from S/IR and MCMC sampling. For the MCMC sampling we employed an open-source Python toolkit called emcee [44] that performs affine-invariant ensemble sampling. We use 150 walkers that are warmed up with 5,000 initial steps and then move for 5 × 10^5 steps. This amounts to 7.6 × 10^7 likelihood evaluations. The positions of the walkers are recorded every 500 steps, which gives 1.5 × 10^5 samples from the posterior distribution of the LECs. For S/IR, on the other hand, we first acquire 2 × 10^4 samples from the prior distribution and perform the same number of likelihood evaluations to get the importance weights. From this limited set we then draw 1.5 × 10^5 samples using resampling (the same final number as in MCMC). Note that several prior samples occur more than once in the final sample set. Here the number of effective samples for S/IR is n_eff = 1589.9. As we can see from Figure 2, the contour lines of both sampling methods are in good agreement and, e.g., the correlation structure of the LEC pairs is equally well described. The histograms of S/IR and MCMC samples are both plotted in the figure but are almost impossible to distinguish.
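For orientation, the sampling settings quoted above correspond roughly to the workflow sketched below. This is schematic code of our own: log_posterior is a stand-in for the actual log prior plus the emulator-based log likelihood, which is not reproduced here.

```python
import numpy as np
import emcee

ndim, nwalkers = 7, 150                      # seven active LECs, 150 walkers

def log_posterior(theta):
    # Stand-in density (a standard normal); in the real application this is
    # the log prior plus the emulator-based log likelihood for E, Rp and Q.
    return -0.5 * np.sum(theta**2)

# MCMC route: 5,000 warm-up steps, 5e5 production steps, thinning by 500
p0 = np.random.randn(nwalkers, ndim)         # in practice: non-implausible points
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior)
state = sampler.run_mcmc(p0, 5_000)          # warm-up
sampler.reset()
sampler.run_mcmc(state, 500_000)             # roughly 7.6e7 likelihood calls in total
mcmc_samples = sampler.get_chain(thin=500, flat=True)   # 1.5e5 posterior samples

# S/IR route: 2e4 prior samples and likelihood calls, then resample 1.5e5 points
# using the weighted-bootstrap and importance-weight helpers sketched in Section 2.
```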


FIGURE 2. The joint posterior of LECs sampled with S/IR (blue) compared with MCMC sampling (orange). The LECs are shown in units of 10^4 GeV^−1, 10^4 GeV^−2 and 10^4 GeV^−4 for ci, C̃i and Ci, respectively. The likelihood observables and assigned errors are given in Table 1. The contour lines indicate 68% and 90% credible regions.

As a second stage we use the inferred model to perform model checking of the calibration observables and to predict the 3H ground-state energy and the 4He ground-state energy and point-proton radius (see Table 1). For this purpose the PPD is defined as the set

\mathrm{PPD} = \left\{ y_{\mathrm{th}}(\theta) : \theta \sim \mathrm{pr}(\theta \mid D) \right\}, \qquad (7)

where y_th(θ) denotes the theoretical prediction of the selected observables for the model parameter vector θ. Figure 3 illustrates the PPD of the three deuteron observables using S/IR (blue) and MCMC sampling (orange). The marginal histograms of the observable predictions are shown in the diagonal panels of the corner plot. In this study both sampling methods give very similar distributions for all observables. Note that the predictive distributions for the three deuteron observables can be considered as model checking since these observables appeared in the likelihood function and therefore conditioned the LEC posterior. The 3H and 4He observables, on the other hand, are predictions in this study. Their distributions are characterized by larger variances compared to the deuteron predictions.
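In terms of samples, constructing the PPD of Eq. 7 amounts to evaluating the observable prediction at every posterior sample. For S/IR output one can exploit the fact that the resampled set repeats many prior samples, so a costly prediction needs to be evaluated only once per unique sample. The sketch below is our own illustration; y_th is a placeholder for an emulator or many-body solver, and we assume the weighted bootstrap returns the indices of the selected prior samples.

```python
import numpy as np

def posterior_predictive(resampled_idx, prior_samples, y_th):
    """PPD samples (Eq. 7) from S/IR output.

    resampled_idx : indices into prior_samples chosen by the weighted bootstrap
    y_th          : callable mapping one parameter vector to its predicted observable(s)
    """
    # Evaluate the prediction once per unique prior sample, then broadcast.
    cache = {i: y_th(prior_samples[i]) for i in np.unique(resampled_idx)}
    return np.array([cache[i] for i in resampled_idx])
```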


FIGURE 3. The PPD obtained from samples of the LEC posterior distribution as shown in Figure 2. The bivariate histograms and the corresponding contour lines denote the joint distribution of observables generated by S/IR (blue) and MCMC sampling (orange). The marginal distributions of the observables are shown in the diagonal panels.

3.2 Posterior probability updates

As mentioned in the introduction, the S/IR method requires a minimal amount of computation to produce new samples when the posterior PDF is updated for various reasons. Here we present one likely scenario where the posterior is changed due to different choices of calibration data (for instance the inclusion of newly accessible observables). Let us start from the previously described calibration of our interaction model with three selected deuteron observables. If we add 3H and 4He observables to the calibration (experimental target values and error assignments as in Table 1) to further condition the model, the likelihood function needs to be updated accordingly. The sampling of the posterior PDF would normally have to be repeated from the beginning and the new samples used to construct PPDs. However, using S/IR we simply resample from the same set of prior samples, only with different importance weights. The same set of samples also appears in the sampling of PPDs. To distinguish the original and the updated posteriors we use the notation PPD_{A=2} to denote predictions with only deuteron observables as calibration data and PPD_{A=2,3,4} with 3H and 4He added to the likelihood. These two different PPDs, generated by S/IR, are shown in Figure 4. Note that PPD_{A=2} (blue) is the same as in Figure 3, and is shown here as a benchmark. As expected, we observe that the description of the 3H and 4He observables is more accurate and more precise (smaller variations) with PPD_{A=2,3,4} (green) as compared with PPD_{A=2} (blue). We also find that the deuteron ground-state energy is slightly improved with the updated posterior. This can be explained by the anti-correlation between Rp(4He) and E(2H): the additional constraints imposed by Rp(4He) through the likelihood function propagate to E(2H) via the correlation structure.
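Because the calibration data are treated as independent, the updated importance weights follow from simply adding the log-likelihood contributions of the new observables to those already computed for the same prior samples; no new prior sampling or emulator evaluations of the deuteron observables are needed. A minimal sketch in our own notation (logL_A2 and logL_A34 are assumed to hold the per-sample log-likelihood contributions of the deuteron and of the 3H/4He observables, respectively):

```python
import numpy as np
from scipy.special import logsumexp

def updated_weights(logL_A2, logL_A34):
    """Importance weights for the updated posterior, reusing the same prior samples:
    for independent data the log-likelihood contributions simply add."""
    logL = np.asarray(logL_A2) + np.asarray(logL_A34)
    return np.exp(logL - logsumexp(logL))
```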


FIGURE 4. The posterior predictive distribution from sampling over two different posterior distributions. PPDA=2 (blue) is calibrated by the deuteron observables while PPDA=2,3,4 (green) is calibrated by the deuteron, 3H and 4He observables. The marginal distributions of the observables are shown in the diagonal panels.

3.3 S/IR limitations

So far we have focused on the feasibility and advantages of the S/IR approach. However, there are some important limitations, and we recommend users to be mindful of the number of effective samples. In Figure 4, our S/IR sampling of PPD_{A=2} has n_eff = 1589.9, while for PPD_{A=2,3,4} it drops to n_eff = 314.9. This can be understood as a consequence of resampling from a fixed set of prior samples: the more complex (constraining) the likelihood function becomes, the smaller the number of effective samples. As seen in Figure 4, the contour lines of PPD_{A=2,3,4} are less smooth than those obtained from PPD_{A=2} due to the smaller number of effective samples. The S/IR method will eventually break down when n_eff becomes too small. An intermediate remedy could be the use of kernel density estimators, although that approach typically introduces an undesired sensitivity to the choice of kernel widths.

A similar situation occurs when the target observables are characterized by very small error assignments. This leads to a sharply peaked likelihood function and a decreased overlap with the prior samples. The resulting large variance of the importance weights implies that the final set representing the posterior distribution will be dominated by a very small number of samples. Here we show such an example where resampling no longer works. We attempted to reconstruct a PPD with only deuteron observables in the calibration, but where all error assignments in Table 1 had been reduced by an order of magnitude. The results of this analysis are shown in Figures 5 and 6, which display the posterior PDFs and PPDs, respectively, generated by S/IR (blue) and compared with MCMC (orange). The S/IR method does not perform well in this case. With n_eff = 4.4, the PDF and PPD generated by S/IR are represented by only a few samples. The MCMC sampling, on the other hand, does manage to identify the updated distribution.
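The onset of this failure mode can be monitored directly from the weights. The code below is a synthetic illustration of our own, not the actual likelihood of this work: scaling a set of stand-in log-likelihoods by a factor β = c² mimics reducing all error assignments by a factor c, and the effective number of samples collapses accordingly.

```python
import numpy as np

def effective_sample_size(weights):
    """n_eff as defined above: the sum of importance weights rescaled by the largest weight."""
    w = np.asarray(weights, dtype=float)
    return np.sum(w / w.max())

# Stand-in log-likelihoods for 2,000 prior samples; beta = 100 corresponds to
# error assignments reduced by a factor of 10 (log-likelihoods scale as 1/error^2).
rng = np.random.default_rng(0)
logL = -0.5 * rng.chisquare(df=3, size=2000)
for beta in (1, 10, 100):
    q = np.exp(beta * logL - np.max(beta * logL))
    print(f"beta = {beta:3d}  n_eff = {effective_sample_size(q):7.1f}")
```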


FIGURE 5. The posterior of LECs sampled with S/IR (blue) compared with MCMC sampling (orange) for a situation when the deuteron calibration observables are associated with errors that have been reduced by an order of magnitude (see text for details). The LECs are shown in units of 10^4 GeV^−1, 10^4 GeV^−2 and 10^4 GeV^−4 for ci, C̃i and Ci, respectively.


FIGURE 6. The PPD generated using S/IR (blue) and MCMC sampling (orange) for the posterior distributions shown in Figure 5. Marginal histograms of the observables are shown in the diagonal panels.

Unfortunately, one can also envision more difficult scenarios in which S/IR could fail without any clear signatures. For example, if the prior has a very small overlap with the posterior, there is a risk that many prior samples get a similar importance weight (such that the number of effective samples is large) but that one has missed the most interesting region. Again, history matching is a very useful tool in the analysis as it can be used to ensure that we are focusing on the LEC domain that covers the mode(s) of the posterior.

4 Summary

In this paper we reviewed an established sampling method known as S/IR. Specifically, we applied importance resampling using the weighted bootstrap algorithm and sampled the posterior PDF for selected LECs of the ΔNNLO interaction model conditioned on deuteron observables. The resulting PDF and PPD were compared with those obtained from MCMC sampling and a very good agreement was found. We also demonstrated Bayesian updating using S/IR by adding 3H and 4He observables to the calibration data set. As expected, the predictions of the 3H and 4He observables were improved, but so was the description of the deuteron ground-state energy, which could be explained by the correlation structure between E(2H) and Rp(4He). Finally, we illustrated some limitations of the S/IR method that were signaled by small numbers of effective samples. We found that such situations occurred when the likelihood became too complex for the limited set of prior samples, or when the prior samples failed to resolve a very peaked posterior that resulted from small tolerances. We also argued that prior knowledge of the posterior landscape is very useful to avoid possible failure scenarios that might not be signaled by the number of effective samples.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

WJ and CF contributed equally to this work.

Funding

This work was supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant No. 758027) and the Swedish Research Council (Grant Nos. 2017-04234 and 2021-04507). The computations and data handling were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE), and the National Supercomputer Centre (NSC) partially funded by the Swedish Research Council through Grant No. 2018-05973.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. Specifically, we use the non-implausible domain that was identified in wave 2 of the history matching performed in Ref. [42]. This wave only included deuteron observables.

References

1. Schindler MR, Phillips DR. Bayesian methods for parameter estimation in effective field theories. Ann Phys (N Y) (2009) 324:682–708. doi:10.1016/j.aop.2008.09.003

2. Caesar C, Simonis J, Adachi T, Aksyutina Y, Alcantara J, Altstadt S, et al. Beyond the neutron drip line: The unbound oxygen isotopes 25O and 26O. Phys Rev C (2013) 88:034313. doi:10.1103/PhysRevC.88.034313

3. Furnstahl RJ, Klco N, Phillips DR, Wesolowski S. Quantifying truncation errors in effective field theory. arXiv e-prints (2015).

4. Wesolowski S, Furnstahl RJ, Melendez JA, Phillips DR. Exploring bayesian parameter estimation for chiral effective field theory using nucleon–nucleon phase shifts. J Phys G: Nucl Part Phys (2019) 46:045102. doi:10.1088/1361-6471/aaf5fc

5. Melendez JA, Furnstahl RJ, Phillips DR, Pratola MT, Wesolowski S. Quantifying correlated truncation errors in effective field theory. Phys Rev C (2019) 100:044001. doi:10.1103/PhysRevC.100.044001

6. Epelbaum E, Golak J, Hebeler K, Kamada H, Krebs H, Meißner UG, et al. Towards high-order calculations of three-nucleon scattering in chiral effective field theory. Eur Phys J A (2020) 56:92. doi:10.1140/epja/s10050-020-00102-2

7. Yang L, Lin C, Zhang Y, Wen P, Jia H, Wang D, et al. Bayesian analysis on interactions of exotic nuclear systems. Phys Lett B (2020) 807:135540. doi:10.1016/j.physletb.2020.135540

8. Phillips DR, Furnstahl RJ, Heinz U, Maiti T, Nazarewicz W, Nunes FM, et al. Get on the band wagon: A bayesian framework for quantifying model uncertainties in nuclear dynamics. J Phys G: Nucl Part Phys (2021) 48:072001. doi:10.1088/1361-6471/abf1df

9. Drischler C, Furnstahl RJ, Melendez JA, Phillips DR. How well do we know the neutron-matter equation of state at the densities inside neutron stars? A bayesian approach with correlated uncertainties. Phys Rev Lett (2020) 125:202702. doi:10.1103/PhysRevLett.125.202702

10. Drischler C, Melendez JA, Furnstahl RJ, Phillips DR. Quantifying uncertainties and correlations in the nuclear-matter equation of state. Phys Rev C (2020) 102:054315. doi:10.1103/PhysRevC.102.054315

11. Maris P, Epelbaum E, Furnstahl RJ, Golak J, Hebeler K, Huther T, et al. Light nuclei with semilocal momentum-space regularized chiral interactions up to third order. Phys Rev C (2021) 103:054001. doi:10.1103/PhysRevC.103.054001

12. Wesolowski S, Svensson I, Ekström A, Forssén C, Furnstahl RJ, Melendez JA, et al. Rigorous constraints on three-nucleon forces in chiral effective field theory from fast and accurate calculations of few-body observables. Phys Rev C (2021) 104:064001. doi:10.1103/PhysRevC.104.064001

13. Djärv T, Ekström A, Forssén C, Johansson HT. Bayesian predictions for A=6 nuclei using eigenvector continuation emulators. Phys Rev C (2022) 105:014005. doi:10.1103/PhysRevC.105.014005

14. Svensson I, Ekström A, Forssén C. Bayesian parameter estimation in chiral effective field theory using the Hamiltonian Monte Carlo method. Phys Rev C (2022) 105:014004. doi:10.1103/PhysRevC.105.014004

15. Acharya B, Bacca S. Gaussian process error modeling for chiral effective-field-theory calculations of np ↔dγ at low energies. Phys Lett B (2022) 827:137011. doi:10.1016/j.physletb.2022.137011

16. Svensson I, Ekström A, Forssén C. Bayesian estimation of the low-energy constants up to fourth order in the nucleon-nucleon sector of chiral effective field theory. arXiv 2206.08250 (2022). doi:10.48550/arXiv.2206.08250

17. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys (1953) 21:1087–92. doi:10.1063/1.1699114

18. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika (1970) 57:97–109. doi:10.1093/biomet/57.1.97

19. Hitchcock DB. A history of the metropolis–hastings algorithm. The Am Statistician (2003) 57:254–7. doi:10.1198/0003130032413

20. von Toussaint U. Bayesian inference in physics. Rev Mod Phys (2011) 83:943–99. doi:10.1103/RevModPhys.83.943

21. Brooks S, Gelman A, Jones G, Meng X. Handbook of Markov chain Monte Carlo. Florida, United States: Chapman & Hall/CRC Handbooks of Modern Statistical Methods CRC Press (2011).

22. Rubin DB. Using the sir algorithm to simulate posterior distributions. Bayesian Stat (1988) 3:395–402.

23. Smith AFM, Gelfand AE. Bayesian statistics without tears: A sampling-resampling perspective. Am Stat (1992) 46:84–8. doi:10.2307/2684170

24. Bernardo J, Smith A. Bayesian theory. In: Wiley series in probability and statistics. New Jersey, United States: John Wiley & Sons Canada, Limited (2006).

25. Kolck UV. Effective field theory of nuclear forces. Prog Part Nucl Phys (1999) 43:337–418. doi:10.1016/S0146-6410(99)00097-6

26. Bogner SK, Kuo TTS, Schwenk A. Model-independent low momentum nucleon interaction from phase shift equivalence. Phys Rep (2003) 386:1–27. doi:10.1016/j.physrep.2003.07.001

27. Epelbaum E, Hammer HW, Meißner UG. Modern theory of nuclear forces. Rev Mod Phys (2009) 81:1773–825. doi:10.1103/RevModPhys.81.1773

28. Bogner S, Furnstahl R, Schwenk A. From low-momentum interactions to nuclear structure. Prog Part Nucl Phys (2010) 65:94–147. doi:10.1016/j.ppnp.2010.03.001

29. Machleidt R, Entem D. Chiral effective field theory and nuclear forces. Phys Rep (2011) 503:1–75. doi:10.1016/j.physrep.2011.02.001

30. Weinberg S. Nuclear forces from chiral Lagrangians. Phys Lett B (1990) 251:288–92. doi:10.1016/0370-2693(90)90938-3

31. Weinberg S. Effective chiral Lagrangians for nucleon-pion interactions and nuclear forces. Nucl Phys B (1991) 363:3–18. doi:10.1016/0550-3213(91)90231-L

32. Kaplan DB, Savage MJ, Wise MB. A new expansion for nucleon-nucleon interactions. Phys Lett B (1998) 424:390–6. doi:10.1016/S0370-2693(98)00210-X

33. Ekström A, Hagen G, Morris TD, Papenbrock T, Schwartz PD. Δ isobars and nuclear saturation. Phys Rev C (2018) 97:024332. doi:10.1103/PhysRevC.97.024332

34. Jiang WG, Ekström A, Forssén C, Hagen G, Jansen GR, Papenbrock T. Accurate bulk properties of nuclei from A = 2 to ∞ from potentials with Δ isobars. Phys Rev C (2020) 102:054301. doi:10.1103/PhysRevC.102.054301

35. Wang M, Huang WJ, Kondev FG, Audi G, Naimi S. The AME 2020 atomic mass evaluation (II). Tables, graphs and references. Chin Phys C (2021) 45:030003. doi:10.1088/1674-1137/abddaf

36. Angeli I, Marinova K. Table of experimental nuclear ground state charge radii: An update. At Data Nucl Data Tables (2013) 99:69–95. doi:10.1016/j.adt.2011.12.006

37. Machleidt R. High-precision, charge-dependent Bonn nucleon-nucleon potential. Phys Rev C (2001) 63:024001. doi:10.1103/PhysRevC.63.024001

38. Ekström A, Jansen GR, Wendt KA, Hagen G, Papenbrock T, Carlsson BD, et al. Accurate nuclear radii and binding energies from a chiral interaction. Phys Rev C (2015) 91:051301. doi:10.1103/PhysRevC.91.051301

39. Frame D, He R, Ipsen I, Lee D, Lee D, Rrapaj E. Eigenvector continuation with subspace learning. Phys Rev Lett (2018) 121:032501. doi:10.1103/PhysRevLett.121.032501

40. König S, Ekström A, Hebeler K, Lee D, Schwenk A. Eigenvector continuation as an efficient and accurate emulator for uncertainty quantification. Phys Lett B (2020) 810:135814. doi:10.1016/j.physletb.2020.135814

41. Ekström A, Hagen G. Global sensitivity analysis of bulk properties of an atomic nucleus. Phys Rev Lett (2019) 123:252501. doi:10.1103/PhysRevLett.123.252501

42. Hu B, Jiang W, Miyagi T, Sun Z, Ekström A, Forssén C, et al. Ab initio predictions link the neutron skin of 208Pb to nuclear forces. Nat Phys (2022) 18:1196–200. doi:10.1038/s41567-022-01715-8

43. Siemens D, Ruiz de Elvira J, Epelbaum E, Hoferichter M, Krebs H, Kubis B, et al. Reconciling threshold and subthreshold expansions for pion–nucleon scattering. Phys Lett B (2017) 770:27–34. doi:10.1016/j.physletb.2017.04.039

44. Foreman-Mackey D, Hogg DW, Lang D, Goodman J. emcee: The MCMC hammer. Publications Astronomical Soc Pac (2013) 125:306–12. doi:10.1086/670067

Keywords: bayesian inference, probability updates, importance resampling, uncertainty quantification, ab initio nuclear theory, low-energy constants

Citation: Jiang W and Forssén C (2022) Bayesian probability updates using sampling/importance resampling: Applications in nuclear theory. Front. Phys. 10:1058809. doi: 10.3389/fphy.2022.1058809

Received: 30 September 2022; Accepted: 20 October 2022;
Published: 03 November 2022.

Edited by:

Michele Viviani, National Institute of Nuclear Physics of Pisa, Italy

Reviewed by:

Saori Pastore, Washington University in St. Louis, United States
Bijaya Acharya, Oak Ridge National Laboratory (DOE), United States

Copyright © 2022 Jiang and Forssén. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Weiguang Jiang, wjiang@uni-mainz.de
