Induced Seismicity Completeness Analysis for Improved Data Mining

Mignan, Arnaud

doi:10.3389/feart.2021.635193

ORIGINAL RESEARCH article

Front. Earth Sci., 29 March 2021

Sec. Geohazards and Georisks

Volume 9 - 2021 | https://doi.org/10.3389/feart.2021.635193

This article is part of the Research Topic: Advances in Monitoring, Modeling and Managing Induced Seismicity

Arnaud Mignan¹,²*

¹Institute of Risk Analysis, Prediction and Management (Risks-X), Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology (SUSTech), Shenzhen, China
²Department of Earth and Space Sciences, Southern University of Science and Technology (SUSTech), Shenzhen, China

The study of induced seismicity at sites of fluid injection is paramount to assess the seismic response of the earth’s crust and to mitigate the potential seismic risk. However, statistical analysis is limited to events above the completeness magnitude $m_{c}$, whose estimate may vary significantly depending on the method employed. To avoid potential biases and optimize the data exploitable for analysis, a better understanding of completeness, detection capacity and censored data characteristics is needed. We apply various methods previously developed for natural seismicity to 16 underground stimulation experiments. We verify that different techniques yield different $m_{c}$ values and we suggest using the 90% quantile of the $m_{c}$ distribution obtained from high-resolution mapping, with $m_{c}$ defined from the mode of local magnitude frequency distributions (MFD). We show that this distribution can be described by an asymmetric Laplace distribution and the bulk MFD by an asymmetric Laplace mixture model. We obtain an averaged Gutenberg-Richter parameter $b = 1.03 \pm 0.48$ and a detection parameter $k = 3.18 \pm 1.97$ from mapping, with values subject to high uncertainties across stimulations. We transfer Bayesian $m_{c}$ mapping developed for natural seismicity to the context of induced seismicity, here adapted to local three-dimensional seismicity clouds. We obtain the new prior parameterization $m_{c,pred} = 1.64 \log_{10}(d_{3}) - 1.83$, with $d_{3}$ the distance to the 3rd nearest seismic station. The potential use of censored data and of $m_{c}$ prediction is finally discussed in terms of data mining to improve the monitoring, modeling and managing of induced seismicity.

Introduction

The evaluation of the completeness magnitude $m_{c}$, above which the Gutenberg-Richter law is verified as all the data are by definition observed, is a prerequisite to virtually all statistical analyses of seismicity. This includes the study of induced seismicity at sites of underground stimulation by fluid injection. Underestimating $m_{c}$ yields biased estimates of the slope of the Gutenberg-Richter law, the $b$-value, while overestimating it may lead to unnecessary under-sampling. The selection of $m_{c}$ therefore has indirect consequences on seismic hazard assessment. Most published works provide an estimate of $m_{c}$ but rarely explain how it has been calculated and rarely, if ever, provide a sensitivity analysis.

The present study aims to fill this gap with an in-depth analysis of the magnitude frequency distribution (MFD) at multiple sites. To the best of our knowledge, this is the first study dedicated to completeness magnitude analysis in the induced seismicity context. We will test different $m_{c}$ techniques (e.g., Wiemer and Wyss, 2000; Amorèse, 2007) and transfer two recent models originally developed by the author for natural seismicity: the Asymmetric Laplace MFD model to describe the incomplete part of seismicity (Mignan, 2012; Mignan, 2019), and the Bayesian Magnitude of Completeness (BMC) method for robust $m_{c}$ mapping (Mignan et al., 2011), based on $m_{c}$ being the mode of local MFDs, in agreement with the Asymmetric Laplace formulation.

The BMC method has been successfully applied in various regions of the world, but so far only in the context of natural seismicity: Taiwan (Mignan et al., 2011), Mainland China (Mignan et al., 2013), Switzerland (Kraft et al., 2013), Lesser Antilles arc (Vorobieva et al., 2013), California (Tormann et al., 2014), Greece (Mignan and Chouliaras, 2014), Iceland (Panzera et al., 2017), South Africa (Brandt, 2019) and Venezuela (Vásquez and Bravo de Guenni, n.d.)¹. It has become urgent to apply it to induced seismicity, which requires a reformulation of the model. Based on the new parameterization and additional information on incomplete (so-called censored) data, we will discuss how such information could improve induced seismicity data mining, or in other words, how it could improve knowledge of the underground feedback activation and the management of the associated risk.

Materials and Methods

Induced Seismicity Data

We consider 16 underground stimulations by deep fluid injection (Table 1), whose data are all publicly available, often from dedicated data portals (e.g., EOST and GEIE EMC, IS EPOS): the Soultz-sous-Forêts stimulations at the GPK1 well in 1993 [S93] (Cornet et al., 1997), GPK2 well in 2000 [S00] (Cuenot et al., 2008), GPK3 well in 2003 [S03] (Calò and Dorbath, 2013) and GPK4 well in both 2004 [S04] and 2005 [S05] (Charléty et al., 2007), the KTB deep drilling site [KTB94] (Jost et al., 1998), the Paradox Valley continuous injection from 1994 to 2008 [PV94] (Ake et al., 2005), the 2006 Basel 1 well stimulation [B06] (Häring et al., 2008; Kraft and Deichmann, 2014), the 2007–2014 Geysers [G07] Prati-9 and Prati-29 well injections (Kwiatek et al., 2015), the 2008 Groß Schönebeck injection [GS08] (Kwiatek et al., 2010), the Cooper Basin Habanero 4 well stimulation of 2012 [CB12] (Baisch et al., 2015), the Newberry Volcano EGS demonstration 2012 stimulation and 2014 restimulation [NB12] (Cladouhos et al., 2013; Cladouhos et al., 2015), the 2013 St Gallen reservoir stimulation [SG13] (Diehl et al., 2017), the 2015 Äspö Hard Rock Laboratory experiment [A15] (Kwiatek et al., 2018), the 2016–2017 Pohang stimulation experiment [P16] (Woo et al., 2019), and the 2018 Espoo stimulation [E18] near Helsinki (Kwiatek, 2019). Most stimulations considered took place at EGS sites.

TABLE 1. Available public data for induced seismicity completeness analysis.

Depending on the parameters provided (see Table 1), different completeness analysis levels are achievable. When earthquake coordinates are not included, the study is limited to the bulk MFD analysis (Woessner and Wiemer, 2005; Mignan and Woessner, 2012) and to the application of the Asymmetric Laplace distribution (Mignan, 2019); when earthquake coordinates are included, observed completeness magnitude $m_{c,obs}$ mapping is performed (e.g., Wiemer and Wyss, 2000). In the ideal situation in which the coordinates of the seismic stations are also given, posterior completeness magnitude $m_{c,post}$ maps are also generated using the BMC method (Mignan et al., 2011).

Since this study is solely dedicated to seismicity completeness, data such as total volume injected, flow rate profile, or injection/post-injection windows are not considered (they are only mentioned in the section Discussion and Perspectives on Data Mining). For statistical analyses related to the fluid injection process at different sites, the reader can refer to, e.g., Dinske and Shapiro (2013), van der Elst et al. (2016), Mignan et al. (2017) or Bentz et al. (2020).

Standard Magnitude Frequency Distribution Analysis

The bulk magnitude frequency distribution (MFD) of an earthquake catalog can be described by a probability density function that takes the form:

$$p(m) = c \, q(m) \, f_{GR}(m) = c \, q(m) \, e^{-\beta m}$$

where $m$ is the earthquake magnitude, $f_{GR}(m)$ the Gutenberg-Richter law (Gutenberg and Richter, 1944), $q(m)$ a detection function that controls the shape of the MFD and $c$ a normalization constant so that $\int p(x) \, dx = 1$. The non-cumulative MFD, defined as the number of earthquakes per magnitude bin $m$, is simply $n(m) = \Delta m \, N_{tot} \, p(m)$ with $N_{tot}$ the total number of events and $\Delta m = 0.1$ the magnitude bin width. The cumulative MFD is more commonly formulated as $N(\geq m) = 10^{a - bm}$ where $b = \beta / \ln 10$ and $a$ is the overall seismicity activity.

We should have the condition $q(m \geq m_{c}) = 1$ by definition, although $q$ may only tend to 1 if it is unbounded, for example if defined as a cumulative Normal distribution (Ringdal, 1975; Ogata and Katsura, 1993), a log-normal distribution (Martinsson and Jonsson, 2018), or a power-law (so that $p$ can be represented by a gamma distribution; Kijko and Smith, 2017). Mignan (2012, 2019), in contrast, considers the gradual curvature of the MFD to be due to the sum of "angular" MFDs, each of constant $m_{c}$, with $q$ a bounded exponential function and $p$ an asymmetric Laplace distribution (see below). "Curved" $q$ functions would then be fitting proxies not representative of the spatially varying and scale-variant detection process (Mignan and Chen, 2016).

Various methods have been proposed to estimate $m_{c}$ from the bulk MFD, independently of the function $q(m)$ (see reviews by Woessner and Wiemer, 2005; Mignan and Woessner, 2012). We here consider two non-parametric techniques, the mode of the MFD (also known as the "maximum curvature" method; Wiemer and Wyss, 2000) and the Median-Based Analysis of the Segment Slope (MBASS) method (Amorèse, 2007). The $b$-value of the Gutenberg-Richter law can then be estimated with the maximum likelihood estimation (MLE) method (Aki, 1965) for the complete magnitude range $(m_{c} - \Delta m / 2, +\infty)$. It is important to note that $m_{c}$ values obtained from the bulk MFD can vary significantly across methods, which hampers the evaluation of $b$. A spatial analysis can limit the potential ambiguity (Mignan and Chouliaras, 2014).
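The mode-based estimate and the maximum likelihood $b$-value described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `mc_mode` and `b_value_aki` are hypothetical, magnitudes are assumed to be binned at $\Delta m = 0.1$, and ties in the mode are resolved toward the lowest bin.

```python
import numpy as np

def mc_mode(mags, dm=0.1):
    """Completeness magnitude taken as the mode of the binned MFD."""
    bins = np.round(np.asarray(mags, dtype=float) / dm).astype(int)  # integer bin index
    idx, counts = np.unique(bins, return_counts=True)
    # np.argmax returns the first maximum, i.e. the lowest bin in case of ties
    return idx[np.argmax(counts)] * dm

def b_value_aki(mags, mc, dm=0.1):
    """Maximum likelihood b-value (Aki, 1965) for magnitudes at or above mc - dm/2."""
    m = np.asarray(mags, dtype=float)
    m = m[m >= mc - dm / 2]
    beta = 1.0 / (m.mean() - (mc - dm / 2))
    return beta / np.log(10)  # b = beta / ln(10)
```

Applied to a magnitude vector, `mc = mc_mode(mags)` followed by `b_value_aki(mags, mc)` gives one point of the sensitivity analysis; repeating the second call for other cutoffs shows how $b$ depends on the chosen $m_{c}$.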

Spatial heterogeneities in $m_{c}$, due at first order to the spatial configuration of the seismic network (see Bayesian Magnitude of Completeness Mapping Method), can be evaluated by a simple mapping procedure. We perform a three-dimensional mapping of $m_{c,obs}(x, y, z)$ in cubic cells 100 m wide. No smoothing kernel (e.g., Wiemer and Wyss, 2000) is used in order to minimize $m_{c}$ heterogeneities in individual cells. The parameter is estimated by using the mode of the distribution of magnitudes $m$ in each cell $(x, y, z)$. The mode is a reasonable choice for localized data where no significant spatial variation in $m_{c}$ is expected (Mignan, 2012; Mignan and Chen, 2016). It also yields robust results for sample sizes as low as $n_{min} = 4$ earthquakes (Mignan et al., 2011), the threshold used in the present mapping procedure. The set of $m_{c}$ estimates for all cells is then represented by the vector $m_{c,obs}$, whose distribution explains the curvature characteristics of the bulk MFD. Different quantiles of $m_{c,obs}$ can be tested to evaluate $b$. The map of $m_{c,obs}(x, y, z)$ is also used as input for the BMC method described in Bayesian Magnitude of Completeness Mapping Method.
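A minimal sketch of such a mapping is given below, assuming event coordinates in km (so that a 100 m cell corresponds to `cell=0.1`) and magnitudes binned at $\Delta m = 0.1$. The function name `map_mc_obs` and the dictionary output keyed by integer cell indices are illustrative choices, not part of the original method description.

```python
import numpy as np
from collections import defaultdict

def map_mc_obs(xyz, mags, cell=0.1, n_min=4, dm=0.1):
    """Map mc_obs in cubic cells, using the mode of the magnitudes in each cell.

    xyz  : (N, 3) event coordinates in km (assumption)
    cell : cell width in km (0.1 km = 100 m)
    Returns {(i, j, k): mc_obs} for cells holding at least n_min events.
    """
    xyz = np.asarray(xyz, dtype=float)
    mags = np.asarray(mags, dtype=float)
    cell_idx = np.floor(xyz / cell).astype(int)

    # group magnitudes by cell, without any spatial smoothing
    groups = defaultdict(list)
    for row, m in zip(cell_idx, mags):
        groups[tuple(int(c) for c in row)].append(m)

    mc_obs = {}
    for key, ms in groups.items():
        if len(ms) < n_min:
            continue  # too few events: leave a gap in the map
        bins = np.round(np.array(ms) / dm).astype(int)
        idx, counts = np.unique(bins, return_counts=True)
        mc_obs[key] = idx[np.argmax(counts)] * dm  # mode of the local MFD
    return mc_obs
```

Quantiles of the resulting values, for example `np.quantile(list(mc_obs.values()), 0.9)`, can then be tested as conservative cutoffs for the $b$-value estimation.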

Local MFDs of cells $(x, y, z)$ can be used to estimate both $b$ and the parameter $k$ of the detection function $q$. We consider the asymmetric Laplace probability density function:

$$p_{AL}(m) = \frac{1}{\frac{1}{\kappa - \beta} + \frac{1}{\beta}} \begin{cases} e^{(\kappa - \beta)(m - m_{c})}, & m < m_{c} \\ e^{-\beta (m - m_{c})}, & m \geq m_{c} \end{cases}$$

with mode $m_{c}$ and the detection parameter $k = \kappa / \ln(10)$ also estimated using the MLE method (Mignan, 2012; see also the equations in Asymmetric Laplace Mixture Model). This parameter has been shown to be relatively stable with $k \approx 3$ for natural seismicity in Southern California and Nevada (Mignan, 2012). We apply the same approach to test how this parameter behaves in the context of induced seismicity. We only consider cells with $n_{min} = 50$ for those calculations. The asymmetric Laplace distribution is the basic component of the mixture model presented below (Asymmetric Laplace Mixture Model). It also explains why the mode is used to compute $m_{c}$ in the BMC method (Bayesian Magnitude of Completeness Mapping Method).
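For illustration, the asymmetric Laplace density can be evaluated as follows. This is a sketch under the assumption $\kappa > \beta > 0$ (so that the density rises up to $m_{c}$ and decays above it); the function name `asym_laplace_pdf` is hypothetical.

```python
import numpy as np

def asym_laplace_pdf(m, mc, kappa, beta):
    """Asymmetric Laplace density with mode mc.

    Rises with slope (kappa - beta) below mc and decays with slope beta above mc,
    both in natural-log units.
    """
    if not (kappa > beta > 0):
        raise ValueError("requires kappa > beta > 0")
    m = np.asarray(m, dtype=float)
    norm = 1.0 / (1.0 / (kappa - beta) + 1.0 / beta)
    # choose the exponent first, so that only the relevant branch is exponentiated
    exponent = np.where(m < mc, (kappa - beta) * (m - mc), -beta * (m - mc))
    return norm * np.exp(exponent)
```

With $k = 3$ and $b = 1$, the corresponding inputs would be `kappa = 3 * np.log(10)` and `beta = np.log(10)`.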

Asymmetric Laplace Mixture Model

The sum of local "angular" MFDs of different $m_{c}$, which forms the bulk "curved" MFD, can be approximated by mixture modeling instead of a mapping procedure. This is particularly practical if earthquake coordinates are unavailable and only a magnitude vector is accessible. The Asymmetric Laplace Mixture Model (ALMM) (Mignan, 2019) is defined as:

$$p_{ALMM}(m; w_{i}, m_{c,i}, \kappa, \beta) = \sum_{i=1}^{K} w_{i} \, p_{AL}(m; m_{c,i}, \kappa, \beta)$$

with $K$ the number of Asymmetric Laplace mixture components ordered by $m_{c}$ value $(m_{c,1} < m_{c,2} < \dots < m_{c,i} < \dots < m_{c,K})$ and $w_{i}$ the mixing weight of the $i$th component such that $\sum_{i=1}^{K} w_{i} = 1$. Parameters $\kappa$ and $\beta$ are assumed constant across components.

Any MFD shape can be fitted by the flexible ALMM based on the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). The initial parameter values are estimated by applying $K$-means (MacQueen, 1967), with $w_{i}$ the normalized number of events per cluster and $m_{c,i}$ the cluster centroid. Each component is formed of the magnitude vector $M_{i} = (m_{1}, m_{2}, \dots)$. The completeness magnitude $m_{c,i}$ is estimated from the mode of the component. Parameter $\kappa$ is estimated from the incomplete part of the first component $M_{left} = \{m \in M_{1} : m \leq m_{c,1} - \Delta m / 2\}$ while parameter $\beta$ is estimated from the complete part of the last component $M_{right} = \{m \in M_{K} : m > m_{c,K} - \Delta m / 2\}$. The maximum likelihood estimates are respectively:

$$\begin{aligned} \chi &= \frac{1}{\left(m_{c,1} - \frac{\Delta m}{2}\right) - \bar{M}_{left}} \\ \beta &= \frac{1}{\bar{M}_{right} - \left(m_{c,K} - \frac{\Delta m}{2}\right)} \end{aligned}$$

with $\chi = \kappa - \beta$ the slope of the incomplete part of the asymmetric Laplace distribution in a log-linear plot.
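The two estimators above translate directly into code. The sketch below assumes that the magnitudes of the first and last mixture components are already available as arrays; the function name `almm_slopes` and its argument names are hypothetical.

```python
import numpy as np

def almm_slopes(m_first, mc_first, m_last, mc_last, dm=0.1):
    """Maximum likelihood estimates of chi, beta and kappa = chi + beta.

    m_first : magnitudes of the first component (lowest mc)
    m_last  : magnitudes of the last component (highest mc)
    """
    m_first = np.asarray(m_first, dtype=float)
    m_last = np.asarray(m_last, dtype=float)
    left = m_first[m_first <= mc_first - dm / 2]   # incomplete part, M_left
    right = m_last[m_last > mc_last - dm / 2]      # complete part, M_right
    chi = 1.0 / ((mc_first - dm / 2) - left.mean())
    beta = 1.0 / (right.mean() - (mc_last - dm / 2))
    return chi, beta, chi + beta
```

The detection parameter and the $b$-value then follow as `kappa / np.log(10)` and `beta / np.log(10)`.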

At each EM iteration $j$, a deterministic version of the expectation step (E-step) attributes a hard label $i$ to each event magnitude from the parameter set $\theta_{i}^{(j-1)} = \{m_{c,i}, \kappa, \beta\}$ defined in the previous iteration $j - 1$ ($j = 0$ corresponding to the $K$-means estimate). Hard labels are assigned as:

$$i = \arg\max_{i} \, p_{AL}\left(m; \theta_{i}^{(j-1)}\right)$$

The maximization step (M-step) then updates the component parameters. The best number of components $K$ is finally selected from the lowest Bayesian Information Criterion estimate $BIC = -LL + \frac{1}{2}(2 + K) \ln(N_{tot})$ (Schwarz, 1978) where $LL$ is the log-likelihood of the ALMM. Details of the full method are given in Mignan (2019). For MFD mixture modeling based on a log-normal component, the reader may refer to Martinsson and Jonsson (2018).
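The hard-label E-step and the BIC expression can be sketched as follows. This is only an illustration of those two formulas, not the full EM procedure of Mignan (2019); the function names are hypothetical and component labels are 0-based.

```python
import numpy as np

def log_pal(m, mc, kappa, beta):
    """Log-density of the asymmetric Laplace distribution with mode mc."""
    exponent = np.where(m < mc, (kappa - beta) * (m - mc), -beta * (m - mc))
    return exponent - np.log(1.0 / (kappa - beta) + 1.0 / beta)

def hard_labels(mags, mc_list, kappa, beta):
    """Deterministic E-step: index of the component with the highest density."""
    m = np.asarray(mags, dtype=float)[:, None]      # shape (N, 1)
    mc = np.asarray(mc_list, dtype=float)[None, :]  # shape (1, K)
    return np.argmax(log_pal(m, mc, kappa, beta), axis=1)

def bic(log_likelihood, n_components, n_tot):
    """BIC as written in the text: -LL + (2 + K) ln(N_tot) / 2."""
    return -log_likelihood + 0.5 * (2 + n_components) * np.log(n_tot)
```

In a model-selection loop, one would fit the ALMM for increasing $K$ and keep the value of $K$ with the lowest `bic`.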

Bayesian Magnitude of Completeness Mapping Method

The last method to be tested in the present study is the Bayesian Magnitude of Completeness (BMC) method, which uses Bayesian inference to estimate $m_{c}$ based on incomplete information and prior belief. The incomplete information is the $m_{c,obs}$ map (see Standard Magnitude Frequency Distribution Analysis), which presents gaps in cells of low seismicity and is highly uncertain when estimated from a limited number of earthquake magnitudes. BMC is constrained by a prior model $m_{c,pred} = f(d_{k})$ relating the spatial heterogeneities in $m_{c}$ to the density of seismic stations, approximated by the distance $d_{k}$ to the $k$th nearest station (Mignan et al., 2011). Priors have so far been defined in the literature for two-dimensional $m_{c}$ mapping. We here define a new prior based on three-dimensional distance, which is a requirement for fluid injections characterized by a three-dimensional seismicity cloud centered at the borehole depth and detected by a combination of surface stations and downhole stations. The distance between a cell and a station of coordinates $(x_{sta}, y_{sta}, z_{sta})$ is thus $d = \sqrt{(x - x_{sta})^{2} + (y - y_{sta})^{2} + (z - z_{sta})^{2}}$. We additionally improve the functional form of the prior, moving from $m_{c,pred} = c_{1} d_{k}^{c_{2}} + c_{3}$ (Mignan et al., 2011) to the form $m_{c,pred} = c_{1} \log_{10}(d_{k}) + c_{2}$, a simpler attenuation function reduced to two free parameters.
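The three-dimensional distance to the $k$th nearest station can be computed for many cells at once, as in the sketch below. The function name `dist_kth_station` is hypothetical, and cell centers and stations are assumed to share one Cartesian frame with consistent units (e.g., km).

```python
import numpy as np

def dist_kth_station(cells, stations, k=3):
    """3D Euclidean distance from each cell to its k-th nearest station.

    cells    : (N, 3) cell-center coordinates (x, y, z)
    stations : (S, 3) station coordinates, surface and downhole alike
    """
    cells = np.atleast_2d(np.asarray(cells, dtype=float))
    stations = np.atleast_2d(np.asarray(stations, dtype=float))
    if stations.shape[0] < k:
        raise ValueError("fewer stations than k")
    # pairwise distances, shape (N, S)
    d = np.linalg.norm(cells[:, None, :] - stations[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k - 1]
```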

Following Bayes' Theorem, we obtain the posterior completeness magnitude $m_{c,post}$ and standard deviation $\sigma_{post}$:

$$\begin{aligned} m_{c,post} &= \frac{m_{c,pred} \, \sigma_{obs}^{2} + m_{c,obs} \, \sigma_{pred}^{2}}{\sigma_{pred}^{2} + \sigma_{obs}^{2}} \\ \sigma_{post} &= \sqrt{\frac{\sigma_{pred}^{2} \, \sigma_{obs}^{2}}{\sigma_{pred}^{2} + \sigma_{obs}^{2}}} \end{aligned}$$

where $\sigma_{obs}$ and $\sigma_{pred}$ are the standard deviations of $m_{c}$ observations (based on 100 bootstraps) and of the prior model, respectively. Note that all the aforementioned parameters depend on location $(x, y, z)$, except for $\sigma_{pred}$ which is constant.
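The posterior expressions amount to a variance-weighted average of the observation and the prior. A minimal sketch is given below; the function name `bmc_posterior` is hypothetical, and the handling of empty cells (encoded as `NaN`, where the posterior falls back to the prior) is an assumption about how map gaps are represented.

```python
import numpy as np

def bmc_posterior(mc_obs, sigma_obs, mc_pred, sigma_pred):
    """Posterior completeness magnitude and standard deviation per cell.

    Cells without observation (mc_obs is NaN) take the prior values (assumption).
    """
    mc_obs = np.asarray(mc_obs, dtype=float)
    sigma_obs = np.asarray(sigma_obs, dtype=float)
    mc_pred = np.asarray(mc_pred, dtype=float)
    var_obs, var_pred = sigma_obs ** 2, sigma_pred ** 2

    mc_post = (mc_pred * var_obs + mc_obs * var_pred) / (var_pred + var_obs)
    sigma_post = np.sqrt(var_pred * var_obs / (var_pred + var_obs))

    gap = np.isnan(mc_obs)
    return np.where(gap, mc_pred, mc_post), np.where(gap, sigma_pred, sigma_post)
```

Because $\sigma_{post}$ is never larger than either $\sigma_{obs}$ or $\sigma_{pred}$, the posterior map is at least as well constrained as the observed one in every cell.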

Results

Results of a Standard $m_{c}$ Analysis

We first apply the standard methods of $m_{c}$ evaluation, based on bulk MFD analysis and $m_{c}$ mapping. This is the first systematic comparison of completeness level for different induced seismicity sequences.

Figure 1 shows the cumulative bulk MFD for the 16 fluid injections and the matching $m_{c,obs}$ distribution. Figure 1 also shows the estimates $m_{c,mode}$ (dotted vertical line) and $m_{c,MBASS}$ (dashed vertical line), which are often close to the $m_{c,obs}$ median. More conservative estimates of $m_{c}$, such as the 75% or 90% quantiles of $m_{c,obs}$, seem to provide reasonable $b$-values. We use $q_{90}(m_{c,obs})$ to estimate the Gutenberg-Richter slope $b$ in Figure 1. Note that the $m_{c,obs}$ distribution shape matches the curvature of the bulk MFD, which verifies that this curvature is due at first order to spatial heterogeneities. Table 2 lists $m_{c,bulk}$ estimates obtained from different approaches with their respective $b$-values for comparison. For most cases, the $m_{c}$ range for induced seismicity lies between -2 and 1. It goes down to -4 for the Äspö Hard Rock Laboratory experiment where pico-seismicity is detected. Such low $m_{c}$ values have been reached at other underground laboratories (e.g., Villiger et al., 2020). The range of $b$-values is consistent with the ones obtained by Dinske and Shapiro (2013) for the 5 datasets common to both studies. Those authors however only provided one estimate, while our Table 2 shows its sensitivity to the minimum magnitude cutoff.

FIGURE 1. Cumulative magnitude frequency distribution (MFD) of 16 underground stimulations. The histogram shows the $m_{c,obs}$ distribution derived from three-dimensional $m_{c}$ mapping (except for KTB94, for which coordinates are unavailable). Parameter $b$ (dashed red line) is estimated for $m_{c} = q_{90}(m_{c,obs})$ (for $m_{c} = m_{c,MBASS}$ in the KTB94 case). The vertical dotted and dashed dark-red lines represent $m_{c,mode}$ and $m_{c,MBASS}$, respectively. See Table 2 for values.

TABLE 2. Parameters $m_{c}$ and $b (m_{c})$ for different $m_{c}$ estimation methods applied to the bulk MFD.

Figure 2 shows $m_{c,obs}$ maps at selected depths $z$ for the two stimulations richest in induced seismicity ($N_{tot} > 10,000$): S93 and CB12. Other maps will be shown in Bayesian Magnitude of Completeness Prior and Posterior $m_{c}$ Maps when used as input for BMC mapping. Local MFDs for cells that include more than 400 earthquakes are also displayed with their asymmetric Laplace distribution fit. Considering all cells of all sites together, assuming that $k$ and $b$ variations in space are random (Mignan, 2012; Kamer and Hiemer, 2015), we obtain for induced seismicity $k = 3.18 \pm 1.97$ and $b = 1.03 \pm 0.48$, which is consistent with natural seismicity regimes but here with significantly larger uncertainties. The plots of Figures 2B,D confirm that the mode of the local MFD is a reasonable choice to estimate $m_{c}$.

FIGURE 2. Examples of 100 m-resolution $m_{c,obs}$ maps and of local MFDs. (A) $m_{c,obs}$ map at depth $z = -2.7$ km for the 1993 Soultz-sous-Forêts stimulation (S93). (B) Local MFD observed in the cell highlighted on the S93 map, with Asymmetric Laplace distribution fit. (C) $m_{c,obs}$ map at depth $z = -4.1$ km for the 2012 Cooper Basin stimulation (CB12). (D) Local MFD observed in the cell highlighted on the CB12 map, with Asymmetric Laplace distribution fit.

This so-called standard $m_{c}$ analysis highlights the importance of testing several techniques to minimize possible biases in the $b$-value. Mapping remains the best approach to evaluate the $m_{c}$ range. Reasonable $b$-values are obtained when using conservative $m_{c,obs}$ quantiles (e.g., 75% or 90%).

Asymmetric Laplace Mixture Model Fits

We then apply the ALMM to the 16 magnitude vectors but only get reasonable fits for 9 of them. We find that the ALMM requires $n_{min} > 300$ for statistically significant component modeling. This means that the ALMM is not applicable to KTB94, GS08, A15 and P16. It fails for 3 other cases, S00, S03 and B06, due to anomalous fluctuations in the observed non-cumulative MFD, which are discussed below.

Figure 3 shows the 9 ALMM fits (for S93, PV94, S04, S05, G07, CB12, NB12, SG13 and E18). Parameters $\max(m_{c,i})$ and $b$ are listed in Table 2 for comparison with the techniques tested in Results of a Standard $m_{c}$ Analysis. Those values range between estimates obtained with the MBASS method and $q_{75}(m_{c,obs})$, so the ALMM does not seem to provide any new insight into which method to prefer. We observe that the number of components $K$ reflects the gradual curvature of the bulk MFD. For instance, only 2 components suffice to fit the almost angular SG13 MFD while 13 components are needed for the wide S05 MFD, which shows the flexibility of the ALMM in fitting different MFD shapes. It also verifies that bulk MFDs can be described by the sum of angular MFDs with $m_{c}$ as modes. We obtain $k = \{7.6, 3.3, 2.1, 2.8, 6.5, 9.1, 4.5, 3.7, 3.1\}$, respectively, with median 3.7 and mean 4.7.

FIGURE 3. Non-cumulative MFD (in blue) of 9 underground stimulations for which an Asymmetric Laplace Mixture Model (ALMM) fit is available, shown in red, with the mixture components shown in orange. See Table 2 for some values.

The ALMM is highly sensitive to abnormal fluctuations in the non-cumulative MFD, which are often not visible in the cumulative MFD. In the case of Soultz-sous-Forêts, the S00 non-cumulative MFD shows significant drops in the number of events, inconsistent with any model monotonically increasing up to $m_{c}$ and monotonically decreasing above $m_{c}$. In the latter range, we observe $n_{i} = \{0, 7, 7\}$ for bins $m_{i} = \{0.2, 0.4, 0.6\}$; for comparison, $n_{i} = \{1468, 1114, 717, 430\}$ for $m_{i} = \{0.1, 0.3, 0.5, 0.7\}$. Such an anomaly is smoothed out in the cumulative MFD and does not hamper $b$-value fitting. However, the ALMM anchors at those anomalies, failing to develop into the proper curved MFD. The S03 case shows numerous fluctuations also visible on the cumulative MFD and in the non-trivial evolution of $b$ estimates as the minimum magnitude cutoff increases (Table 2). Regarding the Basel catalog, a zig-zag pattern is observed on the non-cumulative MFD, suggesting an error in rounding between odd and even magnitude digits, which confuses the ALMM algorithm. Those cases point to problems with the magnitude vectors rather than with the ALMM. This suggests that seismologists preparing earthquake catalogs should analyze the non-cumulative distribution of magnitudes to check for potential errors and/or explain the origin of those anomalies incompatible with the Gutenberg-Richter law.

Bayesian Magnitude of Completeness Prior and Posterior $m_{c}$ Maps

We define a BMC prior model for induced seismicity by combining the relation between $m_{c}$ and the distance $d_{3}$ to the 3rd nearest seismic station, observed for the earthquake catalogs that come with seismic network information (Table 1). We choose $d_{3}$ (over, e.g., $d_{4}$ or $d_{5}$) since this metric shows the minimal residual error (see $\sigma_{pred}$ below). We assume that $m = M_{L} = M_{w}$ so that seismicity clouds from different depth levels can be combined to fit one model constrained on a relatively wide $d_{3}$ range.

Figure 4 represents the BMC prior derived from 7 datasets: S93, S04, S05, GS08, CB12, SG13, and P16. The model, represented by the solid curve, is defined as

$$m_{c,pred} = f_{prior}(d_{3}) = 1.64 \log_{10}(d_{3}) - 1.83; \quad \sigma_{pred} = 0.37$$

with distance $d_{3}$ in km. Note that the uncertainty $\sigma_{pred}$ is greater than the ones obtained from natural seismicity ($\sigma_{pred} \lesssim 0.25$; e.g., Mignan et al., 2011; Mignan et al., 2013; Kraft et al., 2013; Mignan and Chouliaras, 2014; Tormann et al., 2014). Several reasons may be advanced: different sites are here combined, representative of different soil conditions and thus potentially of different seismic attenuation functions; considering the depth component may add uncertainty to distance measures; finally, the model is constrained over far shorter distances ($d_{3} < 10$ km) compared to up to hundreds of kilometers in regional catalogs. It is interesting to compare the model prediction to the pico-seismicity completeness level $m_{c} \approx -4$ observed at Äspö (A15). We learn from Kwiatek et al. (2018) that sensors were located between a few meters and 100 m from the injection borehole. We independently predict $m_{c,pred}(10 \text{ m}) = -5.1$ and $m_{c,pred}(100 \text{ m}) = -3.5$, which is a reasonable approximation. Adding further datasets to the model will help to better constrain it.
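The prior is simple enough to evaluate by hand, but a short sketch makes the unit convention explicit: $d_{3}$ must be given in km, so the 10 m and 100 m cases correspond to 0.01 and 0.1 km. The function name `mc_pred` is hypothetical; the coefficients are those of the prior above.

```python
import math

def mc_pred(d3_km, c1=1.64, c2=-1.83):
    """BMC prior for induced seismicity: mc = c1 * log10(d3) + c2, with d3 in km."""
    return c1 * math.log10(d3_km) + c2

# Äspö comparison from the text: sensors between ~10 m and ~100 m from the borehole
print(round(mc_pred(0.01), 1))  # -5.1
print(round(mc_pred(0.1), 1))   # -3.5
```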

FIGURE 4. Prior model $m_{c,pred} = f(d_{3})$ of the Bayesian Magnitude of Completeness (BMC) method for the three-dimensional induced seismicity case with distance $d_{3}$ to the third nearest seismic station.

Two datasets, S00 and S03, were not included in this analysis as event declaration depended in those cases on two triggering conditions from both the downhole and surface networks (EOST and GEIE EMC, 2018a; EOST and GEIE EMC, 2018b), which is likely inconsistent with the simple $d_{3}$ metric. Testing with $d_{3}$ leads to a systematic bias requiring a correction $f_{prior}(d_{3}) + 1$. Use of such a formula would however be inadequate. It remains unclear whether the magnitude scale used for S00 and S03, duration magnitude $m_{D}$, could also play a role in the observed upward shift in $m_{c}$.

We then combine the $m_{c,obs}$ maps with the prior model to derive the posterior $m_{c,post}$ maps. We show some examples taken from S93 and CB12 in Figure 5. The BMC method fills all the gaps in $m_{c,obs}$ and provides completeness levels expected for future seismicity, e.g., during cloud development as more fluids get injected, which can be of use to the injection operators. The BMC method also decreases $m_{c}$ uncertainties, as can be observed when comparing $\sigma_{post}$ to $\sigma_{obs}$. Note finally that the BMC method is consistent with the Asymmetric Laplace detection model previously described. It makes use of the mode of local MFDs so that the number of cells with $m_{c,obs}$ values is maximized while $f_{prior}$ explains how $m_{c}$ evolves in space, from which the bulk MFD, approximated by the ALMM, emerges.

FIGURE 5. Observed $m_{c,obs}$ vs. posterior $m_{c,post}$ maps derived from the BMC prior model. (A) S93 at depth $z = -3.1$ km. (B) CB12 at depth $z = -4.2$ km.

Discussion and Perspectives on Data Mining

We reviewed some standard approaches to estimate the completeness magnitude $m_{c}$ and ported the recent ALMM mixture and BMC mapping methods to the induced seismicity context. We provided various estimates of $m_{c}$, $b$ (Table 2) and detection parameter $k$ so that better informed choices can be made in future statistical analyses of induced seismicity. We observed that the $k$-value for induced seismicity is compatible with the one obtained for natural seismicity, suggesting a common detection process, although uncertainties are high. We also provided the first parameterization of the BMC prior for three-dimensional seismicity clouds.

The present study could help refine future seismic hazard analyses, since the parameter $m_{c}$ is a prerequisite to the estimation of the hazard inputs: the $a$- and $b$-values of the Gutenberg-Richter law. In contrast to the tectonic regime, the $a$-value is normalized to the total injected volume $V$ for comparisons across stimulations, so that $N(\geq m) = V \, 10^{a_{fb} - bm}$ with $a_{fb}$ the normalized $a$-value, called underground feedback parameter in Mignan et al. (2017) and seismogenic index in poroelasticity parlance (e.g., Dinske and Shapiro, 2013). The term $a_{fb}$ is agnostic about the underlying physical process, since alternatives to poroelasticity exist (e.g., Mignan, 2016). A priori knowledge of the Gutenberg-Richter parameters is required in pre-stimulation risk assessment (e.g., Mignan et al., 2015; Broccardo et al., 2020), and the parameterization may be updated during stimulations via a dynamic traffic light system (e.g., Broccardo et al., 2017; Mignan et al., 2017). Note also that the maximum magnitude $m_{max}$ relates directly to $b$ (e.g., van der Elst et al., 2016; Broccardo et al., 2017).

We first showed the impact of $m_{c}$ values on $b$ and selected $q_{90}(m_{c,obs})$ as a conservative estimate. We also found that the ALMM does not provide any new insight into the problem and is hampered by fluctuations in the non-cumulative MFD observed in some experiments. As a consequence, $m_{c}$ mapping remains the best alternative and is simple enough to implement.

While $m_{c}$ also alters $a_{fb}$ via $b$, we can consider another aspect that may improve our knowledge of the underground feedback. It has been observed that $a_{fb}$ varies significantly across sites and across stimulations at the same site (e.g., Dinske and Shapiro, 2013; Mignan et al., 2017), which may lead, for instance, to risk aversion among potential investors in geo-energy (Mignan et al., 2019). It is difficult to infer $a_{fb}$ from the literature when no information about completeness is given, which is especially true for early articles. However, we can now estimate $a_{fb}$ despite the total number of induced events $N(\geq m_{?})$ being potentially ambiguous. To illustrate the problem, let us consider the 1988 stimulation at Hijiori, Japan. We learn from Sasaki (1998) that $N(\geq m_{?}) = 65$ micro-earthquakes were observed above $m_{?} = -4$ (their Figure 6) for an injected volume $V = 2,000$ m³. The equation $a_{fb} = \log_{10}(N(\geq m_{?}) / V) + b \, m_{?}$ is valid only if $m_{?} \geq m_{c}$. Information in Sasaki (1998) is however ambiguous, and we may have $m_{?} = \min(m) < m_{c}$ instead, which would underestimate the underground feedback activation since the data would then be incomplete. Considering all datasets of Table 1 with $N_{tot} > 200$, we can estimate from their censored data the metrics $\delta m = m_{c} - \min(m)$ and $\gamma = N(\geq m_{c}) / N(\geq \min(m))$, which range over the intervals $[0.8, 1.9]$ and $[0.20, 0.37]$, respectively (with no trend observed). The distributions are shown in Figure 6A alongside the corrected underground feedback parameter $a_{fb,corrected} = \log_{10}(\gamma N(\geq \min(m)) / V) + b \, (\min(m) + \delta m)$. Assuming $\delta m$ and $\gamma$ representative (and $b = 1$, see Results of a Standard $m_{c}$ Analysis), the 1988 Hijiori underground feedback activation may be $a_{fb} = -5.5$ if $m_{?} = m_{c}$, or $a_{fb,corrected} \in [-5.4, -3.9]$ if $m_{?} = \min(m)$. Despite the ambiguity, an estimate may therefore still be provided. A review of the literature could provide additional values from other fluid injections to better constrain the range of $a_{fb}$ to be considered as a priori information in risk assessment, which is so far potentially biased toward high $a_{fb}$ values (e.g., Mignan et al., 2017).
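The Hijiori arithmetic can be retraced with the short sketch below, which only encodes the two expressions for the underground feedback parameter given above. The function names are hypothetical, and $b = 1$ is assumed as in the text. With the rounded interval endpoints quoted above, the sketch reproduces the complete-data value and the lower bound of the corrected range; the upper bound it returns is close to, but not exactly, the quoted one, presumably because the published range was computed from unrounded $\delta m$ and $\gamma$ values.

```python
import math

def a_fb(n_events, volume, b, m_min):
    """Underground feedback parameter, valid if m_min >= mc."""
    return math.log10(n_events / volume) + b * m_min

def a_fb_corrected(n_events, volume, b, m_min, gamma, delta_m):
    """Corrected parameter when m_min = min(m) < mc.

    gamma   : fraction of events at or above mc, N(>= mc) / N(>= min(m))
    delta_m : mc - min(m)
    """
    return math.log10(gamma * n_events / volume) + b * (m_min + delta_m)

# 1988 Hijiori case: N = 65 events above m = -4 for V = 2,000 m^3, with b = 1
complete = a_fb(65, 2000, 1.0, -4.0)                            # case m_? = mc
lower = a_fb_corrected(65, 2000, 1.0, -4.0, 0.20, 0.8)          # smallest gamma and delta_m
upper = a_fb_corrected(65, 2000, 1.0, -4.0, 0.37, 1.9)          # largest gamma and delta_m
print(round(complete, 1), round(lower, 1))  # -5.5 -5.4
```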

FIGURE 6. Induced seismicity data mining potential from completeness analysis. (A) Estimating the underground feedback parameter $a_{fb}$ despite potential ambiguity on the minimum magnitude cutoff mentioned in the literature, by using information on $\delta m = m_{c} - \min(m)$ and $\gamma = N(\geq m_{c}) / N(\geq \min(m))$ obtained for the sites considered in the present study (histograms); $a_{fb}$ estimates are given for $b = 1$, $N(\geq m_{?}) = 65$ and $V = 2,000$ m³ (1988 Hijiori case). (B) Predicting the completeness level of a planned seismic network configuration using the prior model $f_{prior}(d_{3})$ of the BMC method (here with 8 stations, 7 randomly distributed at the surface and 1 located at the ad-hoc borehole with coordinates $(5, 5, -6)$ km).

Finally, while the BMC method allows robust $m_{c}$ maps to be defined (no spatial gap, uncertainty constrained by the seismic network configuration), it may be even more useful for seismic network planning (e.g., Kraft et al., 2013) prior to new stimulations. Seismic safety criteria can be mapped into magnitude thresholds not to be crossed (Mignan et al., 2017), which tell us the completeness magnitude level required for sound statistical analysis. One can then use the BMC prior $f_{prior}(d_{3})$ to test whether a given completeness level can be achieved with a given seismic network configuration. Figure 6B illustrates such an application. The two approaches presented in Figure 6 demonstrate how induced seismicity data mining can be done from completeness magnitude knowledge, which in turn can improve induced seismicity monitoring, modeling and managing.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be found via Table 1 and the reference list.

Author Contributions

AM did all the research and writing.

Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

¹Vásquez, R., and Bravo de Guenni, L. (n.d.). Bayesian estimation of the spatial variation of the completeness magnitude for the Venezuelan seismic catalogue. Available at: https://www.statistics.gov.hk/wsc/CPS204-P47-S.pdf (Accessed Aug 2014).

References

Ake, J., Mahrer, K., O’Connell, D., and Block, L. (2005). Deep-injection and closely monitored induced seismicity at Paradox Valley, Colorado. Bull. Seismol. Soc. Am. 95, 664–683. doi:10.1785/0120040072
