- Institut National de la Recherche Scientifique, Eau Terre Environnement Centre, Quebec City, QC, Canada
Groundwater is essential for drinking water and economic development, yet its availability and quality are threatened by climate change, pollution, and rising demand. Effective groundwater management relies on accurate numerical models for flow and contaminant transport. Traditional calibration techniques often struggle with the uncertainty and spatial variability inherent in hydrogeological data. Although geostatistical simulations can represent this variability, their computational complexity limits their use in large-scale models. To overcome these challenges, ensemble methods like the Ensemble Kalman Filter (EnKF) and Ensemble Smoother (ES) have been introduced for model updates using spatiotemporal data. However, they face limitations in high-dimensional systems with sparse observational data, common in hydrogeology. This paper introduces an innovative data assimilation method combining Well-by-Well (WbW) and observation Type-by-observation Type (TbT) techniques. This approach utilizes local analysis to effectively calibrate large, complex groundwater models with limited observations, resulting in a more stable and accurate calibration process. The method is tested on a synthetic 3D model and a real regional groundwater flow model, showing significant improvements in calibration and predictions. A 3D synthetic model of a coastal aquifer with saltwater intrusion was developed to evaluate the WbW & TbT updates within the Ensemble Smoother with Multiple Data Assimilation (ES-MDA 4x) method. The results indicate improved calibration and reduced errors in hydraulic head and salt concentration predictions. This study demonstrates the robustness of the WbW & TbT method in calibrating the Ville Mercier regional hydrogeological model, showcasing its potential for complex hydrogeological settings. By updating parameters locally around each observation well, the WbW & TbT method addresses high-dimensional challenges while preserving data amplitude and managing the complexity of regional hydrogeological systems. Results confirm that this method enhances the accuracy and reliability of groundwater flow models, making it a vital tool for resource management amid environmental challenges.
1 Introduction
Groundwater plays a primordial role to sustain access to safe drinking water and support economic development (Gleeson et al., 2020). However, this resource is in continuous threat due to climate changes, pollution and increasing demand (Kummu et al., 2016). Climate change alters precipitation patterns, leading to changes in recharge rates, groundwater storage and groundwater flow, while pollution from agricultural, industrial and urban source can contaminate aquifers, making water unsafe for consumption. Increasing water demand due to population growth and economic development intensifies the stress put on groundwater systems. Achieving reliable forecasts of hydrogeological processes is crucial to better manage the groundwater resources and predict the availability and quality of groundwater resources, while considering the different scenarios possible in the face of climate change, pollution and increasing water demand. The protection and evaluation of the water resource require building and calibrating reliable numerical models to simulate and predict hydrogeological processes such as groundwater flow or transport of contaminants (Davamani et al., 2024).
Conventional calibration of groundwater flow models uses an iterative Marquard-Levenberg approach to build the Jacobian matrix for evaluating the calibration direction. This approach requires running a deterministic groundwater flow model as many times as the number of hydraulic parameters to calibrate. These models, based on a deterministic representation of the hydraulic parameters, fail to consider the heterogeneity and uncertainty of hydrogeological data (Boucher et al., 2011), and often lead to poor predictions of groundwater flow, especially in complex geological settings. To address this issue, geostatistical simulations were developed to represent the spatial distribution of hydraulic properties by assigning stochastic yet spatially sound values at all cells of a numerical hydrogeological model (Rubin and Hubbard, 2005; Blouin et al., 2013). However, geostatistical simulations are impractical for conventional calibration of hydrogeological models due to their inherent complexity and the computational resources they demand. Hydrogeological models can contains hundreds of thousands of cells and require numerous realizations to capture the spatial variability of hydraulic properties. Geostatistical simulations generate multiple equally probable realizations of these properties to account for uncertainty, but calibrating each realization against observed data (e.g., hydraulic head or contaminant concentration) is time-consuming. Conventional calibration methods, which typically involve iterative parameter adjustment, would need to be applied on every cell of each realization, drastically increasing computational costs and making the process inefficient for large-scale hydrogeological studies.
Ensemble methods were introduced by Evensen (1994) through the Ensemble Kalman Filter (EnKF) to deal with complex non-linear systems with large amount of parameters. The uncertainty is represented by a set of model realizations rather than an explicit covariance matrix, limiting the number of runs needed during the optimization process. Ensemble methods use physical models to mimic the behavior of the model over time to update an ensemble of equi-probable numerical models (Evensen, 1994). As new data become available, ensemble methods improve the previous parameters state by looking at the difference of prediction between an ensemble of models and the measured data, and then computing the residual error and the covariance between each parameters realization. Ensemble methods are used to assimilate spatio-temporal data and update numerical models in many geoscientific fields, as they are applicable to every physical problem which can be described by numerical solutions. For example, EnKF was applied by Dong et al. (2006) to assimilate four-dimensional (4D) seismic reservoir and by Johns and Mandel (2008) to model fire propagation, while ensemble methods were used by Crestani et al. (2013) for 2D tracer data test and by Bauer et al. (2015) for weather forecasting.
Many studies identified the problem of high-dimensionality as a major restriction using data assimilation by the ensemble methods. Complex numerical models need to be constrained by a large amount of observation points to converge toward a reliable ensemble of solutions. While this may be achievable in some application fields such as the oil and gas industry which can deploy considerable resources to characterize a reservoir field, this constraint often restricts the use of ensemble methods to 2D or simple 3D models (Dong et al., 2006; Emerick and Reynolds, 2012, 2013; Bauer et al., 2015).
In environmental or hydrogeological studies where large regional models are developed, tight characterization budgets often limit the amount of available data for building and calibrating reliable numerical models. As such, high-dimensional numerical models need to be calibrated with very little observations, leading to poor calibration of models and wrong predictions of hydrogeological processes. Numerous strategies have been developed to address this problem of high-dimensionality during data assimilation by ensemble methods. For example, Markovich et al. (2022) suggested using a paired simplified-complex modeling approach to reduce the bias during the calibration of complex hydrogeological models. Several studies combined direct sampling data with indirect data (geophysics) in order to increase the size of observations to assimilate. For example, Cui et al. (2020) showed that GPR data can be assimilated by an ensemble smoother with multiple data assimilation (ES-MDA) method to calibrate soil hydraulic parameters and better predict soil moisture. Kang et al. (2019) showed that joint hydrogeophysical inversion can better estimate non-Gaussian parameters in spatially heterogeneous hydraulic conductivity field by assimilating concentration and electrical resistivity tomography data. Song et al. (2019) introduced a framework integrating ensemble methods with transition probability-based geostatistics for assimilation of direct and indirect data, using reconditioning step to keep spatial continuity and overcoming overfitting. Tso et al. (2020) presented the first attempt to estimate solute source parameters by assimilating time-lapse electrical resistivity tomography measurements for leak detection. Xu et al. (2021) also used ES-MDA for contaminant location and hydraulic conductivity field on 3D synthetic model.
One of the most important limits of previous studies on ensemble methods was the use of small scale synthetic models or oversimplified representation of true field studies. Developed with a simple 3D synthetic model, this paper presents an innovative data assimilation process using successive assimilation steps for every observation point (well), and every observation type (hydraulic head, concentration of contaminant). The method developed is validated for the calibration of a complex regional numerical model of groundwater flow.
The paper first presents a summary of the conventional ES method. The proposed innovation, named the Well by Well (WbW) and observation Type by observation Type (TbT) ES data assimilation process, is then explained. The WbW & TbT method is tested on a 3D synthetic model to better present the improvements compared to a conventional data assimilation method. The results obtained by this new data assimilation approach on a real case 3D groundwater flow model are finally presented, followed by some discussions and a brief conclusion.
2 Materials and methods
Two classes of ensemble methods exist for data assimilation; Ensemble Kalman Filter (EnKF) and Ensemble Smoother (ES). The main difference between both classes is that assimilation takes place at each time step of the time-dependent process in EnKF, while all time steps are assimilated in a single operation after the last time step with ES. Sakov et al. (2010), Skjervheim et al. (2011) and Li et al. (2018) indicated a great reduction in simulation time using ES due to the avoidance to stop, store and restart simulator at each time step. Emerick and Reynolds (2013) introduced the ensemble smoother with Multiple Data Assimilation (ES-MDA), which combines the computing efficiency of ES and the performance of EnKF by iterating many times the whole process after assimilating observations at each time steps. The approach proposed in this manuscript is based on the ES-MDA method.
2.1 Ensemble smoother methods
For hydrogeological studies, the objective of a data assimilation method is to update an ensemble matrix (E) of N realizations, each realization representing a hydrogeological numerical model built with a total of m hydraulic parameters:
Values of x can represent any hydraulic parameters affecting the outcome of the numerical model; including but not limited to horizontal or vertical hydraulic conductivity, specific storage and recharge. Using the constituents of E, simulated values of time-dependent hydraulic observations (e.g. hydraulic head, concentration, water temperature) are obtained through numerical modeling at p observation points. Using the ES approach, the y observations matrix is built, combining all observation points at all time steps (p) for all N realizations:
Using Equations 1, 2, the parameters anomalies matrix A and the observations anomalies matrix Y are respectively computed by removing the simulated parameters mean () and the simulated observations mean () from the original parameter matrix (E) and observations matrix (y):
and:
Using the anomalies matrices A and Y, the Kalman gain matrix is computed to quantify the update direction at each iteration of the data assimilation process, considering the covariance between all parameters and all observations:
A measurement errors matrix R is added as a tolerance criteria when fitting the simulated observations y to the measurements made on the field. It represents the covariance matrix used to express the uncertainty on the measured hydraulic values at site. AYT is the covariance matrix linking the model parameters and observations anomalies, while YYT is the covariance matrix of observations anomalies, where the exponent T refers to as the transpose of a matrix.
The final step of the ES method is to update the ensemble matrix of parameters. Considering Ef as the ensemble matrix of initial parameters or parameters evaluated from the previous iteration, the updated ensemble matrix Ea is computed from the difference between the measured field data D and simulated observations y weighted by the Kalman gain matrix :
A large Kalman gain indicates a strong covariance between the observations and parameters, suggesting the update needs to better consider these measurements and parameters during the data assimilation. Updated parameters are stored in the updated ensemble matrix Ea, which becomes the initial ensemble matrix Ef for the next iteration.
When repeating the process over multiple iterations, it is expected that the variance between all updated realizations of the ensemble decreases, and that the error between the hydraulic measurements and the mean simulated observations of updated realizations is converging toward a global minimum.
The global ES-MDA approach, which updates all parameters using all observations at once, is efficient when the ratio of observations to parameters is high. However, this is rarely the case in hydrogeological or environmental studies, where large regional models need to be calibrated with only a few observations points. A small number of parameters or observations points, either static or dynamic, has proven to make the assimilation unstable. A small ensemble size limits the solution space and can introduce spurious or false correlations between some hydraulic parameters and the predicted observations. The presence of spurious correlations is difficult to detect and can lead to wrong updates of parameters. Anderson (2007) rightfully mentioned the need to restrain the range of the parameters update by some sort of localization. Two principal methods exist to reduce spurious correlations: covariance localization (Hunt et al., 2007; Chen and Oliver, 2013; Soares et al., 2019; Lam et al., 2020) and local analysis. Both methods should yield similar results, and the final choice should be based on numerical effectiveness and scalability (Sakov and Bertino, 2011). Local analysis is implemented in the proposed method with a user-defined distance function.
2.2 Local analysis
Local analysis is commonly used in weather prediction models and oil and gas reservoir characterization studies. It decomposes the model into sub-domains, for updating parameters only at grid cells closer to the observations points (Gaspari and Cohn, 1999; Watanabe and Datta-Gupta, 2011). Local analysis approximates the error covariance in a sub-domain of the model for data assimilation, to physically dampen spurious correlations between observations and further parameters. A critical distance, which can vary from an individual cell to the entire model, defines the local region. The local analysis equals the global assimilation scheme without local analysis if the critical distance contains the entire model.
Local analysis is implemented in the proposed method with a user-defined distance function similar to a 3D variogram ellipsoid (Chilès and Delfiner, 1999). This function describes the spatial variation of the assimilation weight with respect to the distance of every cells of the model to a specific observation well. The model is then updated only in a neighborhood around the observations, scaling the amplitude of the update with the user-defined function. The observation weight is equal to 1 at the center of the ellipsoid, and decreases following a user-defined function (linear, Gaussian or exponential) to reach zero or a small defined value at the external envelope of the ellipsoid (Goovaerts, 1997). This is an analog to a variogram with a nugget effect in geostatistics (Chilès and Delfiner, 1999). The 3D ellipsoid is based on real physical distances from the observations data and not numerical tapering, adding to the robustness of the method.
2.3 Well by Well and observation Type by observation Type method
Numerical models with large number of parameters and small number of observations are difficult to assimilate, even when using covariance localization or local analysis. Conventional ensemble assimilation methods become numerically unstable when they process data with several orders of magnitude. This is a typical challenge for environmental studies as the calibration of large and complex regional groundwater flow models must often be completed with a limited number of observation points.
Instead of assimilating all data at once, the proposed WbW & TbT approach isolates every well and every observation type during each iteration of parameter updates using the ES-MDA method. This leads to several successive assimilations using a local analysis around each observation point. The WbW & TbT method proposed in this manuscript offers a simple yet rigorous local analysis approach designed to locally reduce the information ratio (defined as the number of parameters divided by the number of observations), and to facilitate the data assimilation of large models with a limited number of observation points.
If several observation types are available to calibrate the hydrogeological model (hydraulic heads, concentrations, pressures, temperatures), every observation type is sequentially assimilated (TbT) with distinct local neighborhoods. This allows considering the different spatial structures each observation type might have. While the idea of assimilating different observations types separately (TbT component) has been used in previous researches (Cui et al., 2020; Kang et al., 2019; Shariatinik et al., 2024), the WbW component goes one step further for preserving the amplitudes by defining a local weight function around every observation well using a 3D ellipsoid variogram. This weight function is used independently on every well to assimilate their data successively. A distinct ellipsoid function can be defined for each well and observation type to consider the different spatial structures each data type may have.
The WbW & TbT approach uses the whole amplitude of information from each observation point, and for each data type analyzed. One can thus assimilate different types of observations, even if the observations are orders of magnitude different, without the fear of one observation being overshadowed by data of larger amplitude.
By sequentially defining a distinct set of weights for all observation types (ntypes) and for all wells (nwells), a weight ellipsoid matrix (W) is defined as the product of the weight assigned for each well (wj) and the local update defined at all cells (ui, j):
where the sum of weight at each well is equal to 1:
Considering this new weight matrix, the update Equation 6 then becomes:
The assimilation method proposed is based on the ES-MDA 4x originally defined by Emerick and Reynolds (2013) where 4x refers to the use of 4 iterations, implementing the new WbW & TbT approach to remove spurious correlation. The WbW & TbT method is first tested on a small 3D synthetic model, and then applied on a real 3D regional groundwater flow model.
3 3D synthetic model
A 3D synthetic model representing a coastal aquifer with a salt water intrusion is built to analyse the performance of the WbW & TbT updates with the conventional ES-MDA 4x assimilation method.
3.1 Parameters of the synthetic 3D model
The synthetic model consists of a sandy layer aquifer intersected in its center by a clay layer. The model is 100 m × 100 m horizontally, 20 m thick, and contains a total of 68,940 cells. Five pseudo wells are randomly set up to represent points of measurements of permeability (Figures 1, 2). Permeability values are assigned along the wells positions using parameters representative of each type of sediments (Table 1).
Figure 1. Position of all types of wells used in the synthetic model. Orange dots are the observation wells to monitor water levels and salt concentration; red pentagons are pumping wells; and green triforces are pseudo wells used in the reference model to infer the permeability fields.
One hundred realizations of the permeability field are simulated by sequential Gaussian simulations [SGS; Goovaerts (1997)] at all cells of the 3D model, using constraints from the five pseudo wells and variograms modeled from Gaussian transformed permeability data at the five reference wells (Table 2).
One of these realizations is extracted to act as the reference model, to compute the reference time series of hydraulic head and salt concentration. The remaining 99 realizations constitute the initial ensemble of parameters (Ef) in the synthetic model, that will be updated in order to calibrate the time series of hydraulic head and salt concentration simulated from the reference numerical model.
A hydraulic gradient is induced by setting the hydraulic head of 20.5 m on the left and 20 m on the right border. Recharge is defined at 150 mm/year on top of the model, and a no flow condition is applied at the bottom of the model. The salt water intrusion is represented by setting a concentration of 35 g/l in the 0 m to 15 m depth interval in the right part of model.
The transient simulation is run for 2, 000 days. Three pumping wells (Figures 1, 3) are used to modify the hydraulic heads and salt concentration distribution in the model between day 200 and day 1300. The peak salt concentration varies between 0.01 mg/l and 750 mg/l. These amplitude differences are challenging for a conventional ES method and will test the WbW & TbT approach developed in this manuscript.
Figure 3. Position of three pumping wells in the 3D synthetic model. Orange dots represent the well screen interval.
3.2 Assimilation tests on the synthetic model
Hydraulic head and salt concentration are monitored at five observation wells (Figure 1). Three of these wells (wells O0, O1 and O2) are used as observation wells (matrix y defined in Equation 2) for calibrating and updating the ensemble of parameters, while the remaining two wells (wells C3 and C4) are control wells used to evaluate the performance of the WbW & TbT method.
While hydraulic heads time series are also simulated at all five observation wells of the model, the results are not presented to focus the interpretation on salt concentration, which large amplitude differences better demonstrate the performance of the WbW & TbT approach. The time series of concentration at these five wells for the reference model of permeability are shown in Figure 4. Only data from t = 0 day to t = 1400 days (70% of the time series) are used to update the ensemble (blue background in Figure 4). The remaining of the data (t = 1400 to t = 2000 days) is used for studying the performance of the assimilation on data never seen (green background in Figure 4).
Figure 4. Salt concentration observed in the five wells of the reference permeability field of the 3D synthetic model (surface location in Figure 1). Blue background depicts calibration data and green background depicts validation data.
In addition to the Scenario 0 which is the base case scenario where simulations are completed on the initial ensemble of parameters, eight scenarios are run to test the impact of different localization and local analysis schemes:
• 0 : Initial ensemble without any assimilation;
• 1 : Raw ES-MDA 4x global assimilation;
• 2 : TbT assimilation, all wells together, without local analysis;
• 3 : WbW assimilation, hydraulic head only, without local analysis;
• 4 : WbW assimilation, concentration only, without local analysis;
• 5 : WbW & TbT assimilation, without local analysis;
• 6 : WbW & TbT assimilation, with local analysis using 100 m horizontal range and 20 m vertical range;
• 7 : WbW & TbT assimilation, with local analysis using 50 m horizontal range and 10 m vertical range;
• 8 : WbW & TbT assimilation, with local analysis using 25 m horizontal range and 5 m vertical range.
3.3 Assimilation results from the synthetic model
Hydraulic head and/or salt concentration time series are used to update the ensemble of permeability fields. For all scenarios tested, results are presented on the salt concentration data only to avoid repetitions in the interpretation. Figure 5 presents the root mean square error (RMSE) distribution of salt concentration computed from all updated realizations of the ensemble, for all wells, separated for calibration and validation data. As this is a synthetic model, updates on the permeability data are computed to validate the actual improvements made by each scenario on the initial realizations of the permeability field (Figure 6).
Figure 5. Root mean square error of salt concentration for eight schemes at the end of the 4th data assimilation. Blue bars denominate calibration time window, green bars are the validation time window.
Figure 6. Mean average error of log10 permeability for 8 methods versus the reference permeability field.
Scenarios 0 and 8 show the highest RMSE mean value and the greatest variance on the salt concentration data for all wells (Figure 5). This is expected for Scenario 0 which is not calibrated, and this suggests that Scenario 8 is using a update range which is too small to properly assimilate some of the observations.
While scenarios 1 and 2 show significant decrease in RMSE on salt concentration data for all wells (Figure 5), their mean average error (MAE) on permeability field increases with the iterations (Figure 6), suggesting spurious correlation and over-fitting of the assimilation on later iterations.
While using only the hydraulic head (Scenario 3) or the salt concentration data (Scenario 4) on a WbW approach improves the predictions on the salt concentration (Figure 5) or hydraulic head (not presented), results suggests that using both types of data (Scenario 5) is preferable in order to lower the MAE on the updated permeability field (Figure 6). It is interesting to note that the assimilation of hydraulic head alone is able to improve the hydraulic head and the salt concentration RMSE (Scenario 3). This is an important result because environmental studies often include hydraulic head data alone without concentration data.
Scenarios 5, 6, and 7, using the WbW & TbT assimilation method with different update ranges, show improvements in the RMSE computed (Figure 5) at all five wells. All three scenarios also show good improvements on the permeability field, with a permeability MAE change of 28% and 27% for Scenarios 5 and 6, respectively (Figure 6).
Using the assimilation Scenario 6, the measured time series of salt concentration at 5 wells (thick red line) are compared to the simulated time series of the initial and final ensembles of realizations, corresponding to models before and after the calibration, respectively (Figure 7). Each realization is presented in thin lines while the mean of all realizations is shown as thick dashed lines (blue for the initial ensemble and orange for the final ensemble).
Figure 7. Initial individual realizations of salt concentration in blue, final salt concentration in orange with the 100/20 local analysis method.
Improvements between the initial and final ensemble reach one order of magnitude for wells O1, O2 and C4 on the mean and variance of individual realizations, which proves the effectiveness of the WbW & TbT method proposed. Well O0 shows no improvement toward the measured data following the calibration. When the realizations of the initial ensemble do not include the measured salt concentration, the ensemble method can't learn the appropriate corrections to apply to the simulated time series, which makes it impossible to fit the observed concentration during the assimilation process. The assimilation shows little improvement for well C3, which was already satisfying.
Results suggest the WbW & TbT updates on conventional ES-MDA 4x method improves the assimilation of hydraulic head and salt concentration data to update the permeability fields of a 3D synthetic model. The approach allows the user to compute an update range based on physical parameters to help stabilizing the calibration process and limit spurious correlation.
4 Application to Ville Mercier site
The WbW & TbT method is tested for the calibration of the Ville Mercier regional hydrogeological model, which represents a complex and challenging site for conventional ensemble assimilation methods. Ville Mercier is located at proximity to the city of Montreal in southern Quebec, Canada. Contaminants have been released for several years in a local aquifer made of coarse sediments at site. These contaminants have traveled toward a regional fractured bedrock aquifer used for groundwater supply and irrigation in the region. The site has been investigated for several decades, and a complete set of geological, hydrogeological, and geophysical data are available to build the conceptual and hydrogeological models, and to calibrate the latter (Claprood et al., 2022, 2023).
4.1 3D hydrogeological model
The numerical model represents 6 major hydrostratigraphic units controlling the groundwater flow in the region (Figure 8). From top to bottom, these units are: recent organic sediments and filling materials, fine grained marine sediments from the Marine Clay unit, coarse glaciofluvial sediments from the Esker Sand & Gravel unit, regional Till unit, fractured bedrock and intact bedrock. The esker unit is further divided into two distinct hydrofacies (F1 and F2), with different spatial distributions of the hydraulic conductivity. The model covers an area of 226 km2 and has a mean thickness of 56.6 m. The model contains 3,974,915 elements on 35 layers (113,569 elements per layer). Elements size varies from 5 m locally to 150 m regionally.
Figure 8. 3D geomodel of hydrostratigraphic units at Ville Mercier site. Regional view has a vertical exaggeration of 40x, local view has a vertical exaggeration of 20x. Figure modified from Claprood et al. (2023).
The Esker Sand & Gravel unit and the fractured bedrock unit are local and regional aquifers, and they both have major roles in characterizing the local and regional groundwater flow in the region. The numerical model is used to represent the spatial heterogeneity of hydraulic conductivity in the two major aquifer units to predict the contaminant path. The heterogeneous sand and gravel materials from the esker unit have high and variable hydraulic conductivity, which varies from 1 × 10−−6 m s−−1 to 1 × 10−−3 m s−−1. The fractured bedrock unit shows even greater heterogeneity, with recorded hydraulic conductivity ranging between 1 × 10−10 m s−1 and 1 × 10−−4 m s−−1.
4.2 Regional deterministic calibration
Hydraulic properties to be updated at site are the hydraulic conductivity (Kxy and Kz) and, to a lesser degree, the specific storage (Ss) of the different units. Many observations of Kxy have been recorded locally at Ville Mercier site, but few data are available regionally. Minimum, maximum and mean values computed for each unit are reported in Table 3.
Table 3. Reported values of horizontal hydraulic conductivity (Kxy) within the limits of the numerical model at Ville Mercier.
These values are used to establish the initial parameters implemented into the 3D regional numerical model for calibrating the values of Kxy. The regional model has initially been calibrated in a steady-state regime through a conventional Marquardt-Levenberg minimization approach in PEST (White, 2018) to update the hydraulic parameters. For most units (organic material, Marine Clay, Till unit, intact bedrock), a single value of Kxy is assigned and updated to represent its regional contribution. Spatial distribution of Kxy is assigned in the Esker Sand & Gravel and fractured bedrock units by kriging, using reported Kxy values as fixed hard data and spatially distributed pilot-points as hard data points to update within the two units (Table 4).
Table 4. Initial and calibrated values of horizontal hydraulic conductivity (Kxy) and hydraulic conductivity ratio (Kr = Kxy/Kz) in the deterministic numerical model at Ville Mercier.
The results from this initial calibration (Claprood et al., 2023) show the limits of the deterministic approach, which doesn't offer enough flexibility on the hydraulic parameters to achieve great fit between the simulated and observed hydraulic heads, particularly locally in the Esker Sand & Gravel and fractured bedrock units. The great heterogeneity of hydraulic properties limits its calibration because the conventional approach can't represent the complex spatial behavior of Kxy with single values or interpolated kriged fields with sufficient precision. A stochastic assimilation approach is needed to represent the spatial heterogeneity of hydraulic conductivity in the Esker Sand & Gravel and fractured bedrock units.
4.3 Hydraulic conductivity heterogeneity
To represent the local spatial heterogeneity of hydraulic conductivity in the Esker Sand & Gravel F1 and F2 hydrofacies and the fractured bedrock unit at Ville Mercier site, 200 realizations of Kxy are completed by geostatistical turning bands simulations. That number of realizations, greater than the 100 models suggested in the literature (Raanes, 2015; Evensen, 2009), provides enough uncertainty and variability between the realizations while anticipating the non-convergence of some models during the multiple simulations-assimilations iterations.
The Esker Sand & Gravel F1 hydrofacies is sampled by 86 readings of Kxy measured from 18 wells. Following a Gaussian transformation of the logarithm of Kxy, the modeled variogram computed in this F1 hydrofacies is:
• normalized variance of 1 m2 s-2, including a 0.01 m2 s-2 nugget effect;
• generalized Cauchy model with 500 m horizontal range, 3 m vertical range, and a 0.99 m2 s-2 sill.
The Esker Sand & Gravel F2 hydrofacies being sampled by only 37 readings of Kxy in 9 wells, its experimental variogram is too variable to model. The variogram modeled for the F1 hydrofacies is used to represent the Kxy spatial distribution in the F2 hydrofacies.
The fractured bedrock unit is sampled by 111 readings of Kxy from 15 wells. Following a Gaussian transformation of the logarithm of Kxy, the modeled variogram computed in the fractured bedrock is:
• normalized variance of 1 m2 s-2, including a 0.01 m2 s-2 nugget effect;
• generalized Cauchy model with 500 m horizontal range, 10 m vertical range, and a 0.89 m2 s-2 sill;
• generalized Cauchy model with 5000 m horizontal range, 50 m vertical range, and a 0.10 m2 s-2 sill.
The values of Kxy are simulated independently in F1 and F2 hydrofacies. While both hydrofacies are represented by the same spatial distribution function (variogram), the hydrofacies represent different deposition environments and no spatial relation should exist at the interface between F1 and F2. 200 realizations of Kxy are simulated at 417655 cells within the F1 hydrofacies, at 322070 cells within the F2 hydrofacies, and at 355046 cells within the fractured bedrock unit.
Two realizations of Kxy are presented as cross-sections in Figure 9 and show the strong lateral and vertical heterogeneity of hydraulic conductivity simulated in both units.
Figure 9. Vertical cross-sections of two realizations of horizontal hydraulic conductivity. Figure modified from Claprood et al. (2023).
4.4 Stochastic assimilation approach
Groundwater flow simulations are completed on steady-state and transient regimes of the Ville Mercier hydrogeological model, using the regionally calibrated hydraulic parameters from Table 4 and the values of Kxy extracted from the 200 geostatistical realizations in the Esker Sand & Gravel F1 and F2 hydrofacies and the fractured bedrock unit. The transient regime is comprised of 2 separate events (Figure 10):
• 1 pumping test conducted locally at site (pumping well PW_A) with hydraulic head time series recorded at 13 wells in the esker and fractured bedrock;
• 1 recovery test conducted locally at site (pumping wells PW_B and PW_C) with hydraulic head time series recorded at 12 wells in the esker and fractured bedrock.
Figure 10. Pumping rates for the Pumping test and the Recovery test completed at the Ville Mercier site. Pumping test includes data from 2005 shifted to 2021 to simplify the numerical modeling.
Due to the great heterogeneity of Kxy and insufficient constraints at proximity to the pumping wells, only 106 out of the initial 200 realizations of Kxy converge when running the transient groundwater flow model with the assigned pumping rates. These 106 models form the initial ensemble of realizations for data assimilation. The hydraulic parameters are updated through the use of iterative ensemble smoother (iES) to assimilate steady-state and transient hydraulic heads observations for improving the understanding of local hydrogeological conditions.
This site is considered as a difficult site to assimilate because the number of parameters to update (total of 1, 094, 774 parameters) is significantly higher than the number of observations assimilated (5, 645 hourly hydraulic heads observations from 25 wells). The amplitudes of the variations in hydraulic heads in response to the pumping or recovery tests are small, within the order of magnitude of the convergence threshold at several observation wells.
The stochastic assimilation of the Ville Mercier site is completed by two approaches, independently. The first approach uses PESTPP-IES, a software representing a standard in the industry (White, 2018; White et al., 2020). The second approach is the proposed WbW & TbT ES-MDA 4x method developed in this manuscript. The assimilation of hydraulic head time series by PESTPP-IES is obtained by using standard parameters, no localization, and the results were considered optimal after one iteration. The proposed WbW & TbT ES-MDA 4x approach is simplified as a WbW ES-MDA 4x approach since only hydraulic head observations type is assimilated at the site. The proposed WbW approach is used with an optimized local analysis of 500 m horizontally and 20 m vertically, and with 4 iterations with a multiplication factor of 4 between iterations to artificially increase the update amplitude and limit numerical instabilities.
4.5 Assimilation results at Ville Mercier site
The results obtained with the proposed WbW ES-MDA 4x approach are compared with the results obtained using the software PESTPP-IES in Table 5 to validate the method developed in this manuscript. Table 5 presents the variations in the root mean square error (RMSE) computed for each observation well time series. A negative value indicates an improvement in the fit between the simulated and observed hydraulic head before and after the assimilation, while a positive value indicates the fit has worsened during the assimilation process.
Table 5. Root mean square error (RMSE) variations from WbW ES-MDA 4x and PESTPP-IES assimilation approaches at Ville Mercier site.
The results from Table 5 show that the WbW approach provides a better assimilation of the hydraulic well time series for a majority of wells. The median of RMSE variations for the WbW approach (-11.5%) is lower than that obtained by PESTPP-IES (-6.3%). The mean of RMSE variations with the WbW approach shows a strong decrease (-17.7%), while it shows an increase of +10.2% with PESTPP-IES. These results suggest that the PESTPP-IES approach, applied with standard parameters, fails dramatically to assimilate some specific observations (Obs Wells 2-6-10-11). This is mainly caused by some wells being overlooked with PESTPP-IES while the global assimilation process focusses on nearby wells showing opposite variations. The sequential aspect of the proposed WbW approach forces the full assimilation of every individual observation, making sure all wells are explicitly taken into account during the parameters updates.
Figure 11 shows the results for four selected observations wells, showing how the proposed WbW method compares with the results obtained by PESTPP-IES. The figure shows the mean simulated time series of hydraulic heads variations for the initial and final ensemble of realizations.
Figure 11. Mean of simulated hydraulic heads variations with respect to initial value at beginning of pumping tests at four selected observations wells.
Obs Well 3 presents a time series with very low amplitude hydraulic head variations (total of fifteen centimeters), within the numerical model's convergence tolerance, showing poor response to the induced stress caused by the recovery test. The WbW approach shows no improvement nor deterioration of the fit between the simulated and observed data, which is expected considering the low amplitude variations. However, the assimilation with PESTPP-IES shows a deterioration of the fit with increasing variations of hydraulic heads being predicted.
The combined analysis of Obs Well 5 and Obs Well 6 shows the advantages of using the sequential WbW approach and the limits of the global assimilation approach used by PESTPP-IES. Both observation points are in the same well, Obs Well 5 being screened in the Esker Sand & Gravel unit and Obs Well 6 being screened in the deeper fractured bedrock unit. While the assimilation is better with PESTPP-IES in the esker unit (Obs Well 5), it can not assimilate appropriately the variations in the bedrock unit (Obs Well 6). The is a direct consequence of the global assimilation approach used in PESTPP-IES, the algorithm being unable to assimilate both intervals separately. The WbW approach assimilating both intervals sequentially, the algorithm recognizes the different amplitude variations in both aquifers and is able to converge toward the correct solutions for both wells while still having some issues to reach a perfect calibration.
The amplitude of the RMSE reduction at Obs Well 12 is greater with PESTPP-IES than with the proposed WbW ES-MDA 4x method. However, only the proposed approach succeeds at assimilating the temporal aspect on the hydraulic head time series for this well located nearby the pumping wells of the recovery test.
5 Discussion
The construction and calibration of numerical models for groundwater flow or contaminant transport are essential to ensure that model predictions align with real-world observations. The sequential approach proposed, which integrate observations data type by data type and well by well, is an innovative advancement in ensemble methods. This approach enhances the predictive capability of models, especially in environmental studies where data is limited. Its effectiveness is demonstrated in a real-world application involving a complex geological model and decades of hydraulic head data. This sequential strategy is particularly important in environmental studies where wells are sparsely distributed, making their influence critical, though sometimes conflicting with wells that have higher data amplitudes.
Accurately calibrated groundwater flow models can then be used to predict the spread of contaminants or changes in groundwater flow patterns under various boundary conditions, including those driven by climate change. By employing ensemble methods, researchers can generate a range of realistic future scenarios, providing a probabilistic assessment of different outcomes. The proposed sequential approach reduces prediction uncertainty by incorporating all observation data more effectively during the calibration process. This leads to greater confidence in the models ability to simulate complex environmental processes and offer reliable predictions, whether for climate change scenarios, environmental protection programs, or contaminant transport pathways.
6 Conclusion
In hydrogeological studies, complex groundwater flow models that include a large number of parameters have to be calibrated with relatively small number of observations. Iterative ensemble smoother methods proved efficient to reduce the computational burden when calibrating complex groundwater flow models. However, the conventional approach of assimilating all observations together with a global update can lead to two undesired effects: (1) variance collapse where updated hydraulic conductivity fields from all realizations of the ensemble are updated toward a unique model and (2) spurious correlations which randomize the assimilation and limit the optimization.
The conservation of sufficient variance from the ensemble realizations is needed to converge toward the global minimum in the next iteration. A variance collapse is seen from the results obtained after the first iteration by PESTPP-IES on the 3D Ville Mercier model. The second iteration (results not presented) provokes a complete collapse of the variance. The proposed WbW & TbT approach generally allows a better reduction of median and mean RMSE, while keeping sufficient variance within the ensemble to evaluate the uncertainty. Regardless of the local analysis scheme selected for the assimilation, no variance collapse is observed with the WbW & TbT approach, suggesting the sequential approach allows keeping the level of variability high enough to pursue multiple iterations of assimilation.
Local analysis improves the assimilation on the ensemble of realizations on the 3D synthetic model. Before running the final assimilation, tests should be completed to select the appropriate range for the local update of hydraulic parameters around each observation point. If the range is too small, the update is difficult because it involves too little parameters; if the range is too large, spurious correlations can take control of the assimilation and limit the optimization. The updating range should allow each cell of the model to be updated by at least one observation point. To reduce uncertainty and further limit spurious correlations, the model elements should be updated sequentially by several observation points.
The WbW & TbT sequential approach proposed has proven to be optimal to properly assimilate observations with different orders of magnitudes. The sequential approach ensures every observation point is equally considered during the assimilation process, regardless of its amplitude and data type. This is essential in groundwater studies as observation types and different wells may have data that span over several orders of magnitudes. Since the sequential approach proposed assimilates one observation well at the time, it limits the risk of divergence during calibration which occur when multiple observation data suggest opposite updates in the hydraulic parameters.
Further research would allow testing an adaptive neighborhood around observation wells, which would vary at each iteration. This would allow using a global approach on the first iteration for a general update of all elements, and progressively reducing the neighborhood distance to assimilate only local parameters around the observation wells during the final iterations of the assimilation.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
TB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft. MC: Conceptualization, Data curation, Investigation, Project administration, Supervision, Validation, Visualization, Writing – review & editing. EG: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Validation, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Funding for the project was provided by a private contract from the Ministère de l'Environnement, de la Lutte contre les Changements Climatiques, de la Faune et de Parcs.
Acknowledgments
Thanks to DHI Group for providing a FEFLOW academic license. Thanks for MELCCFP for financing part of this project.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Anderson, J. L. (2007). Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D: Nonlinear Phenomena 230, 99–111. doi: 10.1016/j.physd.2006.02.011
Bauer, P., Thorpe, A., and Brunet, G. (2015). The quiet revolution of numerical weather prediction. Nature 525, 47–55. doi: 10.1038/nature14956
Blouin, M., Martel, R., and Gloaguen, E. (2013). Accounting for aquifer heterogeneity from geological data to management tools. Groundwater 51, 421–431. doi: 10.1111/j.1745-6584.2012.00982.x
Boucher, M.-A., Anctil, F., Perreault, L., and Tremblay, D. (2011). A comparison between ensemble and deterministic hydrological forecasts in an operational context. Adv. Geosci. 29, 85–94. doi: 10.5194/adgeo-29-85-2011
Chen, Y., and Oliver, D. S. (2013). Levenberg-Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification. Comp. Geosci. 17, 689–703. doi: 10.1007/s10596-013-9351-5
Chilès, J. P., and Delfiner, P. (1999). Geostatistics: Modeling Spatial Uncertainty, 2nd Edition Hoboken NJ: Wiley.
Claprood, M., Béraud, T., Boutin, L.-C., Gloaguen, E., and Martel, R. (2023). “Rapport final du projet : UTES-II Ville Mercier,” in Rapport soumis au Ministère de l'Environnement, de la lutte contre les changements climatiques, de la Faune et des Parcs, Institut National de la recherche scientifique. Available at: https://www.environnement.gouv.qc.ca/eau/souterraines/lagunes-mercier/rapport-final-UTES-II-ville-Mercier.pdf (accessed November 30, 2023).
Claprood, M., Gloaguen, E., Béraud, T., Blouin, M., Dupuis, C., Ferron, P., et al. (2022). A case study using seismic reflection and well logs to reduce and quantify uncertainty during a hydrogeological assessment. Front. Water 3:779149. doi: 10.3389/frwa.2021.779149
Crestani, E., Camporese, M., Bau, D., and Salandin, P. (2013). Ensemble Kalman filter versus ensemble smoother for assessing hydraulic conductivity via tracer test data assimilation. Hydrol. Earth System Sci. 17, 1517–1531. doi: 10.5194/hess-17-1517-2013
Cui, F., Bao, J., Cao, Z., Li, L., and Zheng, Q. (2020). Soil hydraulic parameters estimation using ground penetrating radar data via ensemble smoother with multiple data assimilation. J. Hydrol. 583:124552. doi: 10.1016/j.jhydrol.2020.124552
Davamani, V., John, J. E., Poornachandhra, C., Gopalakrishnan, B., Arulmani, S., Parameswari, E., et al. (2024). A critical review of climate change impacts on groundwater resources: a focus on the current status, future possibilities, and role of simulation models. Atmosphere 15:122. doi: 10.3390/atmos15010122
Dong, Y., Gu, Y., and Oliver, D. S. (2006). Sequential assimilation of 4D seismic data for reservoir description using the ensemble Kalman filter. J. Petrol. Sci. Eng. 53, 83–99. doi: 10.1016/j.petrol.2006.03.028
Emerick, A., and Reynolds, A. (2012). History matching time-lapse seismic data using the ensemble Kalman filter with multiple data assimilations. Comp. Geosci. 16:5. doi: 10.1007/s10596-012-9275-5
Emerick, A. A., and Reynolds, A. C. (2013). Ensemble smoother with multiple data assimilation. Comp. Geosci. 55, 3–15. doi: 10.1016/j.cageo.2012.03.011
Evensen, G. (1994). Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res.: Oceanics 99, 10143–10162. doi: 10.1029/94JC00572
Evensen, G. (2009). The ensemble Kalman filter for combined state and parameter estimation. IEEE Cont. Syst. Magaz. 29, 83–104. doi: 10.1109/MCS.2009.932223
Gaspari, G., and Cohn, S. E. (1999). Construction of correlation functions in two and three dimensions. Q. J. R. Meteorol. Soc. 125, 723–757. doi: 10.1256/smsqj.55416
Gleeson, T., Cuthbert, M., Ferguson, G., and Perrone, D. (2020). Global groundwater sustainability, resources, and systems in the anthropocene. Ann. Rev. Earth Planetary Sci. 48, 431–463. doi: 10.1146/annurev-earth-071719-055251
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford: Oxford University Press.
Hunt, B. R., Kostelich, E. J., and Szunyogh, I. (2007). Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D: Nonlin. Phenom. 230, 112–126. doi: 10.1016/j.physd.2006.11.008
Johns, C. J., and Mandel, J. (2008). A two-stage ensemble Kalman filter for smooth data assimilation. Environ. Ecol. Stat. 15, 101–110. doi: 10.1007/s10651-007-0033-0
Kang, X., Shi, X., Revil, A., Cao, Z., Li, L., Lan, T., et al. (2019). Coupled hydrogeophysical inversion to identify non-gaussian hydraulic conductivity field by jointly assimilating geochemical and time-lapse geophysical data. J. Hydrol. 578:124092. doi: 10.1016/j.jhydrol.2019.124092
Kummu, M., Guillaume, J. H. A., de Moel, H., Eisner, S., Flörke, M., Porkka, M., et al. (2016). The world's road to water scarcity: shortage and stress in the 20th century and pathways towards sustainability. Sci. Rep. 6:38495. doi: 10.1038/srep38495
Lam, D.-T., Renard, P., Straubhaar, J., and Kerrou, J. (2020). Multiresolution approach to condition categorical multiple-point realizations to dynamic data with iterative ensemble smoothing. Water Resour. Res. 56:e2019WR025875. doi: 10.1029/2019WR025875
Li, L., Stetler, L., Cao, Z., and Davis, A. (2018). An iterative normal-score ensemble smoother for dealing with non-Gaussianity in data assimilation. J. Hydrol. 567, 759–766. doi: 10.1016/j.jhydrol.2018.01.038
Markovich, K. H., White, J. T., and Knowling, M. J. (2022). Sequential and batch data assimilation approaches to cope with groundwater model error: an empirical evaluation. Environm. Model. Softw. 156:105498. doi: 10.1016/j.envsoft.2022.105498
Raanes, P. (2015). On the ensemble Rauch-Tung-Striebel smoother and its equivalence to the ensemble Kalman smoother. Quart. J. Royal Meteorol. Soc. 142:2728. doi: 10.1002/qj.2728
Sakov, P., and Bertino, L. (2011). Relation between two common localisation methods for the EnKF. Comp. Geosci. 15, 225–237. doi: 10.1007/s10596-010-9202-6
Sakov, P., Evensen, G., and Bertino, L. (2010). Asynchronous data assimilation with the EnKF. Tellus A: Dynamic Meteorol. Oceanog. 62, 24–29. doi: 10.1111/j.1600-0870.2009.00417.x
Shariatinik, B., Gloaguen, E., Raymond, J., Boutin, L.-C., and Fabien-Ouellet, G. (2024). Ert data assimilation to characterize aquifer hydraulic conductivity heterogeneity through a heat-tracing experiment. Near Surface Geophysics 22, 358–371. doi: 10.1002/nsg.12288
Skjervheim, J. A., Evensen, G., Hove, J., and Vabø, J. G. (2011). An Ensemble Smoother for assisted History Matching. Richardson, TX: OnePetro.
Soares, R. V., Maschio, C., and Schiozer, D. J. (2019). A novel localization scheme for scalar uncertainties in ensemble-based data assimilation methods. J. Petrol. Explorat. Prod. Technol. 9, 2497–2510. doi: 10.1007/s13202-019-0727-5
Song, X., Chen, X., Ye, M., Dai, Z., Hammond, G., and Zachara, J. M. (2019). Delineating facies spatial distribution by integrating ensemble data assimilationand indicator geostatistics with level-set transformation. Water Resour. Res. 55, 2652–2671. doi: 10.1029/2018WR023262
Tso, C.-H. M., Johnson, T. C., Song, X., Chen, X., Kuras, O., Wilkinson, P., et al. (2020). Integrated hydrogeophysical modelling and data assimilation for geoelectrical leak detection. J. Contam. Hydrol. 234:103679. doi: 10.1016/j.jconhyd.2020.103679
Watanabe, S., and Datta-Gupta, A. (2011). “Use of phase streamlines for covariance localization in ensemble Kalman filter for three-phase history matching,” in SPE Western North America Region Meeting.
White, J. T. (2018). A model-independent iterative ensemble smoother for efficient history-matching and uncertainty quantification in very high dimensions. Environm. Model. Softw. 109, 191–201. doi: 10.1016/j.envsoft.2018.06.009
White, J. T., Hunt, R. J., Fienen, M. N., and Doherty, J. E. (2020). Approaches to Highly Parameterized Inversion: PEST++ Version 5, a Software Suite for Parameter Estimation, Uncertainty Analysis, Management Optimization and Sensitivity Analysis. US Geological Survey.
Keywords: ensemble smoother, data assimilation, hydrogeological modeling, local analysis, Kalman, location updating
Citation: Béraud T, Claprood M and Gloaguen E (2024) A sequential ensemble smoother for multiple data assimilation in hydrogeological modeling. Front. Water 6:1462914. doi: 10.3389/frwa.2024.1462914
Received: 10 July 2024; Accepted: 15 October 2024;
Published: 01 November 2024.
Edited by:
Shasha Han, University of Birmingham, United KingdomReviewed by:
Michael Tso, UK Centre for Ecology and Hydrology (UKCEH), United KingdomShiblu Sarker, Virginia Department of Conservation and Recreation, United States
Copyright © 2024 Béraud, Claprood and Gloaguen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Maxime Claprood, mclaproo@uqac.ca
†Present addresses: Thomas Béraud, Geostack, Quebec City, QC, Canada;
Maxime Claprood, Université du Québec à Chicoutimi, Saguenay, QC, Canada