Reconstructing the Historical Terrestrial Water Storage Variations in the Huang–Huai–Hai River Basin With Consideration of Water Withdrawals

Yang, Chuanxun; Liu, Yangxiaoyue; Yang, Ji; Li, Yong; Chen, Shuisen

doi:10.3389/fenvs.2022.840540

ORIGINAL RESEARCH article

Front. Environ. Sci., 25 August 2022

Sec. Environmental Informatics and Remote Sensing

Volume 10 - 2022 | https://doi.org/10.3389/fenvs.2022.840540

This article is part of the Research Topic Research of Remote Sensing and Big data Convergence on Environmental Assessments

Chuanxun Yang^1,2^†

Yangxiaoyue Liu^3,4^†

Ji Yang^5,6

Yong Li^5,6

Shuisen Chen^5,6*

¹Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, China
²University of Chinese Academy of Sciences, Beijing, China
³State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
⁴Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, China
⁵Guangdong Provincial Key Laboratory of Remote Sensing and Geographical Information System, Guangdong Open Laboratory of Geospatial Information Technology and Application, Guangzhou Institute of Geography, Guangdong Academy of Sciences, Guangzhou, China
⁶Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China

The Huang-Huai-Hai River Basin in eastern China has suffered from severe water scarcity during recent decades due to the effects of climate change and human activities. Quantifying the changes in the amount of terrestrial freshwater available in this region and their driving factors is important for understanding hydrological processes and developing a sustainable water policy. This study proposed an ensemble learning model to reconstruct historical variations in the terrestrial water storage (TWS) of the Huang-Huai-Hai River Basin, China. The model was trained using monthly observations of TWS variations from the Gravity Recovery and Climate Experiment (GRACE) satellites, together with monthly climate forcing and human water withdrawal datasets. The reconstructed TWS variations were compared with the results of several land surface and hydrological models and with in situ measurements of soil water content. The contributions of climate and human activity to the ensemble learning model were also quantified. The results show that the proposed approach generally outperforms the land surface and hydrological models examined in this study, matches the patterns in the GRACE solutions, and reconstructs past changes in TWS that are consistent with the GRACE observations. Climatic variables are the most important predictors in the ensemble learning model, with precipitation over the prior month being a critical factor. The model that includes human intervention tends to perform better than the model without it. Irrigation, industrial, and domestic water withdrawals contribute equally to the model. This study provides a flexible and easily implementable model that can bridge the gap between GRACE observations and past changes in TWS. The model is applicable in areas with intense human activities, and its results have the potential to be assimilated into, and thereby enhance, hydrological models.

1 Introduction

Monitoring and modeling the dynamics of terrestrial freshwater resources are vital for coping with the growing global water crisis (Carpenter et al., 2011). Most terrestrial freshwater resources, excluding those in Antarctica and Greenland, are stored as groundwater, which is difficult to sense directly using conventional satellite-based optical remote sensing (Alsdorf et al., 2007; Wood et al., 2011; Taylor et al., 2012; Gleeson et al., 2015). Although the groundwater table can be measured in wells, it is difficult to estimate water storage dynamics at the scale of a large basin or continent from spatially sparse water table records (Yeh et al., 2006; Voss et al., 2013). Land surface models (LSMs) and global hydrological and water resource models (GHWMs) provide alternative ways to understand land water storage and its dynamics on a global basis (Wood et al., 2011; Mo et al., 2016). However, their simulations can be problematic, especially over areas with intense human activity, because of the many uncertainties in the model parameters, input data, and initial conditions (Scanlon et al., 2018). Therefore, developing alternative methods to monitor and model terrestrial water storage (TWS) is of great significance.

The Gravity Recovery and Climate Experiment (GRACE) satellites, launched in 2002, revolutionized the observation and understanding of terrestrial hydro-systems (Rodell, 2004; Yeh et al., 2006; Feng et al., 2013; Save et al., 2016; Scanlon et al., 2016; Wiese et al., 2016; Andrew R. et al., 2017; Scanlon et al., 2018). Compared with global models, GRACE solutions provide a more direct measure of variations in global water storage: the mission detects changes in the Earth's gravity field by measuring the changes in the distance between the twin GRACE satellites that result from gravitational variations. After noise removal, these changes can be considered closely related to changes in the amount of water stored at a particular place (Save et al., 2016; Sean Swenson, 2006; Watkins et al., 2015). The TWS variations measured by the GRACE satellites reflect the influences of both climate and human intervention. GRACE solutions therefore provide reliable information about TWS variations on a global basis and have been successfully applied in drought mapping (Leblanc et al., 2009), surface water monitoring (Rodell et al., 2009; Huang et al., 2012; Feng et al., 2013), and water balance modeling (Rodell, 2004; Rodell et al., 2011).

Given their higher reliability compared with LSMs and GHWMs, GRACE terrestrial water storage anomalies (TWSA) have the potential to improve models of TWS dynamics. The assimilation or fusion of GRACE solutions with land surface models or global reanalysis systems has received increasing attention in recent years (Kumar et al., 2016; Li et al., 2019). GRACE observations can also be integrated empirically with global model outputs by linking TWS-related variables to the GRACE TWSA dataset. Under ideal conditions, such models can be simplified to linear or polynomial functions between the TWSA and the related variables, and this method has been applied successfully in the Amazon Basin (Nie et al., 2015; Humphrey et al., 2017). These models rest on the fundamental hypothesis that the mismatches between the TWSA and land surface model simulations can be calibrated using typical regression models; however, this assumption is only valid for specific regions and situations. Machine learning is a powerful tool for fitting such relations. A previous study (Long et al., 2014) showed that artificial neural networks (ANNs) can produce GRACE TWSA-like predictions for southwestern China dating back to the 1980s by learning the relationships between the available GRACE TWSA and independent indicators such as precipitation and soil moisture. Similar experiments were conducted over northwestern China, comparing three machine learning approaches (support vector machine, artificial neural networks, and random forest (RF)) with a general linear regression model (Yang et al., 2018). The results indicated that the RF model outperformed the others and that the linear regression model was not applicable in this area. A recent study has also shown that ensemble learning algorithms perform well in learning the relationships between GRACE TWSA and climate forcings (Jing et al., 2020).

However, few experiments have been conducted in areas undergoing intensive irrigation or industrial activity, which are the major human interventions leading to decreases in TWS (Feng et al., 2013; Voss et al., 2013). Human water withdrawal can be one of the most significant sources of error in land surface models for heavily irrigated areas (Joodaki et al., 2014; Pokhrel et al., 2017; Tangdamrongsub et al., 2018). In addition, the tools and models developed for extending the time span covered by GRACE rely heavily on soil moisture produced by global models or reanalysis systems. This limits their application, because they depend on the availability of LSM or reanalysis output for specific regions and periods. More importantly, these models cannot separately quantify the contributions of climatic and anthropogenic factors, because they depend on soil moisture variables and ignore human intervention.

This study proposes a machine learning model that learns the underlying patterns connecting TWS dynamics with variations in climate and human water withdrawal. The model was tested in the Huang-Huai-Hai River Basin of China, a heavily irrigated area with intensive human activity that is suffering from a water scarcity crisis. A GRACE-consistent TWSA was generated for the study area back to the 1980s and compared with LSMs/GHWMs and in-situ soil moisture measurements. The contribution of each factor to the TWSA estimation model was also quantified. This study attempts to provide new perspectives for high-quality modeling of TWS dynamics using a machine learning framework over areas that suffer from severe human-induced water scarcity. Flexible machine learning tools are expected to have great potential to complement complex physical models and extend our understanding of TWS modeling in areas such as the one studied here.

2 Research Area

The research area covers the Huang-Huai-Hai River Basin in China, which comprises three major river basins: the Yellow River (or Huanghe), Huaihe River, and Haihe River basins. It is located between 95°53′–122°42′E and 30°57′–42°43′N. The areas of the three basins are 752,000 km² (Yellow River Basin), 274,000 km² (Huaihe River Basin), and 318,000 km² (Haihe River Basin). Dominated by low plains and favorable temperatures, the basin contains large tracts of well-cultivated land and is considered one of the most productive agricultural zones in China. The Huang-Huai-Hai River Basin has long been the political and cultural center of China. The cultivated land, population, and gross domestic product (GDP) of this basin account for more than one-third of the national totals, whereas its available water accounts for approximately 7% of the national total (Yan et al., 2013; Yuan et al., 2019). The terrain and climate of the basin change gradually from west to east, from a plateau with a mountain climate, through hills with a continental climate, to temperate plains affected by the monsoon.

3 Data Sources

3.1 Model Input Data

3.1.1 CSR GRACE Mascon Solution

Terrestrial water storage anomalies (TWSAs) from the latest GRACE RL06 Mascon solution produced by the Center for Space Research (CSR), University of Texas at Austin, are used in this study (Save et al., 2016; Save, 2019). The CSR GRACE TWSA dataset is available from January 2003 to June 2017; the data used in this study cover January 2003 to December 2015. The TWSAs are relative to the 2004–2009 mean baseline and are provided on a 1/4 (0.25) degree grid. The gridded TWSA data were resampled to a 1/2 (0.5) degree grid by averaging the four 1/4-degree cells within each 1/2-degree cell, consistent with the resolution of the climate data and global model outputs used in this study.
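The 0.25° to 0.5° aggregation described above amounts to averaging non-overlapping 2 × 2 blocks of cells. A minimal NumPy sketch follows, assuming the two grids share cell edges (so each 0.5° cell holds exactly four 0.25° cells) and that arrays are ordered (time, lat, lon); `block_average_2x2` is a hypothetical helper name, and a simple unweighted mean is used, as the text describes.

```python
import numpy as np


def block_average_2x2(field):
    """Average non-overlapping 2x2 blocks over the last two (lat, lon) axes.

    Assumes the 0.25-degree grid is aligned so that every 0.5-degree cell
    contains exactly four 0.25-degree cells (even lat/lon sizes). NaN cells
    (e.g. ocean) are ignored; an all-NaN block yields NaN with a warning.
    """
    *lead, nlat, nlon = field.shape
    if nlat % 2 or nlon % 2:
        raise ValueError("lat and lon sizes must be even")
    # Split each spatial axis into (coarse index, position inside the block).
    blocks = field.reshape(*lead, nlat // 2, 2, nlon // 2, 2)
    return np.nanmean(blocks, axis=(-3, -1))


# Example: a small (time, lat, lon) stack on a 0.25-degree grid.
twsa_025 = np.random.default_rng(0).normal(size=(12, 8, 16))
twsa_05 = block_average_2x2(twsa_025)   # shape (12, 4, 8)
```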

3.1.2 The Climate Forcing Data

Version 4.03 of the Climatic Research Unit Time Series (CRU TS) dataset, produced by the University of East Anglia, was used as climate forcing data (Harris et al., 2014). The CRU TS dataset is gridded at 1/2° using records from more than 4,000 weather stations. Ten variables in the dataset that are considered important in the water cycle were used in this study (Table 1).

TABLE 1. Variables in the CRU TS climate data.

3.1.3 Water Withdrawals

Water withdrawal data were obtained from the global hydrological model PCR-GLOBWB 2.0, developed at Utrecht University (Sutanudjaja et al., 2018). The model uses two computational grids, 5 arcmin (1/12°) and 30 arcmin (1/2°), that cover the continents, excluding Greenland and Antarctica.

Irrigation and non-irrigation water withdrawals are calculated separately in the PCR-GLOBWB 2.0 model. Irrigation water includes water for paddy fields and for other agricultural purposes, whereas non-irrigation water covers three sectors: industrial, livestock, and domestic water withdrawal. The irrigation water demand is calculated first, and water withdrawal is set equal to the gross water demand unless insufficient water is available.

The irrigation water demand is calculated from the crop composition (which varies each month and includes multi-cropping) and the irrigated area per cell. The basic dataset is derived from MIRCA 2000, which provides monthly irrigated (paddy and nonpaddy irrigation fractions per cell) and rain-fed crop areas. The total irrigated area per cell varies over time and is generally based on the areas reported by FAOSTAT. The irrigation water demand is calculated following the FAO guidelines (Rao and Chandran, 1977; Allen et al., 1998; Sutanudjaja et al., 2018). Non-irrigation demand was calculated using the approaches of Wada et al. (2014). Where the available water is insufficient, the total withdrawal is scaled down to the available water and distributed among the sectors in proportion to their gross water demands.
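The shortage rule above (scale the total withdrawal down to the available water, then share it among sectors in proportion to their gross demands) can be sketched for a single grid cell and month as follows. This illustrates the rule as described here and is not the PCR-GLOBWB source code; the function name and the demand values are hypothetical.

```python
def allocate_withdrawals(demand, available):
    """Return sector withdrawals for one grid cell and month.

    demand: mapping of sector -> gross water demand; available: water that
    can be withdrawn (same units). If supply covers the total demand, each
    sector receives its full demand; otherwise every sector is reduced by the
    same factor, so withdrawals stay proportional to demand.
    """
    total = sum(demand.values())
    if total <= available:
        return dict(demand)
    factor = available / total          # common reduction factor, 0 <= factor < 1
    return {sector: d * factor for sector, d in demand.items()}


# Hypothetical demands (e.g. mm/month) in a cell where only 50 units are available.
demand = {"irrigation": 60.0, "industry": 25.0, "domestic": 10.0, "livestock": 5.0}
withdrawal = allocate_withdrawals(demand, available=50.0)
```

Because a single factor is applied to all sectors, each sector's share of the total withdrawal equals its share of the total demand.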

The 30 arcmin gridded withdrawal dataset covering 1979–2015 was used in this study. Table 1 summarizes the water withdrawal variables used. Detailed information on the PCR-GLOBWB model can be found in Sutanudjaja et al. (2018).

3.1.4 Auxiliary Data

The auxiliary input data included a digital elevation model (DEM) and climate zone data, which were used to account for the complex variations in terrain and climate across the study area. DEM data were taken from the GTOPO30 dataset (https://www.usgs.gov/centers/eros/). The 1/2-degree gridded Köppen-Geiger climate zone data (Kottek et al., 2006) from http://koeppen-geiger.vu-wien.ac.at/ were used to describe the climate.

3.2 Data Used for Comparison

3.2.1 Global Models

Two global land surface models and one global hydrological and water resources model are compared with the TWS reconstructed using the machine learning model. The catchment land surface model (catchment) (Koster et al., 2000) and the Noah model (Chen et al., 1997; Chen et al., 1996; Ek et al., 2003; Koren et al., 1999), both included in version 2 of the Global Land Data Assimilation System (GLDAS-2), are used for comparison; the datasets were obtained from https://disc.gsfc.nasa.gov/datasets. The PCR-GLOBWB model (Sutanudjaja et al., 2018), available at http://www.globalhydrology.nl/, is used to describe the hydrology in the study area. Note that the catchment model does not include surface water storage and that the Noah model simulates neither groundwater nor surface water storage, whereas PCR-GLOBWB includes all TWS components. PCR-GLOBWB also accounts for human water consumption, whereas human intervention is not included in the land surface models.

The TWS of each of the three models is calculated by summing all of its available storage components. The PCR-GLOBWB model is also run in a no-human mode (i.e., without human intervention) to assess the impact of human activities on the model. The 2004–2009 mean TWS is subtracted from the original TWS to obtain the TWSA before comparison with the GRACE TWSA. Hereafter, the TWSA obtained from the catchment, Noah, and PCR-GLOBWB models are referred to as catchment TWSA, Noah TWSA, and GLOBWB TWSA, respectively, and the TWSA of the PCR-GLOBWB model without human intervention is referred to as GLOBWB(N) TWSA.
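The anomaly calculation described above can be sketched as follows, assuming each model's storage components are already on a common grid, in the same units, and arranged as (time, lat, lon) arrays. `tws_anomaly` and the synthetic inputs are hypothetical.

```python
import numpy as np


def tws_anomaly(components, years, base=(2004, 2009)):
    """Sum storage components and subtract the mean TWS over the baseline years.

    components: list of (time, lat, lon) arrays in the same units (e.g. cm of
        equivalent water height); components a model does not simulate are
        simply left out of the list.
    years: 1-D array giving the calendar year of each time step.
    """
    tws = np.sum(np.stack(components), axis=0)         # total water storage
    in_base = (years >= base[0]) & (years <= base[1])
    baseline = np.nanmean(tws[in_base], axis=0)        # 2004-2009 mean per cell
    return tws - baseline


# Two synthetic components (e.g. soil moisture and groundwater), monthly 2003-2015.
years = np.repeat(np.arange(2003, 2016), 12)
soil = np.full((years.size, 2, 2), 10.0)
gw = np.arange(years.size, dtype=float)[:, None, None] * np.ones((1, 2, 2))
twsa = tws_anomaly([soil, gw], years)
```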

3.2.2 The In-Situ Soil Moisture Data

The in-situ cropland soil moisture dataset for China was collected at cropland sites at a temporal resolution of 10 days from 1991 to 2002. Relative soil moisture (unit: %) was provided for five depth intervals (0–10 cm, 0–20 cm, 0–50 cm, 0–70 cm, and 0–100 cm). The original dataset was obtained from the National Meteorological Information Center of China (http://data.cma.cn/data/). Some records for the study period are missing from the original dataset, so preprocessing was required before use. Sites with fewer than 3 years (36 months) of observations were removed; in addition, because observations at 0–70 cm and 0–100 cm were largely missing, only measurements at 0–50 cm were used. The locations of the 29 sites retained after preprocessing are shown in Figure 1. Monthly soil moisture was calculated as the average of the 10-day measurements.
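The preprocessing steps above (monthly averaging of the 10-day records and removal of sites with fewer than 36 months of data) might look like this in pandas. The long-format layout and the column names `site`, `date`, and `sm_0_50` are assumptions about how the dataset is stored, with NaN marking missing records. The sketch averages first and filters second, so any month with at least one valid 10-day value counts toward the 36-month threshold.

```python
import pandas as pd


def monthly_site_means(df, min_months=36):
    """Average 10-day soil moisture to monthly values and drop short records.

    df: long-format table with hypothetical columns 'site', 'date'
        (datetime64) and 'sm_0_50' (relative soil moisture at 0-50 cm, %).
    Sites with fewer than `min_months` monthly values are removed.
    """
    df = df.dropna(subset=["sm_0_50"]).copy()
    df["month"] = df["date"].dt.to_period("M")
    monthly = df.groupby(["site", "month"], as_index=False)["sm_0_50"].mean()
    n_months = monthly.groupby("site")["sm_0_50"].transform("count")
    return monthly[n_months >= min_months].reset_index(drop=True)


def dekads(start, n_months):
    """Dates of the 1st, 11th, and 21st day of n_months consecutive months."""
    months = pd.date_range(start, periods=n_months, freq="MS")
    return [m + pd.Timedelta(days=d) for m in months for d in (0, 10, 20)]


# Hypothetical example: site A has 40 months of records, site B only 12.
site_a = pd.DataFrame({"site": "A", "date": dekads("1991-01-01", 40),
                       "sm_0_50": [50.0, 60.0, 70.0] * 40})
site_b = pd.DataFrame({"site": "B", "date": dekads("1991-01-01", 12),
                       "sm_0_50": [55.0, 65.0, 75.0] * 12})
monthly = monthly_site_means(pd.concat([site_a, site_b], ignore_index=True))
```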

FIGURE 1. Basin boundaries and the locations of in-situ soil moisture sites.

4 Methodology

4.1 The TWS Dynamics Model and Experiment Design

Water withdrawal places significant stress on water security in northern China, and terrestrial water storage dynamics are under the dual influence of climate change and human activities. The basic concept of the model proposed in this study is to learn, using machine learning algorithms, the underlying patterns that link GRACE TWSA to climatic and human factors. The validated model was then applied back to the 1980s to obtain an extended GRACE-consistent TWSA product. In addition, the TWS of a particular basin at a specific time is closely associated with the climate conditions and human activities in the preceding months, so the conditions of the prior months were also included in the model. The model is expressed as follows:

\mathrm{TWSA}_{(i,t)} = f\left(X_{(i,t)}, X_{(i,t-1)}, X_{(i,t-2)}, X_{(i,t-3)}, W_{(i,t)}, W_{(i,t-1)}, W_{(i,t-2)}, W_{(i,t-3)}, \mathrm{lat}_{i}, \mathrm{lon}_{i}, \mathrm{alt}_{i}, \mathrm{clim}_{i}, t\right) (1)

where $\mathrm{TWSA}_{(i,t)}$ is the TWSA of grid cell i in month t (t = 1, 2, 3, ..., 12); $X_{(i,t)}$ is an array of the climatic forcing variables (see Table 1) in grid cell i in month t; $X_{(i,t-1)}$, $X_{(i,t-2)}$, and $X_{(i,t-3)}$ are the climatic forcing arrays for the previous one, two, and three months, respectively; $W_{(i,t)}$ is an array of the human water withdrawal variables (see Table 1) in grid cell i in month t; $W_{(i,t-1)}$, $W_{(i,t-2)}$, and $W_{(i,t-3)}$ are the water withdrawal arrays for the previous one, two, and three months, respectively; $\mathrm{lat}_{i}$, $\mathrm{lon}_{i}$, and $\mathrm{alt}_{i}$ are the latitude, longitude, and altitude of grid cell i, respectively, which are included to account for geolocation; $\mathrm{clim}_{i}$ is the climate zone of grid cell i; and t is the time variable (t = 1, 2, 3, ..., 12). All the climatic and human withdrawal variables in Eq. 1 were converted to anomalies by removing their mean values over the 2004–2009 baseline, consistent with the GRACE TWSA data.
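Equation 1 implies a fairly simple feature table per grid cell: anomalies of the climate (X) and withdrawal (W) variables relative to 2004–2009, their values one to three months earlier, the static geolocation and climate-zone attributes, and a time variable. The pandas sketch below assumes that t is the calendar month (1–12) and uses hypothetical column names (`pre`, `wd_irr`, and so on); it is not the authors' code.

```python
import pandas as pd

STATIC = ["lat", "lon", "alt", "clim"]


def build_features(cell, clim_cols, wd_cols, lags=(1, 2, 3), base=(2004, 2009)):
    """Assemble the Eq. 1 predictors for one grid cell (a sketch).

    cell: DataFrame with a monthly DatetimeIndex, the climate (X) and
        withdrawal (W) columns, and constant lat/lon/alt/clim columns.
    """
    dyn = cell[clim_cols + wd_cols]
    years = cell.index.year
    in_base = (years >= base[0]) & (years <= base[1])
    anom = dyn - dyn[in_base].mean()                 # anomalies vs 2004-2009
    # Current anomalies plus copies shifted by 1, 2, and 3 months.
    parts = [anom] + [anom.shift(k).add_suffix(f"_lag{k}") for k in lags]
    feats = pd.concat(parts, axis=1)
    for col in STATIC:                               # geolocation and climate zone
        feats[col] = cell[col]
    feats["month"] = cell.index.month                # time variable t (1-12)
    return feats.dropna()                            # first months lack full lags


idx = pd.date_range("2003-01-01", periods=36, freq="MS")
cell = pd.DataFrame({"pre": [float(i) for i in range(36)], "wd_irr": [1.0] * 36,
                     "lat": 35.25, "lon": 115.25, "alt": 50.0, "clim": 7},
                    index=idx)
feats = build_features(cell, ["pre"], ["wd_irr"])
```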

The model was built using two well-known ensemble learning algorithms: random forest (RF) (Breiman, 2001) and extreme gradient boosting (XGB) (Chen and Guestrin, 2016). Ensemble learning algorithms are a class of machine learning methods that are increasingly used in geoscientific applications, where they have often outperformed other machine learning algorithms (Catani et al., 2013; Keller and Evans, 2019; O'Gorman and Dwyer, 2018; Reichstein et al., 2019). The basic concept of ensemble learning is to combine multiple simple learners to obtain more reliable predictions. In addition to their flexibility (no feature normalization is required and the parameters are simple to set) and good performance, RF and XGB models allow the role of each variable to be inspected through variable importance ranking. The RF algorithm has been used successfully to evaluate important variables in the monitoring of surface temperature (Hutengs and Vohland, 2016), soil moisture (Long et al., 2019), and groundwater (Rahmati et al., 2016). RF is widely regarded as one of the best ensemble learning algorithms based on bagging (bootstrap aggregating). It repeatedly draws random bootstrap samples, with replacement, from the training set. A simple learner (usually a decision tree) is generated independently from each sample (Breiman, 2001), and the final prediction is obtained by averaging the predictions of all the individual trees in the forest.
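The bagging procedure described above can be illustrated with a few lines of scikit-learn, using `DecisionTreeRegressor` as the simple learner. Besides bootstrap sampling, Breiman's RF also considers a random subset of predictors at each split, which is mimicked here with `max_features="sqrt"`. This is a teaching sketch with a hypothetical function name and synthetic data; in practice `sklearn.ensemble.RandomForestRegressor` implements the full algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def random_forest_predict(X, y, X_new, n_trees=100, seed=0):
    """Minimal RF-style regressor: average of trees grown on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_trees, len(X_new)))
    for k in range(n_trees):
        idx = rng.integers(0, n, size=n)             # bootstrap: draw with replacement
        tree = DecisionTreeRegressor(max_features="sqrt", random_state=k)
        tree.fit(X[idx], y[idx])                     # one independent tree per sample
        preds[k] = tree.predict(X_new)
    return preds.mean(axis=0)                        # average over all trees


rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1]
y_hat = random_forest_predict(X, y, X)
```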

The XGB model uses the same base learner as the RF model but a different ensemble strategy. XGB regression uses gradient boosting as its ensemble strategy (Chen and Guestrin, 2016). Gradient boosting is an optimization method in which the trees are built sequentially: the ith tree is fitted to the residuals of the ensemble formed by the preceding (i-1) trees, thereby correcting the errors of its predecessors. The performance of the XGB model has not been widely evaluated in Earth system modeling, and few studies have compared RF with XGB.
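The residual-fitting idea behind gradient boosting can be sketched for a squared-error loss, where the negative gradient is simply the residual. Note that this is plain gradient boosting, not XGBoost: XGBoost additionally uses second-order gradient information and regularized tree construction (Chen and Guestrin, 2016). The function name and the tuning values (`n_trees`, `learning_rate`, `max_depth`) are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boosted_trees_predict(X, y, X_new, n_trees=200, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting with squared-error loss (not XGBoost itself)."""
    f_train = np.full(len(y), y.mean())        # initial prediction: the mean
    f_new = np.full(len(X_new), y.mean())
    for i in range(n_trees):
        residual = y - f_train                  # errors left by the trees built so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=i)
        tree.fit(X, residual)                   # the new tree learns to correct them
        f_train += learning_rate * tree.predict(X)
        f_new += learning_rate * tree.predict(X_new)
    return f_new


rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 2))
y = np.sin(6 * X[:, 0]) + X[:, 1]
y_hat = boosted_trees_predict(X, y, X)
rmse = np.sqrt(np.mean((y_hat - y) ** 2))
```

Unlike the bagged trees of RF, which are independent and can be grown in parallel, each boosted tree depends on all of its predecessors.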

Considering their comparable performance, shared base learner, and different ensemble strategies, comparing the two ensemble learning models is useful for assessing their application in hydrological modeling. More importantly, because variable importance in the XGB model can be evaluated with the same approach as in RF, interpreting the variable importance results of the two models is expected to enhance our understanding of how climate and human factors influence TWS models. Details of the learning process and the variable importance calculation are provided in a previous study (Jing et al., 2020). The models were implemented in Python using the Scikit-learn (Pedregosa et al., 2011) and XGBoost (Chen and Guestrin, 2016) modules.
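One common way to summarize the contributions of climate and human factors is to sum impurity-based importances over groups of features. The sketch below does this for a scikit-learn random forest on synthetic data; the feature names and groups are hypothetical, and the importance measure used by Jing et al. (2020) may differ. Because `xgboost.XGBRegressor` also exposes a `feature_importances_` attribute, the same helper should accept an XGB model, although its importances are computed differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def grouped_importance(model, feature_names, groups):
    """Sum a fitted model's feature_importances_ over named groups of features."""
    imp = dict(zip(feature_names, model.feature_importances_))
    return {group: float(sum(imp[f] for f in feats)) for group, feats in groups.items()}


# Synthetic example: the target depends mostly on last month's precipitation.
rng = np.random.default_rng(0)
names = ["pre", "pre_lag1", "tmp", "wd_irr", "wd_ind"]
X = rng.normal(size=(500, len(names)))
y = 2.0 * X[:, 1] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=500)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
shares = grouped_importance(rf, names, {"climate": names[:3], "human": names[3:]})
```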

4.2 Validation Strategy

Models are typically evaluated by cross-validation with randomly sampled subsets. However, the ensemble learning models in this study already draw random subsets with replacement repeatedly during training; this design is key to their stability and accuracy and reduces the likelihood of overfitting. Hence, temporally adjacent years, rather than randomly selected subsets, were used for validation in this study.

Temporally adjacent validation sets were used to evaluate temporal correlations and the ability of a model to predict a TWSA time series. The period 2003–2015 was therefore divided into two training-validation groups: for Group 1 (G1), the training set covers 2006–2015 and the validation set covers 2003–2005; for Group 2 (G2), the training set covers 2003–2011 and the validation set covers 2012–2015.
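The two training-validation groups can be expressed as boolean masks over the monthly samples, keyed by calendar year. A minimal sketch with a hypothetical `split_by_year` helper; in a gridded setting the same masks would be applied to every grid cell.

```python
import numpy as np

# Calendar years of the training and validation sets in each group.
GROUPS = {
    "G1": {"train": range(2006, 2016), "valid": range(2003, 2006)},
    "G2": {"train": range(2003, 2012), "valid": range(2012, 2016)},
}


def split_by_year(years, group):
    """Boolean (train, valid) masks for samples with the given calendar years."""
    spec = GROUPS[group]
    train = np.isin(years, list(spec["train"]))
    valid = np.isin(years, list(spec["valid"]))
    return train, valid


years = np.repeat(np.arange(2003, 2016), 12)   # monthly samples, 2003-2015
train_g1, valid_g1 = split_by_year(years, "G1")
train_g2, valid_g2 = split_by_year(years, "G2")
```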

The model was also trained without the inclusion of human factors. The models without human factors are referred to in the following sections (Table 2) as RF(N) and XGB(N), respectively.

TABLE 2. Arrangement of experimental groups for training and validation.

The validation metrics include the coefficient of determination (R²), correlation coefficient (R), root mean square error (RMSE), and mean error (ME), which are calculated as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} (2)

R = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}} (3)

\mathrm{ME} = \frac{\sum_{i = 1}^{n} x_{i} - \sum_{i = 1}^{n} y_{i}}{n} (4)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{n}} (5)

In Eq. 2, $x_{i}$ is the estimated TWSA in grid cell i, $y_{i}$ is the GRACE TWSA in grid cell i, $\bar{y}$ is the mean GRACE TWSA over all grid cells, and $R^{2}$ measures the proportion of the variance in the dependent variable that is predictable from the independent variables. The value of R calculated by Eq. 3 is used to measure the correlation between time series in this study, with $x_{i}$ representing the basin-averaged estimated TWSA in month i, $y_{i}$ the corresponding GRACE TWSA, and $\bar{x}$ and $\bar{y}$ the corresponding temporal means. In Eqs 4 and 5, $x_{i}$ and $y_{i}$ represent the estimated and observed TWSA, respectively.
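Equations 2–5 translate directly into NumPy. The sketch below uses a hypothetical `validation_metrics` helper and also illustrates that the R² of Eq. 2 is not, in general, the square of R from Eq. 3: a constant 1 cm bias leaves R at 1 but lowers R².

```python
import numpy as np


def validation_metrics(x, y):
    """Eqs 2-5 for estimated TWSA x and GRACE TWSA y (1-D arrays, same length)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r2 = 1.0 - np.sum((x - y) ** 2) / np.sum((y - y.mean()) ** 2)        # Eq. 2
    r = np.sum((x - x.mean()) * (y - y.mean())) / (
        np.sqrt(np.sum((x - x.mean()) ** 2)) * np.sqrt(np.sum((y - y.mean()) ** 2))
    )                                                                      # Eq. 3
    me = (x.sum() - y.sum()) / n                                           # Eq. 4
    rmse = np.sqrt(np.sum((x - y) ** 2) / n)                               # Eq. 5
    return {"R2": r2, "R": r, "ME": me, "RMSE": rmse}


# Estimates that are perfectly correlated with the observations but biased by +1 cm.
metrics = validation_metrics([2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0])
```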

The TWSA time series show notable seasonality, which strongly affects the above error metrics. Therefore, the seasonal cycle was removed from the TWSA time series before the R, ME, and RMSE between the different sources were calculated. Seasonal-trend decomposition using locally weighted regression (Loess), known as STL, was used to remove the seasonal cycle. The STL approach proposed by Cleveland et al. (1990) has been increasingly reported as a versatile and robust method for time-series decomposition (Lu, 2003; Scanlon et al., 2018). The TWSA can be decomposed into three components:

S_{\text{total}} = S_{\text{long-term}} + S_{\text{seasonal}} + S_{\text{residual}} (6)

where $S_{\text{total}}$ is the original TWSA time series, $S_{\text{long-term}}$ is the long-term trend, $S_{\text{seasonal}}$ is the seasonal (periodic) variation, and $S_{\text{residual}}$ is the residual. Readers are referred to Cleveland et al. (1990) for further details of the STL approach. Following Scanlon et al. (2018), the root mean square of the residuals ($S_{\text{residual}}$) is used to approximate the measurement error:

\mathrm{MeasErr} = \sqrt{\frac{\sum_{i = 1}^{n} \mathrm{res}_{i}^{2}}{n}} (7)

where $\mathrm{MeasErr}$ is the measurement error and $\mathrm{res}_{i}$ is the ith element of the residual sequence ($S_{\text{residual}}$) in Eq. 6.

5 Results

5.1 Model Performance

5.1.1 Spatial Patterns of TWSA

Figure 2 presents the spatial patterns of the TWSA estimated by the RF and XGB models for September 2013, together with the TWSA from the GRACE CSR-M solution and the global models. The TWSAs produced by the machine learning models (Figures 2B,D) are consistent with the spatial patterns of the GRACE TWSA. The TWSAs estimated without human withdrawal (Figures 2C,E) are less negative than those from the GRACE solution and from the models that include human factors, indicating the importance of human factors in modeling the TWS dynamics of the basins. The spatial patterns of the TWSA from the global models differ from those of GRACE. The catchment TWSA is more negative than the GRACE TWSA in the Huaihe River Basin, whereas the PCR-GLOBWB TWSA is more negative than GRACE in the Haihe River Basin. PCR-GLOBWB without human intervention (PCR-GLOBWB(N)) failed to capture the most negative TWSA grid cells, unlike the models that include human factors.

FIGURE 2. Spatial patterns of TWSA for (A) GRACE, (B) RF, (C) RF(N), (D) XGB, (E) XGB(N), (F) Catchment, (G) Noah, (H) PCR-GLOBWB, and (I) PCR-GLOBWB(N) in September 2013 (Unit: cm).

Figure 3 displays the spatial pattern of the RMSE of each model relative to the GRACE solution for the validation sets. The LSMs and the PCR-GLOBWB model generally have higher RMSEs against the GRACE TWSA than the machine learning models, especially over most of the lower Yellow River, Huaihe River, and Haihe River Basins. Including human withdrawal benefits the machine learning models, as shown by the higher RMSEs of the RF(N)/XGB(N) models, especially for G2. The PCR-GLOBWB model with human factors produced higher RMSEs than the PCR-GLOBWB(N) model over the Haihe River Basin, the lower reaches of the Yellow River Basin, and the Hetao Plain, indicating that the PCR-GLOBWB model overestimated the impact of human withdrawal on the decline in TWS in the Haihe River Basin.
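Per-cell RMSE maps of the kind described above can be computed by reducing over the time axis only. A minimal sketch, assuming (time, lat, lon) arrays with NaN for missing cells or months; the helper name is hypothetical.

```python
import numpy as np


def rmse_map(est, obs):
    """Per-cell RMSE over time for (time, lat, lon) arrays.

    Months where either field is NaN are skipped; cells without any valid
    month are returned as NaN.
    """
    sq_err = (np.asarray(est, dtype=float) - np.asarray(obs, dtype=float)) ** 2
    valid = np.isfinite(sq_err)
    n_valid = valid.sum(axis=0)
    total = np.where(valid, sq_err, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.sqrt(total / n_valid)


est = np.zeros((3, 1, 2))
obs = np.array([[[1.0, np.nan]], [[-1.0, np.nan]], [[1.0, np.nan]]])
rmse = rmse_map(est, obs)
```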

FIGURE 3. Spatial patterns of the root mean square error (RMSE) between the TWSA of GRACE observations and the different models in G1: (A) RF, (B) RF(N), (C) XGB, (D) XGB(N), (E) Catchment, (F) Noah, (G) PCR-GLOBWB, and (H) PCR-GLOBWB(N); and spatial patterns of the RMSE between the TWSA of GRACE observations and the different models in G2: (I) RF, (J) RF(N), (K) XGB, (L) XGB(N), (M) Catchment, (N) Noah, (O) PCR-GLOBWB, and (P) PCR-GLOBWB(N).

Figure 4 shows grid-by-grid scatter plots of the GRACE TWSA (x-axis) against the TWSA from the machine learning models and the global models (y-axis) for the validation sets in G1 and G2. The machine learning models that included human factors generally outperformed those that did not. The RF and XGB models obtained consistent results, and both outperformed the global models examined in this study. Specifically, the R² values between the RF TWSA and the GRACE TWSA are 0.64 (G1) and 0.75 (G2), the RMSEs are 2.67 cm (G1) and 3.43 cm (G2), and the MEs are all within ±1.0 cm. The results for the XGB TWSA are similar to those of the RF model (Figures 4C,K). The RMSEs of the machine learning models are close to the reported uncertainty of the GRACE CSR-M solution (approximately 2 cm globally), indicating the quality of the machine learning predictions. In contrast, the TWSA of the global models shows a lower R² and a higher RMSE against the GRACE solution than the machine learning models.

FIGURE 4. Scatter plots between the TWSA of the GRACE observations and different models showing the validation sets for G1: (A) RF, (B) RF(N), (C) XGB, (D) XGB(N), (E) Catchment, (F) Noah, (G) PCR-GLOBWB, and (H) PCR-GLOBWB(N); and scatter plots between the TWSA of the GRACE observations and different models showing the validation sets for G2: (I) RF, (J) RF(N), (K) XGB, (L) XGB(N), (M) Catchment, (N) Noah, (O) PCR-GLOBWB, and (P) PCR-GLOBWB(N).

Although the machine learning models obtained similar R² values for the two validation groups, the RMSEs in G2 were higher than those in G1. In addition, the discrepancies between the RF/XGB and RF(N)/XGB(N) models are more notable in G2 than in G1. Including human factors in the RF/XGB models greatly reduces the estimation errors in G2 compared with the models without human factors (Figures 4I–L). In G2, the MEs of RF(N) and XGB(N) increase to 1.93 and 1.89 cm, respectively, much higher than those of the RF and XGB models. This suggests that the machine learning models tend to overestimate TWSA relative to the GRACE solution when human factors are not included, and that this tendency is more evident in G2 than in G1.

5.1.2 Temporal Behaviors of Basin Level TWSA

Figure 5 compares the basin-level TWSA time series from GRACE, the machine learning models, and the global models considered in this study. The TWSA estimated by the land surface models is more positive than the GRACE TWSA. Meanwhile, the TWSA estimated by the machine learning models correlates well with the GRACE TWSA time series, although the models without human factors produce slightly more positive TWSAs than the GRACE solution, especially for G2.
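The text does not state how the gridded TWSA was averaged to basin level. A common choice on a regular latitude-longitude grid is to weight each cell by the cosine of its latitude, which is proportional to cell area; the sketch below assumes that weighting and a boolean basin mask, and the helper name is hypothetical.

```python
import numpy as np


def basin_mean_series(field, lat, mask):
    """Area-weighted basin average of a (time, lat, lon) field.

    lat: 1-D cell-center latitudes in degrees; mask: boolean (lat, lon) array,
    True inside the basin. Weights are cos(latitude), proportional to cell area
    on a regular grid; NaN cells are skipped month by month.
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * mask     # (lat, lon)
    weights = np.broadcast_to(weights, field.shape)
    valid = np.isfinite(field) & (weights > 0)
    num = np.where(valid, field * weights, 0.0).sum(axis=(1, 2))
    den = np.where(valid, weights, 0.0).sum(axis=(1, 2))
    return num / den


lat = np.array([0.0, 60.0])                 # cos(60 deg) = 0.5: half the cell area
mask = np.ones((2, 3), dtype=bool)
field = np.empty((4, 2, 3))
field[:, 0, :] = 1.0
field[:, 1, :] = 4.0
series = basin_mean_series(field, lat, mask)
```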

FIGURE 5

FIGURE 5. Comparison of the basin-level TWSA time series from GRACE observations and different models for the Yellow River Basin in (A) G1 and (B) G2, the Huaihe River Basin in (C) G1 and (D) G2, and the Haihe River Basin in (E) G1 and (F) G2 (the red background indicates the validation period).

According to Figures 5E,F, the PCR-GLOBWB model overestimates the decrease in TWSA in the Haihe River Basin relative to the GRACE solution. This is consistent with Figures 2H and 4, which show a more strongly negative TWSA from PCR-GLOBWB than from the GRACE solution. This overestimation of the TWS decline in the Haihe River Basin is partly related to an overestimation of human intervention in the basin, because the PCR-GLOBWB(N) model produced more positive and less negative TWSAs for the basin (Figures 5E,F).

Figure 6 presents the correlation coefficient (R), RMSE, and ME between the basin-level TWSA from GRACE and that from the machine learning tools and the global models examined in this study. To reduce the influence of seasonal variation, the seasonal components were removed before R, RMSE, and ME were calculated. For the training set, the RF and XGB TWSA time series closely reproduced the GRACE TWSA time series over the three basins, with a high R (>0.8) and a low RMSE (<2.0 cm). In the G1 validation set, R for RF/XGB was generally higher than that for RF(N)/XGB(N). Although RF/XGB and RF(N)/XGB(N) obtained similar values of R for G2, the RMSE and ME of the RF(N)/XGB(N) models were much higher than those of RF/XGB. In general, the RF/XGB models outperform RF(N)/XGB(N), with a higher R and a lower RMSE and ME. Compared with the LSMs and PCR-GLOBWB, the TWSA estimated by the machine learning models is much closer to the GRACE solution, with lower RMSE and ME values. However, the correlation coefficients of all the machine learning models were lower for the validation set than for the training set. There are two possible reasons for this. First, only thirty-six samples (12 months × 3 years) were available for calculating the correlation coefficients of the basin-level TWSA time series in the validation set, and this sample size may be too small for a reliable correlation estimate (Bonett and Wright, 2000). Second, machine learning models usually perform better on the training set than on the validation set, because they are fitted to the training data.
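To make the small-sample argument concrete, the sketch below computes an approximate 95% confidence interval for a correlation coefficient using the standard Fisher z-transformation. This is not the procedure of Bonett and Wright (2000), which addresses sample-size planning, and R = 0.6 is a hypothetical value chosen only for illustration; the 36-sample length is the validation-period size quoted above, while the 120-sample case is hypothetical.

```python
import math


def fisher_z_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation r estimated
    from n samples, via the Fisher z-transformation (z_crit=1.96 -> ~95%)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)              # variance-stabilising transform
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same hypothetical R = 0.6 with 36 monthly samples versus a longer,
# hypothetical 120-sample record.
for n in (36, 120):
    lo, hi = fisher_z_ci(0.6, n)
    print(f"n={n:3d}: 95% CI for R = ({lo:.2f}, {hi:.2f})")
```

With 36 samples the interval spans roughly 0.34 to 0.78, nearly twice as wide as with 120 samples, so a single validation-period R carries considerable sampling uncertainty.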

FIGURE 6

FIGURE 6. (A) Correlation coefficients (R), (B) RMSE, and (C) ME between the de-seasonalized TWSA from GRACE observations and from the models for the Yellow River Basin, for the validation sets in G1 and G2; (D) R, (E) RMSE, and (F) ME between the de-seasonalized TWSA from GRACE observations and from the different models for the Huaihe River Basin, for the validation sets in G1 and G2; (G) R, (H) RMSE, and (I) ME between the de-seasonalized TWSA from GRACE observations and from the different models for the Haihe River Basin, for the validation sets in G1 and G2.

5.2 Reconstruction of Past TWSA

The models were trained using datasets with human factors covering the entire period 2003–2015 and then applied to the period 1979–2015 to generate a TWSA series extending back to 1979. This period was divided into three subperiods for analysis: 1979–1990 (P1), 1991–2002 (P2), and 2003–2015 (P3). Figure 7 displays the average TWSA series from the RF and XGB models for the three basins, together with the TWSA from Catchment, Noah, and PCR-GLOBWB; the three subperiods are indicated by different panel background colors. Because the RF and XGB models produced similar TWSA estimates, Figure 8 presents only the correlation coefficients and MEs between the RF TWSA and the TWSA of the three global models, after the seasonal variations and residuals were removed.
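The per-subperiod comparison summarized in Figure 8 can be sketched as below. This is a hypothetical helper, not the authors' implementation: it assumes both monthly series are already de-seasonalized and aligned, and it takes ME as mean(global model minus RF), since the paper does not state the sign convention.

```python
import math

# Sub-periods used in the text (inclusive calendar years).
SUBPERIODS = {"P1": (1979, 1990), "P2": (1991, 2002), "P3": (2003, 2015)}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subperiod_stats(years, rf_twsa, other_twsa):
    """R and ME between two aligned monthly TWSA series for each sub-period.

    `years` gives the calendar year of each monthly value. ME is assumed
    here to be mean(other - RF).
    """
    stats = {}
    for name, (start, end) in SUBPERIODS.items():
        idx = [i for i, yr in enumerate(years) if start <= yr <= end]
        if len(idx) < 3:
            continue  # too few months for a meaningful correlation
        a = [rf_twsa[i] for i in idx]
        b = [other_twsa[i] for i in idx]
        me = sum(bi - ai for ai, bi in zip(a, b)) / len(idx)
        stats[name] = {"R": pearson_r(a, b), "ME": me}
    return stats


# Synthetic example: 37 years of monthly values, with the "other" model
# offset by +2 cm from the RF series (illustrative numbers only).
years = [yr for yr in range(1979, 2016) for _ in range(12)]
rf = [math.sin(i / 7.0) + 0.01 * i for i in range(len(years))]
other = [v + 2.0 for v in rf]
for period, s in subperiod_stats(years, rf, other).items():
    print(period, round(s["R"], 3), round(s["ME"], 3))
```

Computing the statistics separately for each subperiod, rather than once over 1979–2015, is what allows the stability of the reconstruction to be checked across time.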

FIGURE 7

FIGURE 7. Basin level de-seasonalized TWSA time series for the period 1979–2015 from the RF, XGB, Catchment, Noah, and PCR-GLOBWB models in (A) Yellow River Basin, (B) Huaihe River Basin, and (C) Haihe River Basin (P1, P2, and P3 are highlighted by different background colors).

FIGURE 8

FIGURE 8. (A) Correlation coefficient (R) and (B) mean error (ME) between the RF TWSA and the TWSA of the Catchment, Noah, and PCR-GLOBWB models over the three subperiods in the Yellow River Basin; (C) R and (D) ME between the RF TWSA and the TWSA of the Catchment, Noah, and PCR-GLOBWB models for the three subperiods in the Huaihe River Basin; (E) R and (F) ME between the RF TWSA and the TWSA of the Catchment, Noah, and PCR-GLOBWB models for the three subperiods in the Haihe River Basin.

In general, the TWSA reconstructed by the machine learning models is well correlated with the TWSA of the LSMs (Catchment and Noah) throughout the three subperiods, except for P2 in the Huaihe River Basin (Figures 8A,C,E). Although the PCR-GLOBWB TWSA shows a higher correlation with the reconstructed TWSA, it shows a low correlation and high MEs for P1 and P2, especially over the Huaihe and Haihe River Basins (Figure 7); this is also evident in Figure 8. In addition, the MEs between the RF TWSA and the TWSA from the global models were consistent across the three subperiods. The measurement errors (RMS of the residuals from STL analysis) of the RF and XGB TWSA were also consistent across the three subperiods (Table 3). Therefore, the quality of the RF and XGB TWSA over the past few decades is generally stable and in line with that of the GRACE TWSA for 2003–2015.
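The residual RMS reported in Table 3 comes from an STL decomposition. As a dependency-free stand-in, the sketch below uses the simpler classical additive decomposition (centred 2×12 moving-average trend plus mean monthly cycle) and then takes the RMS of the residuals. This is a swapped-in technique, not STL, and its residuals will generally differ from STL's; libraries such as statsmodels provide an STL implementation for the real analysis. The sketch also assumes the series starts in January, so that index modulo 12 gives the calendar month.

```python
import math
import random

PERIOD = 12  # months per seasonal cycle


def trend_2x12(x):
    """Centred 2x12 moving average; None where the 13-month window is incomplete."""
    trend = [None] * len(x)
    for t in range(6, len(x) - 6):
        w = x[t - 6:t + 7]  # 13 values centred on month t
        trend[t] = (0.5 * w[0] + sum(w[1:12]) + 0.5 * w[12]) / 12.0
    return trend


def classical_decompose(x):
    """Additive decomposition x = trend + seasonal + residual (classical
    method, used here as a simpler stand-in for STL)."""
    if len(x) < 2 * PERIOD:
        raise ValueError("need at least two full years of monthly data")
    trend = trend_2x12(x)
    sums, counts = [0.0] * PERIOD, [0] * PERIOD
    for i, tr in enumerate(trend):
        if tr is not None:
            sums[i % PERIOD] += x[i] - tr
            counts[i % PERIOD] += 1
    month_means = [s / c for s, c in zip(sums, counts)]
    offset = sum(month_means) / PERIOD  # make the seasonal cycle sum to zero
    cycle = [m - offset for m in month_means]
    seasonal = [cycle[i % PERIOD] for i in range(len(x))]
    resid = [None if tr is None else x[i] - tr - seasonal[i]
             for i, tr in enumerate(trend)]
    return trend, seasonal, resid


def residual_rms(resid):
    """Root mean square of the defined (non-None) residuals."""
    vals = [r for r in resid if r is not None]
    return math.sqrt(sum(r * r for r in vals) / len(vals))


# Synthetic monthly TWSA (cm): trend + annual cycle + random noise.
rng = random.Random(0)
x = [0.05 * i + 3.0 * math.sin(2 * math.pi * i / PERIOD) + rng.gauss(0, 0.5)
     for i in range(120)]
_, _, resid = classical_decompose(x)
print(f"residual RMS = {residual_rms(resid):.2f} cm")
```

Applying `residual_rms` to the residuals falling within each subperiod would give per-subperiod values analogous to those in Table 3.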

TABLE 3

TABLE 3. The root mean square (RMS) of residuals (from STL analysis) for TWSA of different models and GRACE observations during the three sub-periods.

Figure 9 compares the RF and XGB TWSAs with the in-situ soil moisture (0–50 cm) at 29 sites in the study area during 1991–2002, and Figure 10 shows the correlation coefficients between the model TWSAs and the in-situ soil moisture. Because some in-situ soil moisture values are missing, the number of records available at each site is also shown for reference. According to Figure 10, the reconstructed TWSA is well correlated with the in-situ soil moisture at most of the sites assessed, and the overall trends vary concurrently. R ranged between 0.2 and 0.7 across sites, with an average of approximately 0.45 (Figures 10B,C). R is lower at sites without adequate records, such as Nos. 53980 and 54705. However, R is also below 0.3 at some sites with more than 80 records, such as Nos. 53783 and 57089. TWS variations are closely linked to subsurface water content, particularly root-zone soil moisture and groundwater levels (Rodell et al., 2009; Tian et al., 2019), and groundwater accounts for the majority of the TWS. Although variations in the sampled soil moisture partially reflect changes in regional water storage, the connection between the two appears much weaker than that between groundwater and TWS. Overall, the TWSA reconstructed by the machine learning tools agrees well with the in-situ soil moisture at most of the sites examined in this study. Nevertheless, because the comparison used soil moisture measurements at 0–50 cm, which reflect only the water content of the shallow soil layer, it should be regarded as indirect evidence rather than a validation.
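Because many in-situ records are missing, each site's R can only use months where both series exist. The sketch below shows one way to do this (pairwise-complete Pearson correlation) and to return the record count alongside R, as Figure 10 does. It is a hypothetical helper; missing values are assumed to be encoded as `None`, and the `min_records` threshold is an illustrative choice, not one stated in the paper.

```python
import math


def site_correlation(twsa, soil_moisture, min_records=3):
    """Pearson R between monthly TWSA and in-situ soil moisture at one site.

    Missing observations are given as None; only months where both values
    exist are used. Returns (R, number_of_paired_records); R is NaN when
    there are too few records or one series is constant.
    """
    pairs = [(a, b) for a, b in zip(twsa, soil_moisture)
             if a is not None and b is not None]
    n = len(pairs)
    if n < min_records:
        return float("nan"), n
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    if saa == 0 or sbb == 0:
        return float("nan"), n
    return sab / math.sqrt(saa * sbb), n


# Hypothetical sites with gaps in the soil moisture record (not study data).
sites = {
    "site_A": ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, None, 6.0, 8.0, 10.0]),
    "site_B": ([1.0, 2.0, 3.0], [None, None, 4.0]),
}
for site_id, (tw, sm) in sites.items():
    r, n = site_correlation(tw, sm)
    print(site_id, r, n)
```

Reporting n next to R makes it easy to discount correlations computed from very few paired records, as the text does for sites 53980 and 54705.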

FIGURE 9

FIGURE 9. Comparison of the TWSA of the RF and XGB models with in-situ soil moisture measurements during the period 1991–2002.

FIGURE 10

FIGURE 10. (A) Correlation coefficients between the RF/XGB TWSA and the in-situ soil moisture at each site, (B) boxplot of the correlation coefficients at all sites, and (C) boxplot of the number of records at all sites.

5.3 Variables Importance Contribution

The relative contribution of each variable to the RF and XGB models was also quantified. Figures 11A,B plot the variable importance value (scaled to 0–100) of each variable in the RF and XGB models. Precipitation over the previous 1–3 months (PREt-1, PREt-2, and PREt-3) contributed significantly to both the RF and XGB models. This is consistent with the lagged correlation between TWS variations and precipitation: precipitation takes time to infiltrate into the root-zone soil, after which part of it recharges groundwater, which accounts for the majority of the TWS (Sala et al., 1992; Milly, 1994; Andrew R. L. et al., 2017). The importance of precipitation in previous months is therefore physically reasonable, and it indicates that the machine learning models effectively learned the close relationship between the TWSA and precipitation variations in the study area. The geolocation variables (LAT and LON) are as important as precipitation, suggesting significant spatial variation in the TWSA of the basin. In the XGB model, latitude is much more important than precipitation.
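Lagged predictors such as PREt-1 to PREt-3 are typically built by shifting the monthly precipitation series. The sketch below illustrates this, assuming a single gap-free monthly series per grid cell. Whether the current-month value (here keyed `PREt`) was also a model input is an assumption made for illustration, and the helper name is hypothetical.

```python
def lagged_precip_features(pre, max_lag=3):
    """Build lagged precipitation predictors for monthly data.

    Returns one dict per month with keys 'PREt', 'PREt-1', ..., 'PREt-<max_lag>';
    months without a full history (the first max_lag months) get None.
    """
    rows = []
    for t in range(len(pre)):
        if t < max_lag:
            rows.append(None)  # not enough preceding months
            continue
        row = {"PREt": pre[t]}
        for k in range(1, max_lag + 1):
            row[f"PREt-{k}"] = pre[t - k]  # precipitation k months earlier
        rows.append(row)
    return rows


# Illustrative monthly precipitation totals (mm), not study data.
monthly_pre = [12.0, 20.0, 35.0, 60.0, 110.0, 95.0]
for row in lagged_precip_features(monthly_pre):
    print(row)
```

Giving the model each lag as a separate column lets tree-based learners pick up the delayed infiltration and recharge response directly, instead of having to infer it from the current month alone.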

FIGURE 11

FIGURE 11. (A) Variable importance values of the RF model at each site, (B) variable importance values of the XGB model at each site; summary of the variable importance values of the RF model in (C) G1 and (D) G2, and summary of the variable importance values of the XGB model in (E) G1 and (F) G2.

Figures 11C–F summarize the variable importance of the climate forcing, human withdrawal, latitude, longitude, elevation, climate zone, and time variables. In the RF model, climate forcing accounts for 66.3–70.8% of the importance and human factors for 14.4–17.6%. In the XGB model, the contribution of climate forcing is lower (39.4–50.0%) and that of human factors is higher (21.6–25.3%). Figure 12 illustrates the percentage contribution of each individual climate forcing and human activity variable. Precipitation dominates the climate forcing contribution (∼30%) to the models. Among human activities, the contribution of nonpaddy irrigation (NPIR) is higher than that of paddy irrigation (PIR), because nonpaddy crops (wheat and maize) are the major crops in the Yellow River Basin and Haihe River Basin.
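Group percentages like those above can be obtained by summing raw importance scores within each variable group and dividing by the total. The sketch below assumes min-max scaling for the 0–100 display values and computes group shares from the unscaled scores; the paper does not state either convention. The variable names `TMP`, `DOM`, and `IND` are hypothetical labels, and all scores are invented for illustration.

```python
def scale_0_100(importance):
    """Min-max scale raw importance scores to 0-100 (an assumed convention;
    the paper does not state how its 0-100 scaling was done)."""
    lo, hi = min(importance.values()), max(importance.values())
    span = hi - lo
    return {k: (100.0 * (v - lo) / span if span > 0 else 0.0)
            for k, v in importance.items()}


def group_percentages(importance, groups):
    """Share (%) of the total raw importance held by each variable group."""
    total = sum(importance.values())
    return {g: 100.0 * sum(importance[v] for v in members) / total
            for g, members in groups.items()}


# Hypothetical raw importance scores (illustrative, not from the study).
raw = {"PRE": 30.0, "TMP": 10.0, "NPIR": 12.0, "PIR": 4.0, "DOM": 12.0,
       "IND": 12.0, "LIST": 2.0, "LAT": 10.0, "LON": 8.0}
groups = {
    "climate": ["PRE", "TMP"],
    "human": ["NPIR", "PIR", "DOM", "IND", "LIST"],
    "geolocation": ["LAT", "LON"],
}
print(scale_0_100(raw))
print(group_percentages(raw, groups))
```

Computing shares from raw rather than min-max-scaled scores matters: min-max scaling sets the least important variable to zero, which would distort the group percentages.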

FIGURE 12

FIGURE 12. (A) Percentage summary of the variable importance of climate variables, and (B) Percentage summary of the variable importance of sectoral water use.

Generally, water withdrawals for domestic, industrial, and irrigation use contribute roughly equally to the models, whereas the importance value of livestock withdrawal (LIST) is lower than that of the other sectors. Owing to their extensive plains, favorable temperatures, and abundant sunshine, the middle-lower Yellow River Basin, the Haihe River Basin, and the Huaihe River Basin are major grain production bases in China. Irrigation in these areas depends heavily on withdrawals from groundwater, main rivers, and tributaries (Zheng et al., 2009; Zhao et al., 2014). Coal and iron industries dominate in the middle-lower Yellow River Basin provinces (Ningxia, Shaanxi, and Shanxi) and the Haihe River Basin (Hebei province) (Zhong et al., 2016; Shang et al., 2017). According to statistics, the population of the Huang-Huai-Hai River Basin accounts for approximately 35% of the total population of China (Yuan et al., 2019), which creates a huge demand for domestic water. Therefore, irrigation, industrial, and domestic withdrawals can be expected to contribute roughly equally, and the machine learning models learned these patterns well.

6 Discussion and Conclusion

This study proposed a machine learning model for estimating historical variations in the TWS of the Huang-Huai-Hai River Basin. Random forest (RF) and extreme gradient boosting (XGB) models were trained to link GRACE TWSA observations with climate data and human water withdrawals, and the trained models were applied to reconstruct past GRACE-consistent TWSA. The reconstructed TWSA was compared with global land surface and hydrological models, as well as with in-situ soil moisture measurements. In addition, the contribution of each variable to the machine learning models was quantified.

In general, the presented approach reproduces the spatial patterns and temporal variations of the GRACE TWSA and outperforms a set of global models. Including human intervention improved the performance of the machine learning models. This is reasonable because groundwater has been overexploited in the North China Plain since the 1970s, and human activities have profoundly affected TWS changes in this region (Kendy et al., 2004; Shi et al., 2011; Cao et al., 2013; Huang et al., 2015; Min et al., 2015). The Catchment and Noah models, however, do not account for human intervention and underestimate the decrease in TWS relative to the GRACE TWSA in the study area. These findings are in line with previous research (Mo et al., 2016; Scanlon et al., 2018). The Catchment model includes groundwater, which gives it a more complete set of terrestrial water storage components than the Noah model. However, groundwater is not modeled explicitly in Catchment; instead, an implicit water table, located at the depth of equilibrium saturation, is derived from the equilibrium vertical soil moisture profile (Kumar et al., 2016). Therefore, substantial differences remain between the Catchment TWS simulations and the GRACE solution.

The PCR-GLOBWB model represents all the terrestrial water storage components and accounts for human activities, so its TWSA shows higher correlations with the GRACE TWSA at the basin level. However, PCR-GLOBWB overestimated the declining trend in the TWS of the Haihe River Basin relative to the GRACE solution. A similar overestimation of the TWS decline by PCR-GLOBWB in this region was also reported in another study (Feng et al., 2018). The PCR-GLOBWB model without human intervention underestimated the decrease in the TWS of the Haihe River Basin, indicating that the overestimated decline produced by PCR-GLOBWB results from an overestimation of human water use in the basin. The machine learning models used the same water withdrawal estimates as the PCR-GLOBWB model yet produced results consistent with the GRACE solution. This indicates that the machine learning models successfully learned how climatic drivers and human withdrawals shape the GRACE TWSA and could predict it for both the present and past records.

The importance estimates provide an intuitive way to interpret the ensemble learning models used to estimate TWS changes in this study. Precipitation, acknowledged in previous studies as the major climatic driver of TWS variations (Li et al., 2017; Meng et al., 2019; Xie et al., 2019), is the climatic variable with the highest importance value in the present models. Water use by three sectors (irrigation, domestic, and industry) contributes roughly equally to the models, whereas livestock use contributes less. Because some individual variables are considerably more important than others, a model using only the most important variables could, in principle, produce results comparable to those of a model using the full set of variables. For this reason, RF is often used to select important features for specific problems (Ham et al., 2005; Chen et al., 2014; Ma et al., 2017). Additional experiments are therefore required to investigate model performance with different numbers of input variables.

The variable importance rankings derived from the RF and XGB models are generally similar, except that the human factors and geolocation obtained higher importance values in the XGB model than in the RF model. A possible explanation is that, in a boosting-based ensemble, each simple base learner recursively fits the residuals of its predecessors, so variables that contribute more to explaining these residuals, such as geolocation and human factors, accumulate higher importance (Freund and Schapire, 1996; Friedman, 2002).
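The residual-fitting mechanism described above can be illustrated with a toy gradient-boosting loop under squared loss, in which each one-split regression stump is fitted to the residuals left by the ensemble so far. This is a minimal sketch of the general idea in Friedman (2002), not XGBoost itself, which additionally uses second-order gradients and regularization. It does not compute variable importance, and the function names are hypothetical.

```python
def fit_stump(x, r):
    """Least-squares regression stump on a 1-D feature: one threshold split,
    predicting the mean target on each side."""
    values = sorted(set(x))
    if len(values) < 2:
        mean_r = sum(r) / len(r)
        return lambda v: mean_r
    best = None
    for thr in values[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the
    residuals left by the ensemble built so far."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # still unexplained part
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


# Toy step function: predictions approach 0 on the left and 10 on the right.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
model = boost(x, y)
print(round(model(1), 2), round(model(6), 2))
```

Each round sees only what earlier rounds failed to explain. Any feature that explains those leftovers is therefore split on repeatedly, which is the route by which boosting can inflate the importance of particular variables.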

Not every tree in the RF and XGB models uses all the features or observations, which helps decorrelate the trees and makes them less prone to overfitting. However, correlated features tend to receive equal or similar importance, which may lower the importance of each compared with trees built without their correlated counterparts. It should also be noted that, although the RF and XGB models are built through random selection, the importance value is merely a relative statistical score and does not indicate the actual contribution of a variable to the TWS. The reliability of the importance rankings depends on the performance of the model and the selection of the input variables. The variables used in this study probably do not cover all types of human intervention or hydrological factors affecting TWS dynamics in the headwaters of the Yellow River, the Loess Plateau, and the surrounding areas, such as the Three Rivers Source Region Reserve, the Grain for Green project, reservoir operation, and snowmelt in the surrounding areas (Jin et al., 2017; Yi et al., 2017; Deng et al., 2018; Lv et al., 2019; Meng et al., 2019; Xie et al., 2019). Therefore, the variable importance results should be interpreted with caution.

The era of big data and artificial intelligence is accelerating the development of data-driven Earth system science. Machine learning techniques offer new tools and exciting opportunities for accurately predicting the evolution of the water cycle and for expanding our understanding of the Earth system from multisource data. The results of this study suggest that machine learning and artificial intelligence have the potential to extract much more from data than traditional land surface and hydrological models. However, respecting physical laws remains important when using machine learning to expand our knowledge of the Earth. Therefore, the next key challenge is to develop hybrid models that couple physical process models with machine learning approaches.

In addition, the input data of the GLDAS models are not identical to those of the ensemble learning model, and the GLDAS models also use input data from other sources, such as land use and soil type and texture, which can introduce uncertainty into the GLDAS simulations. The biases caused by these uncertainties could not be quantified in this paper, and the comparison with the GLDAS land surface models requires further study.

In summary, the models proposed in this study capture how both climate and human factors affect TWS dynamics, and they perform as well as or better than a set of global models in areas with intense human activities. Rather than modifying physical models, the machine learning approach offers a more flexible and less expensive alternative for directly reconstructing past changes in total TWS. The findings of this study improve our understanding of the changes in TWS and their driving forces over the Huang-Huai-Hai River Basin, and the results have great potential to be assimilated into hydrological models to improve future TWS simulations.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author Contributions

JY and YL conceived and designed the experiments; JY performed the experiments; SC analyzed the data.

Funding

This study was jointly supported by the National Natural Science Foundation of China (41801362, 41976190), the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (GML2019ZD0301), the Guangdong Innovative and Entrepreneurial Research Team Program (2016ZT06D336), GDAS' Project of Science and Technology Development (2019GDASYL-0301001, 2020GDASYL-20200104003), and the Science and Technology Program of Guangdong (2021B1212100006). We also thank the Geographical Science Data Center of The Greater Bay Area for providing the relevant data in this study.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Allen, R. G., Pereira, L. S., Raes, D., and Smith, M. (1998). Crop Evapotranspiration-Guidelines for Computing Crop Water Requirements-FAO Irrigation and Drainage Paper 56. FAO, Rome 300 (9), D05109.


Alsdorf, D. E., Rodríguez, E., and Lettenmaier, D. P. (2007). Measuring Surface Water from Space. Rev. Geophys. 45 (2), 2002–2004. doi:10.1029/2006rg000197
