Can normalized difference vegetation index and climate data be used to estimate soil carbon, nitrogen, and phosphorus and their ratios in the Xizang grasslands?

Wang, Shaohua; Qi, Huxiao; Li, Tianyu; Qin, Yong; Fu, Gang; Pan, Xu; Zha, Xinjie

doi:10.3389/feart.2023.1340020

ORIGINAL RESEARCH article

Front. Earth Sci. , 02 February 2024

Sec. Biogeoscience

Volume 11 - 2023 | https://doi.org/10.3389/feart.2023.1340020

Can normalized difference vegetation index and climate data be used to estimate soil carbon, nitrogen, and phosphorus and their ratios in the Xizang grasslands?

Shaohua Wang¹

Huxiao Qi²

Tianyu Li²

Yong Qin²

Gang Fu²*

Xu Pan³

Xinjie Zha⁴

¹Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
²Lhasa Plateau Ecosystem Research Station, Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
³Wetland Research Center, Institute of Ecological Conservation and Restoration, Chinese Academy of Forestry, Beijing, China
⁴Xi’an University of Finance and Economics, Xi’an, China

Accurately quantifying the relative effects of climate change and human activities on soil carbon, nitrogen, and phosphorus in alpine grasslands and their feedback is an important aspect of global change, and high-precision models are the key to solving this scientific problem with high quality. Therefore, nine models, the random forest model (RFM), generalized boosted regression model (GBRM), multiple linear regression model (MLRM), support vector machine model (SVMM), recursive regression tree model (RRTM), artificial neural network model (ANNM), generalized linear regression model (GLMR), conditional inference tree model (CITM), and eXtreme gradient boosting model (eXGBM), were used for modeling soil organic carbon (SOC), total nitrogen (TN), total phosphorus (TP), the ratio of SOC to TN (C:N), the ratio of SOC to TP (C:P), and the ratio of TN to TP (N:P) at depths of 0–10, 10–20, and 20–30 cm under non-grazing and free-grazing scenarios in the Xizang grasslands. Annual radiation (ARad), annual precipitation (AP), and annual temperature (AT) were used as independent variables under non-grazing scenarios, whereas ARad, AP, AT, and growing season maximum normalized difference vegetation index (NDVI_max) were used as independent variables under free-grazing scenarios. Overall, the RFM and GBRM were more accurate than the other seven models. However, the tree numbers of the GBRM were much larger than those of the RFM, indicating that the GBRM may have a greater model complexity and lower running speed. Therefore, the RFM had the best performance among the nine models in modeling SOC, TN, TP, C:N, C:P, and N:P in the Xizang grasslands. The RFM established in this study can not only help scientists save time and money on massive sampling and analysis, but can also be used to construct a database of SOC, TN, and TP, and their ratios, and further scientific research related to ecological and environmental issues (e.g., examining whether soil systems intensified global warming over the past few decades by exploring whether climate change and human activities altered soil organic carbon) in the grasslands of Xizang Plateau.

1 Introduction

Soils not only provide nutrients such as nitrogen and phosphorus for various terrestrial plants but also contain a large amount of organic carbon (Zhao et al., 2022; Niu and Fu, 2024), which is an important carbon reservoir for regulating global warming. Soil organic carbon (SOC), total nitrogen (TN), and total phosphorus (TP), and their ratios (i.e., ratio of SOC to TN, C:N; ratio of SOC to TP, C:P; and ratio of TN to TP, N:P), are important soil variables (Yang et al., 2007; Zhang et al., 2007; Feng et al., 2017; Liu et al., 2019; Kou et al., 2020; Li et al., 2020; Li et al., 2021). At least one of SOC, TN, TP, C:N, C:P, and N:P is closely correlated with soil respiration (Yu et al., 2019b; Fu and Shen, 2022), N₂O flux (Zhang et al., 2020; Bahram et al., 2022), forage nutrition quality and production (Zha et al., 2022; Han et al., 2023b), soil microbial diversity, and plant diversity (Yu et al., 2019a; Wang et al., 2021b; Zhang and Fu, 2021). Therefore, modeling SOC, TN, TP, C:N, C:P, and N:P on multiple space-time scales is an important challenge for global carbon, nitrogen, and phosphorus cycling. To obtain these six soil variables, predecessors have acquired various methods, including direct observation methods (Fu et al., 2012; Han et al., 2023b), spatial interpolated methods (de Melo et al., 2016; Shit et al., 2016), process models (e.g., CENTURY model) (Mikhailova et al., 2000; Tornquist et al., 2009; Zhuang et al., 2010), and machine learning models (e.g., random forest model and eXtreme gradient boosting model) (Bangelesa et al., 2020; Marcal et al., 2021). These previous studies can provide very important guidance for our current and future research but they have the following two deficiencies. First, of the numerous existing machine learning methods (Reichstein et al., 2019; Hutson, 2022), there is still controversy over which is best at quantifying soil carbon, nitrogen, and phosphorus and their ratios (Bangelesa et al., 2020; Wang et al., 2021a). Second, there are always a lot of model input variables for many of these previous studies (Wang et al., 2020). However, the performance of the model does not always increase with the increase of input variables due to the data accuracy of input variables themselves and other potential reasons (Dai et al., 2023; Wang and Fu, 2023). The number of input variables also affects the speed of the model, increasing costs such as electricity (Tian and Fu, 2022; Wang and Fu, 2023). In addition, any input variable depends on raw observation data, which incurs labor costs and requires resources (Tian and Fu, 2022; Dai et al., 2023). By contrast, models based on a smaller number of input variables but with a higher accuracy may be the potentially optimal model, although this requires further proof. Therefore, further research about finding the best models of the six soil variables is necessary.

Alpine soils on the Xizang Plateau are not only important components of global alpine soils but also a very sensitive reservoir of soil carbon under the background of global change. Accurate quantification of soil carbon is an important basis for the accurate quantification of the carbon sink function of terrestrial ecosystems under the background of global change on the Xizang Plateau, and a highly accurate model is an important basis for the accurate quantification of the soil carbon pool. With the development of machine learning techniques, an increasing number of tools are available for us to choose for the modeling of environmental variables (Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). Under such a scenario, some studies have tried to find the best models of α-diversity, forage nutrition quality and storage, soil moisture, and pH on the Xizang Plateau from a variety of machine learning methods (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). These studies have confirmed that the random forest model (RFM) can be the best method (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). However, whether the RFM is better than other methods at quantifying SOC, TN, TP, C:N, C:P, and N:P for grasslands on the Xizang Plateau is unclear. On the other hand, although it is well known that climate change and human activities jointly affect grassland ecological systems (e.g., SOC) (Ganjurjav et al., 2015; Wang et al., 2022), it is still controversial whether climate change or human activities play a dominant role in the change of grassland ecosystems on the Xizang Plateau. To separate the relative impacts of climate change and human activities on SOC, TN, TP, C:N, C:P, and N:P, it is necessary to construct the six variable models driven by pure climate variables and models jointly affected by climate change and human activities, respectively. Therefore, further studies are needed to quantify the six variables by choosing the best machine learning method in grassland areas on the Xizang Plateau.

In this study, we wanted to find the best models of SOC, TN, TP, C:N, C:P, and N:P at three depths (0–10 cm, 10–20 cm, and 20–30 cm) using only three or four input variables under free-grazing or non-grazing scenarios in the Xizang grassland ecological systems from nine models. The nine models included the RFM, generalized boosted regression model (GBRM), support vector machine model (SVM), multiple linear regression model (MLRM), recursive regression tree model (RRTM), artificial neural network model (ANNM), generalized linear regression model (GLRM), conditional inference tree model (CITM), and eXtreme gradient boosting model (eXGBM) (Han et al., 2022; Wang and Fu, 2023). Previous studies have confirmed that the RFM more accurately models α-diversity and forage nutrition quality, soil pH, and soil moisture in the Xizang grasslands (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). Therefore, we hypothesized that the RFM will more accurately model SOC, TN, and TP, and their ratios, in this study.

2 Materials and methods

The study area was located in the grassland areas of the Xizang (Supplementary Figure S1). From west to east, there is a successive distribution of alpine desert steppes, alpine steppes, and alpine meadows (Supplementary Figure S1). All soils were obtained using a soil auger, immediately put in the car refrigerator, and were then transferred to a lab freezer (−20°C) for storage before soil analyses. Soil samples at a depth of 0–10 cm (elevation, 4,279 m to 5,261 m; longitude, 79.46°E to 92.01°E; and latitude 29.28°N to 33.23°N) under non-grazing scenarios were collected in 2011 and 2013–2020. Soil samples at a depth of 10–20 cm (elevation, 4,279 m to 5,261 m; longitude, 79.46°E to 92.01°E; and latitude 29.28 °N to 33.23°N) under non-grazing scenarios were collected in 2011, 2013, 2015, 2017–2018, and 2020. Soil samples at a depth of 20–30 cm (elevation, 4,279 m to 5,261 m; longitude, 84.82°E to 91.07°E; and latitude 29.28 °N to 32.00°N) under non-grazing scenarios were collected in 2011 and 2020. Soil samples at a depth of 0–10 cm (elevation, 2,785 m to 5,330 m; longitude, 79.46°E to 95.68°E; and latitude 28.37°N to 33.23°N) under free-grazing scenarios were collected in 2013 and 2015–2020. Soil samples at a depth of 10–20 cm (elevation, 2,785 m to 5,330 m; longitude: 79.46°E to 95.68°E; and latitude 28.37°N to 33.23°N) under free-grazing scenarios were collected in 2013, 2015, and 2017–2020. Soil samples at a depth of 20–30 cm (elevation, 4,300 m to 5,330 m; longitude, 84.82°E to 91.07°E; and latitude 29.22°N to 32.15°N) under free-grazing scenarios were collected in 2019–2020.

We measured soil organic carbon (SOC), total nitrogen (TN), and total phosphorus (TP) based on the soil samples mentioned above, and then calculated the ratio of SOC to TN (C:N), ratio of SOC to TP (C:P), and ratio of TN to TP (N:P) (Sun et al., 2021; Zha et al., 2022). The potassium dichromate method, Kjeldahl method, and molybdenum antimony resistance colorimetry method were used to analyze SOC, TN, and TP, respectively (Sun et al., 2021). The SOC under non-grazing scenarios at 0–10, 10–20, and 20–30 cm was labeled by SOC_{p_0–10}, SOC_{p_10–20}, and SOC_{p_20–30}, but the SOC under free-grazing scenarios at 0–10, 10–20, and 20–30 cm was labeled by SOC_{a_0–10}, SOC_{a_10–20}, and SOC_{a_20–30}, respectively. Similarly, we labeled TN_{p_0–10}, TN_{p_10–20}, TN_{p_20–30}, TN_{a_0–10}, TN_{a_10–20}, TN_{a_20–30}, TP_{p_0–10}, TP_{p_10–20}, TP_{p_20–30}, TP_{a_0–10}, TP_{a_10–20}, TP_{a_20–30}, C:N_{p_0–10}, C:N_{p_10–20}, C:N_{p_20–30}, C:N_{a_0–10}, C:N_{a_10–20}, C:N_{a_20–30}, C:P_{p_0–10}, C:P_{p_10–20}, C:P_{p_20–30}, C:P_{a_0–10}, C:P_{a_10–20}, C:P_{a_20–30}, N:P_{p_0–10}, N:P_{p_10–20}, N:P_{p_20–30}, N:P_{a_0–10}, N:P_{a_10–20}, and N:P_{a_20–30}.

The observed SOC_{a_0–10}, TN_{a_0–10}, TP_{a_0–10}, C:N_{a_0–10}, C:P_{a_0–10}, and N:P_{a_0–10} was 1.27–206.28 g kg⁻¹, 0.10–10.30 g kg⁻¹, 0.11–5.44 g kg⁻¹, 4.37–21.30, 2.75–262.65, and 0.34–13.46, respectively. The observed SOC_{a_10–20}, TN_{a_10–20}, TP_{a_10–20}, C:N_{a_10–20}, C:P_{a_10–20}, and N:P_{a_10–20} was 1.41–117.09 g kg⁻¹, 0.13–8.87 g kg⁻¹, 0.14–4.27 g kg⁻¹, 0.66–22.08, 3.18–249.60, and 0.38–16.69, respectively. The observed SOC_{a_20–30}, TN_{a_20–30}, TP_{a_20–30}, C:N_{a_20–30}, C:P_{a_20–30}, and N:P_{a_20–30} was 1.51–58.47 g kg⁻¹, 0.27–4.92 g kg⁻¹, 0.18–0.96 g kg⁻¹, 3.61–16.42, 2.98–158.50, and 0.68–10.33, respectively. The observed SOC_{p_0–10}, TN_{p_0–10}, TP_{p_0–10}, C:N_{p_0–10}, C:P_{p_0–10}, and N:P_{p_0–10} was 1.89–80.23 g kg⁻¹, 0.30–5.57 g kg⁻¹, 0.18–0.75 g kg⁻¹, 3.47–21.37, 6.16–147.40, and 0.98–11.39, respectively. The observed SOC_{p_10–20}, TN_{p_10–20}, TP_{p_10–20}, C:N_{p_10–20}, C:P_{p_10–20}, and N:P_{p_10–20} was 1.76–43.59 g kg⁻¹, 0.37–4.20 g kg⁻¹, 0.15–0.85 g kg⁻¹, 2.60–14.83, 4.86–152.20, and 1.02–11.44, respectively. The observed SOC_{p_20–30}, TN_{p_20–30}, TP_{p_20–30}, C:N_{p_20–30}, C:P_{p_20–30}, and N:P_{p_20–30} was 1.16–32.74 g kg⁻¹, 0.34–3.32 g kg⁻¹, 0.11–0.85 g kg⁻¹, 2.04–13.13, 3.18–56.23, and 1.00–7.50, respectively. Referring to previous studies (Dai et al., 2023; Wang and Fu, 2023), these soil carbon, nitrogen, and phosphorus datasets were randomly divided into two parts using the sample function of the R software, one of which contains 30 observations for model accuracy testing and the remaining observations for model construction.

Previous studies found that only three climate variables (air temperature, precipitation, and radiation) can obtain high-precision random forest models of soil moisture and pH under fencing conditions, while combining the three climate variables and normalized difference vegetation index (NDVI) can obtain high-precision random forest models of soil moisture and pH under grazing conditions (Dai et al., 2023; Wang and Fu, 2023). In addition, these four input variables are relatively easy to obtain, and the model accuracy does not necessarily improve with more input variables (Dai et al., 2023; Wang and Fu, 2023). In this study, only these four independent variables were used to model soil carbon, nitrogen, and phosphorus.

Referring to previous studies (Tian and Fu, 2022; Dai et al., 2023), we modeled SOC, TN, TP, C:N, C:P, and N:P at 0–10, 10–20, and 20–30 cm under non-grazing scenarios using annual temperature (AT), annual precipitation (AP), and annual radiation (ARad) from the nine models. Although the default number of trees in the random forest model is 500, this may not be the optimal value; therefore, this default value was not adopted in this study. This may improve the accuracy of the model. The AT, AP, and ARad data were based on monthly air temperature, precipitation and radiation, respectively (Fu et al., 2022; Wang and Fu, 2023). Monthly climate data with high accuracies were spatial interpolation data from 145 weather stations, and their spatial resolution was 1 km × 1 km (Wang and Fu, 2023).

All the nine models were performed using R.4.1.2. The RFM, RRTM, MLRM, GBRM, and SVMM were based on the randomForest, rpart, stats, gbm, and e1071 packages, respectively (Han et al., 2022; Han et al., 2023a; Dai et al., 2023). The other four models were based on the rminer package (Han et al., 2022; Dai et al., 2023). For the RFM, except ntree and mtry, parameters of the randomForest function were set to default values. To obtain the optimal combination of the ntree and mtry, first we set the range of mtry to 1–3 or 1–4 under fencing or grazing conditions, and the ntree to 1,000, and ran 300 or 400 random forests to filter out the optimal ntree. Then we ran 300 or 400 random forest operations based on the mtry (1–3 or 1–4) and optimal ntree and selected the combination of ntree and mtry based on the largest R² value and the smallest mean square errors as the final random forest model. For the GBM, first the gbm and gbm.perf functions together were used to find the optimal n.trees. The cv.folds and n.trees of the gbm function were set as 2 and 10,000, respectively, and all the other parameters were set to be default values. The method of the gbm.perf function was set as cv, and all the other parameters were set to default values. Then, we used the gbm function to obtain the final gbm model based on the cv.folds of 2 and optimal n.trees. For the MLRM, RRTM, and SVM, all the parameters of the lm function, rpart function, and svm function were set to default values, respectively. For the ANNM, GLRM, CITM, and eXGBM, the search and scale parameters of fit function were set as heuristic5 and none, and the model parameter was set as mlp, cv.glmnet, ctree, and xgboost, respectively. In addition, all the other parameters were set to default values for the ANNM, GLRM, CITM, and eXGBM.

Similarly, we modeled SOC, TN, TP, C:N, C:P, and N:P at 0–10, 10–20, and 20–30 cm under free-grazing scenarios using AT, AP, ARad, and growing-season maximum normalized difference vegetation index (NDVI_max) from the nine models mentioned above (Tian and Fu, 2022; Dai et al., 2023). The NDVI_max data were maximum monthly NDVI for each year. Monthly NDVI data were obtained from the MOD13A3 NDVI (1 km × 1 km, 1 month) (Ma et al., 2022; Shen et al., 2022).

We combined the ggscatter function of the ggpubr package, the geom_smooth function of the ggplot2 package, the stat_poly_eq function of the ggpmisc package, and the ggarrange function of the ggpubr package to obtain the figures of the linear regression between observed and modeled SOC, TN, TP, C:N, C:P, and N:P, respectively. All statistical analyses were performed using R.4.1.2. In addition, the linear slope and R² value, relative bias, and root-mean-square error (RMSE) were used to test the accuracy of the various models adopted in this study (Fu et al., 2011; Tian and Fu, 2022).

3 Results

3.1 Training results

Climate trivariate (i.e., AT, AP, and ARad) explained 47.63%–93.20%, 11%–51%, and 35%–89% of the variations in potential soil variables based on the RFM, MLRM, and RRTM, respectively (Supplementary Tables S1–S3). Climate trivariate and NDVI_max together explained 62.00%–92.71%, 11%–63%, and 39%–69% of the variations in actual soil variables based on the RFM, MLRM, and RRTM, respectively (Supplementary Tables S1–S3). The ntree and mtry values of the RFM were 294–865 and 1–3 for potential soil variables, and 365–863 and 1–4 for actual soil variables, respectively (Supplementary Table S1). The trees values of the GBRM were 134–4,860 and 1,032–5,787 of the potential and actual soil variables, respectively (Supplementary Table S4). The support vector numbers were 113–198 and 89–207 for the SVMM of the potential and actual soil variables, respectively (Supplementary Table S5). In addition, error values varied among the ANNM, GLRM, CITM, and eXGBM, and the eXGBM had the greatest error values in most cases (Supplementary Table S6).

3.2 Testing results

The RFM and GBRM were more accurate than the other seven models for the observed SOC_{p_0–10} and SOC_{p_10–20}, whereas the RFM, GBRM, and SVMM were more accurate than the other six models for observed SOC_{p_20–30} (Figure 1; Tables 1, 2). The linear slopes and RMSEs between the observed SOC and modeled SOC from the RFM and GBRM at 0–10 cm and 10–20 cm were ≥0.91 and ≤3.43 g kg⁻¹, respectively, but those from the other seven models were ≤0.80 and ≥5.30 g kg⁻¹ (Figure 1; Table 2). The modeled SOC_{p_0–10} from the RFM and GBRM explained ≥98% (≥94%) of the variation in the observed SOC_{p_0–10} (SOC_{p_10–20}), but those from the other seven models only explained ≤95% (≤89%) of the variation in the observed SOC_{p_0–10} (SOC_{p_10–20}) (Figure 1). The absolute values of relative bias between the observed SOC_{p_0–10} and modeled SOC_{p_0–10} from the RFM, GBRM, SVMM, RRTM, and CITM were ≤6.05%, but those from the other four models were ≥6.53% (Table 1). The absolute values of relative bias between the observed SOC_{p_10–20} and modeled SOC_{p_10–20} from the RFM and GBRM were ≤1.70%, but those from the other seven models were ≥5.34% (Table 1). The RMSEs and absolute values of relative bias between the observed SOC_{p_20–30} and modeled SOC_{p_20–30} from the RFM, GBRM, and SVMM were ≤1.55 g kg⁻¹ and ≤2.31%, respectively, but those from the other six models were ≥2.36 g kg⁻¹ and ≥4.55% (Tables 1, 2). The linear slopes between the observed SOC_{p_20–30} and modeled SOC_{p_20–30} from the GLRM and eXGBM were ≤0.74, but those from the other seven models were ≥0.92 (Figure 1). The modeled SOC_{p_20–30} from the RFM, GBRM, SVMM, and eXGBM explained ≥95% of the variation in the observed SOC_{p_20–30}, but those from the other five models only explained ≤88% of the variation in the observed SOC_{p_20–30} (Figure 1).

FIGURE 1

FIGURE 1. Comparison of the modeled and observed potential SOC (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential SOC. RFM, random forest models; GBRM, generalized boosted regression; MLRM, multiple linear regression model; SVMM, support vector machine model; RRTM, recursive regression tree model; ANNM, artificial neural network; GLRM, generalized linear regression model; CITM, conditional inference tree; eXGBM, eXtreme gradient boosting.

TABLE 1

TABLE 1. The relative bias (unit, %) between modeled and observed soil carbon, nitrogen, and phosphorus and their ratios.

TABLE 2

TABLE 2. The RMSE (unit, g kg⁻¹ for SOC, TN, and TP) between modeled and observed soil carbon, nitrogen, and phosphorus and their ratios.

The RFM had the first highest accuracy among the nine models for the observed SOC_{a_0–10} and SOC_{a_10–20}, whereas the GBRM had the second highest accuracy among the nine models for observed SOC_{a_0–10} and SOC_{a_10–20} (Figure 2; Tables 1, 2). By contrast, the RFM and GBRM were the second and first most accurate among the nine models for observed SOC_{a_20–30}, respectively (Figure 2; Tables 1, 2). The linear slopes between the observed SOC_{a_0–10} (SOC_{a_10–20}) and modeled SOC_{a_0–10} (SOC_{a_10–20}) from the RFM and GBRM were 0.98 (≥0.91), but those from the other seven models were 1.10 or ≤0.84 (≤0.77) (Figure 2). The RMSEs between the observed SOC_{a_0–10} (SOC_{a_10–20}) and modeled SOC_{a_0–10} (SOC_{a_10–20}) from the RFM and GBRM were ≤7.01 g kg⁻¹ (≤3.90 g kg⁻¹), but those from the other seven models were ≥11.05 g kg⁻¹ (≥6.79 g kg⁻¹) (Table 2). The absolute values of relative bias between the observed SOC_{a_0–10} and modeled SOC_{a_0–10} from the RFM and GBRM were ≤1.07%, but those from the other seven models were ≥4.31% (Table 1). Modeled SOC_{a_0–10} from the RFM, GBRM, and eXGBM explained (R²=0.94) most of the variation in the observed SOC_{a_0–10}, but that from the other six models only explained ≤84% of the variation in the observed SOC_{a_0–10} (Figure 2). The absolute values of relative bias between the observed SOC_{a_10–20} and modeled SOC_{a_10–20} from the CITM, GBRM, and RFM were the first (2.02%), second (2.92%), and third (3.37%) minimums, respectively, but those from the other six models were ≥7.21% (Table 1). However, the modeled SOC_{a_10–20} from the CITM, GBRM, and RFM explained 81%, 94%, and 100% of the variation in the observed SOC_{a_10–20}, respectively (Figure 2). The linear slopes between the observed SOC_{a_20–30} and modeled SOC_{a_20–30} from the RFM, GBRM, MLRM, SVMM, and ANNM were ≥0.94, but those from the other four models were ≤0.84 (Figure 2). However, the RMSE and absolute values of relative bias between the observed SOC_{a_20–30} and modeled SOC_{a_20–30} from the RFM and GBRM were ≤4.07 g kg⁻¹ and ≤4.33%, respectively, but those from the other seven models were ≥5.40 g kg⁻¹ and ≥8.90%, respectively (Tables 1, 2). Modeled SOC_{a_20–30} from the RFM and GBRM explained 91% of the variation in the observed SOC_{a_20–30} but that from the other seven models only explained ≤88% of the variation (Figure 2).

FIGURE 2

FIGURE 2. Comparison of the modeled and observed actual SOC (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual SOC. The abbreviations are the same as in Figure 1.

The RFM and GBRM were the second and first most accurate of the nine models for observed TN_{p_0–10}, respectively (Figure 3; Tables 1, 2). By contrast, the RFM was the most accurate of the nine models for observed TN_{p_10–20} and TN_{p_20–30}, whereas the GBRM was the second most accurate for observed TN_{p_10–20} and TN_{p_20–30} (Figure 3; Tables 1, 2). Modeled TN_{p_0–10}, TN_{p_10–20}, and TN_{p_20–30} from the RFM and GBRM explained 98%, 93%, and 93% of the variation in the observed TN_{p_0–10}, TN_{p_10–20}, and TN_{p_20–30}, but those from the other seven models explained ≤96%, ≤92%, and ≤92% of the variation in the observed TN_{p_0–10}, TN_{p_10–20}, and TN_{p_20–30}, respectively (Figure 3). The linear slopes between the observed TN_{p_0–10} and TN_{p_10–20} and modeled TN_{p_0–10} and TN_{p_10–20} from the RFM and GBRM were ≥0.89, but those from the other seven models were ≤0.85 (Figure 3). The absolute values of relative bias between the observed TN_{p_0–10} (TN_{p_10–20}) and modeled TN_{p_0–10} (TN_{p_10–20}) from the RFM and GBRM were ≤6.17% (≤3.72%), but those from the other seven models were ≥9.68% (≥4.62%) (Table 1). The RMSEs between the observed TN_{p_0–10} and TN_{p_10–20} and modeled TN_{p_0–10} and TN_{p_10–20} from the RFM and GBRM, respectively, were ≤0.41 g kg⁻¹, but those from the other seven models were ≥0.55 g kg⁻¹ (Table 2). The absolute value of relative bias between the observed TN_{p_20–30} and modeled TN_{p_20–30} from the RFM, GBRM, MLRM, SVMM, and ANNM was ≤9.36%, but that from the other four models was ≥12.49% (Table 1). However, the RMSEs between the observed TN_{p_20–30} and modeled TN_{p_20–30} from the RFM, GBRM, and SVMM were ≤0.27 g kg⁻¹, and those from the other six models were ≥0.32 g kg⁻¹ (Table 2). Moreover, the linear slopes between the observed TN_{p_20–30} and modeled TN_{p_20–30} from the RFM, GBRM, and RRTM were 1.00, but those from the other six models were ≤0.98 (Figure 3).

FIGURE 3

FIGURE 3. Comparison of the modeled and observed potential TN (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential TN. The abbreviations are the same as in Figure 1.

The RFM and GBRM were more accurate than the other seven models for the observed TN_{a_0–10}, TN_{a_10–20}, and TN_{a_20–30} (Figure 4; Tables 1, 2). The linear slopes between the observed TN_{a_0–10} and modeled TN_{a_0–10} from the RFM, GBRM, and RRTM were ≥0.94, but those from the other six models were ≤0.92 (Figure 4). However, the modeled TN_{a_0–10} from the RFM, GBRM, and RRTM explained ≥91% of the variation in the observed TN_{a_0–10}, and those from the other six models explained ≤89% of the variation in the observed TN_{a_0–10} (Figure 4). The RMSEs between the observed TN_{a_0–10} and modeled TN_{a_0–10} from the RFM, GBRM, and RRTM were ≤0.56 g kg⁻¹, but those from the other six models were ≥0.68 g kg⁻¹ (Table 2). The absolute values of relative bias between the observed TN_{a_0–10} and modeled TN_{a_0–10} from the RFM, GBRM, and RRTM were 5.26%, 3.63%, and 13.99%, respectively (Table 1). The linear slopes between the observed TN_{a_10–20} and modeled TN_{a_10–20} from the RFM and GBRM were ≥0.91, but those from the other seven models were ≤0.86 (Figure 4). The modeled TN_{a_10–20} from the RFM and GBRM explained ≥92% of the variation in the observed TN_{a_10–20}, and those from the other seven models explained ≤86% of the variation in the observed TN_{a_10–20} (Figure 4). The RMSE value between the observed TN_{a_10–20}/TN_{a_20–30} and modeled TN_{a_10–20}/TN_{a_20–30} from the RFM and GBRM was ≤0.43 g kg⁻¹, but that from the other seven models was ≥0.49 g kg⁻¹ (Table 2). The absolute values of relative bias between the observed TN_{a_10–20} and modeled TN_{a_10–20} from the RFM and GBRM were ≤2.37% (Table 1). The absolute values of relative bias between the observed TN_{a_20–30} and modeled TN_{a_20–30} from the RFM, GBRM, and RRTM were 4.69%, 4.17%, and 0.75%, respectively, but those from the other six models was ≥4.72% (Table 1). The modeled TN_{a_20–30} from the RFM, GBRM, and eXGBM could explain 89%, 88%, and 88% of the variation in the observed TN_{a_20–30}, respectively, but those from the other six models explained ≤86% of the variation in the TN_{a_20–30} (Figure 4).

FIGURE 4

FIGURE 4. Comparison of the modeled and observed actual TN (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual TN. The abbreviations are the same as in Figure 1.

The RFM and GBRM were more accurate than the other seven models for observed TP_{p_0–10}, TP_{p_10–20}, and TP_{p_20–30} (Figure 5; Tables 1, 2). The RMSEs between the observed TP_{p_0–10}, TP_{p_10–20}, and TP_{p_20–30} and modeled TP_{p_0–10} (TP_{p_10–20}) from the RFM and GBRM were 0.04, 0.06, and 0.06 g kg⁻¹, respectively, but those from the other seven models were ≥0.06, 0.07, and 0.07 g kg⁻¹ (Table 2). The modeled TP_{p_0–10} (TP_{p_10–20}) from the RFM and GBRM explained 99% (≥97%) of the variation in the observed TP_{p_0–10} (TP_{p_10–20}), but that from the other seven models explained ≤98% (≤96%) of the variation in the observed TP_{p_0–10} (TP_{p_10–20}) (Figure 5). The modeled TP_{p_20–30} from the RFM, GBRM, and SVMM explained 97% of the variation in the observed TP_{p_20–30}, but that from the other six models explained ≤96% of the variation in the observed TP_{p_20–30} (Figure 5). The absolute value of relative bias between the observed TP_{p_0–10} and modeled TP_{p_0–10} from the RFM, GBRM, SVMM, and CITM was ≤0.75%, but that from the other five models was ≥1.00% (Table 1). The absolute values of relative bias between the observed TP_{p_10–20} and modeled TP_{p_10–20} from the RFM, GBRM, MLRM, SVMM, RRTM, ANNM, GLRM, CITM, and eXGBM were 3.68%, 3.20%, 0.57%, 3.53%, 0.71%, 0.57%, 1.67%, 0.10%, and 23.98%, respectively (Table 1). The absolute values of relative bias between the observed TP_{p_20–30} and modeled TP_{p_20–30} from the RFM and GBRM were 0.29% and 1.68%, respectively, but that from the other seven models was ≥6.16% (Table 1). The linear slopes between the observed TP_{p_20–30} and modeled TP_{p_20–30} from the RFM and GBRM were 0.99 and 1.00, respectively, but those from the other seven models were 1.20 or ≤0.91 (Figure 5).

FIGURE 5

FIGURE 5. Comparison of modeled and observed potential TP (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential TP. The abbreviations are the same as in Figure 1.

The RFM and GBRM were more accurate than the other seven models at predicting TP_{a_0–10}, TP_{a_10–20}, and TP_{a_20–30} (Figure 6; Tables 1, 2). The absolute values of relative bias between the observed TP_{a_0–10}, TP_{a_10–20}, or TP_{a_20–30} and modeled TP_{a_0–10}, TP_{a_10–20}, or TP_{a_20–30} from the eXGBM were much greater than those of the other eight models (Table 1). There was an obvious phenomenon in which several observed TP_{a_0–10}, TP_{a_10–20}, or TP_{a_20–30} values corresponded to one modeled TP_{a_0–10}, TP_{a_10–20}, or TP_{a_20–30} value from the CITM, respectively (Figure 6). There was an obvious phenomenon in which several observed TP_{a_10–20} or TP_{a_20–30} values corresponded to one modeled TP_{a_10–20} or TP_{a_20–30} value from the ANNM, respectively (Figure 6). There was an obvious phenomenon in which several observed TP_{a_10–20} values corresponded to one modeled TP_{a_10–20} value from the GLRM, respectively (Figure 6). The RMSEs between the observed TP_{a_0–10} or TP_{a_10–20} and modeled TP_{a_0–10} or TP_{a_10–20} from the RFM and GBRM were ≤0.07 g kg⁻¹, but those from the other seven models were ≥0.08 g kg⁻¹ (Table 2). The modeled TP_{a_0–10} from the RFM, GBRM, and RRTM explained ≥97% of the variation in the observed TP_{a_0–10}, but those from the other six models explained ≥95% of the variation in the observed TP_{a_0–10} (Figure 6). The linear slope and absolute value of relative bias between the observed TP_{a_0–10} and modeled TP_{a_0–10} from the RFM, GBRM, and RRTM were ≥0.98 and ≤2.23%, respectively (Figure 6; Table 1). The modeled TP_{a_10–20} from the RFM and GBRM explained 98% of the variation in the TP_{a_10–20}, but those from the other seven models only explained ≤93% of the variation in TP_{a_10–20} (Figure 2). The relative bias and linear slope between the observed TP_{a_10–20} and modeled TP_{a_10–20} from the RFM and GBRM were 2.31% and 0.99, and 1.03% and 0.97, respectively (Table 1). The linear slope between the observed TP_{a_20–30} and modeled TP_{a_20–30} from the RFM, GBRM, RRTM, and eXGBM was 1.00, but that from the other five models was ≤0.95 (Figure 6). The modeled TP_{a_20–30} from the RFM, GBRM, and RRTM explained ≥95% of the variation in the observed TP_{a_20–30}, but that from the other six models could only explain ≤92% of the variation in the observed TP_{a_20–30} (Figure 6). The RMSE between the observed TP_{a_20–30} and modeled TP_{a_20–30} from the RFM, GBRM, and RRTM was ≤0.10 g kg⁻¹, but that from the other six models was ≥0.13 g kg⁻¹ (Table 2). The absolute value of relative bias between the observed TP_{a_20–30} and modeled TP_{a_20–30} from the RFM, GBRM, and SVMM was ≤4.60%, but that from the other six models was ≥4.80% (Table 1).

FIGURE 6

FIGURE 6. Comparison of the modeled and observed actual TP (g kg⁻¹) for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual TP. The abbreviations are the same as in Figure 1.

The RFM and GBRM were more accurate than the other seven models at predicting the C:N_{p_0–10}, C:N_{p_10–20}, and C:N_{p_20–30} (Figure 7; Tables 1, 2). The absolute values of relative bias between the observed C:N_{p_0–10}, C:N_{p_10–20}, or C:N_{p_20–30} and modeled C:N_{p_0–10}, C:N_{p_10–20}, or C:N_{p_20–30} from the eXGBM were much greater than those from the other eight models (Table 1). There was an obvious phenomenon in which several observed C:N_{p_0–10}, C:N_{p_10–20}, or C:N_{p_20–30} values corresponded to one modeled C:N_{p_0–10}, C:N_{p_10–20}, or C:N_{p_20–30} value from the RRTM, ANNM, and GLRM, respectively (Figure 7). Additionally, there was an obvious phenomenon in which several observed C:N_{p_0–10} or C:N_{p_10–20} values corresponded to one modeled C:N_{p_0–10} or C:N_{p_10–20} value from the CITM, respectively (Figure 7). The RMSEs between the observed C:N and modeled C:N from the RFM and GBRM at 0–10, 10–20, and 20–30 cm were 1.00, ≤0.73, and ≤0.86, but those from the other seven models were ≥1.06, ≥0.83, and ≥1.11, respectively (Table 2). The relative biases between the observed C:N_{p_0–10} and modeled C:N_{p_0–10} from the RFM, GBRM, MLRM, and SVMM were 2.38%, 1.82%, 3.75%, and 3.43%, respectively (Table 1). The relative biases between the observed C:N_{p_10–20} and modeled C:N_{p_10–20} from the RFM, GBRM, MLRM and SVMM were −1.62%, −0.44%, 5.13%, and −1.26%, respectively (Table 1). The absolute values of relative bias between the observed C:N_{p_20–30} and modeled C:N_{p_20–30} from the RFM and GBRM were ≤1.63%, but those from the other seven models w ≥1.64% (Table 1). The linear slope between the observed C:N_{p_20–30} and modeled C:N_{p_20–30} from the RFM and GBRM was 0.97, but that from the other seven models was ≤0.94 (Figure 7). The modeled C:N_{p_20–30} from the RFM and GBRM explained ≥98% of the variation in the observed C:N_{p_20–30}, but that from the other seven models only explained ≤97% of the variation in C:N_{p_20–30} (Figure 7).

FIGURE 7

FIGURE 7. Comparison of the modeled and observed potential C:N for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential C:N. The abbreviations are the same as in Figure 1.

The RFM was more accurate than the other eight models in predicting the C:N_{a_10–20} (Figure 8; Tables 1, 2). The RFM and GBRM were more accurate than the other seven models at predicting the C:N_{a_0–10} and C:N_{a_20–30} (Figure 8; Tables 1, 2). The absolute values of relative bias between the observed C:N_{a_0–10}, C:N_{a_10–20}, or C:N_{a_20–30} and modeled C:N_{a_0–10}, C:N_{a_10–20}, or C:N_{a_20–30}, respectively, from the eXGBM were much greater than those from the other eight models (Table 1). There was an obvious phenomenon in which several observed C:N_{a_0–10}, C:N_{a_10–20}, or C:N_{a_20–30} values corresponded to one modeled C:N_{a_0–10}, C:N_{a_10–20}, or C:N_{a_20–30} value from the RRTM and CITM, respectively (Figure 8). Additionally, there was an obvious phenomenon in which several observed C:N_{a_10–20} values corresponded to one modeled C:N_{a_10–20} value from the ANNM (Figure 8). The absolute values of relative bias between the observed C:N_a and modeled C:N_a from the RFM and GBRM at 0–10, 10–20, and 20–30 cm were ≤3.18%, ≤2.74%, and ≤0.30%, but those from the other seven models were ≥5.39%, ≥3.73%, and ≥0.58%, respectively (Table 1). The RMSEs between the observed C:N_a and modeled C:N_a from the RFM and GBRM at 0–10, 10–20, and 20–30 cm were ≤1.25, ≤1.17, and ≤1.53, but those from the other seven models were ≥1.66, ≥1.36, and ≥1.60, respectively (Table 2).

FIGURE 8

FIGURE 8. Comparison of the modeled and observed actual C:N for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual C:N. The abbreviations are the same as in Figure 1.

The RFM and GBRM were more accurate than the other seven models in predicting the C:P_{p_0–10}, C:P_{p_10–20}, and C:P_{p_20–30} (Figure 9; Tables 1, 2). The absolute value of relative bias between the observed C:P_{p_0–10}, C:P_{p_10–20}, or C:P_{p_20–30} and modeled C:P_{p_0–10}, C:P_{p_10–20}, or C:P_{p_20–30}, respectively, from the eXGBM was much greater than those from the other eight models (Table 1). There was an obvious phenomenon in which several observed C:P_{p_0–10}, C:P_{p_10–20}, or C:P_{p_20–30} values corresponded to one modeled C:P_{p_0–10}, C:P_{p_10–20}, or C:P_{p_20–30} value from the GLRM and CITM, respectively (Figure 9). Both the linear slope and R² values between the observed C:P_{p_0–10} or C:P_{p_10–20} and modeled C:P_{p_0–10} or C:P_{p_10–20} from the MLRM and ANNM were lower than 0.90, respectively (Figure 9). The RMSEs between the observed C:P_{p_0–10} and modeled C:P_{p_0–10} from the RFM and GBRM were ≤8.88, but those from the other seven models were ≥9.91 (Table 2). The absolute values of relative bias between the observed C:P_{p_0–10} and modeled C:P_{p_0–10} from the RFM and GBRM were ≤0.42%, but those from the other seven models were ≥0.71% (Table 1). The RMSEs between the observed C:P_{p_10–20} and modeled C:P_{p_10–20} from the RFM, GBRM, and SVMM were ≤9.58, but those from the other six models were ≥12.03 (Table 2). The linear slopes and R² values between the observed C:P_{p_10–20} and modeled C:P_{p_10–20} from the RFM, GBRM, and SVMM were 1.00 and 0.94, 1.00 and 0.95, and 0.95 and 0.95, respectively (Figure 9). The RMSEs between the observed C:P_{p_20–30} and modeled C:P_{p_20–30} from the RFM, GBRM, SVMM, RRTM, and CITM were ≤6.08, but those from the other four models were ≥6.29 (Table 2). The linear slopes between the observed C:P_{p_20–30} and modeled C:P_{p_20–30} from the RFM, GBRM, SVMM, RRTM, and CITM were 0.98, 0.99, 0.95, 0.96, and 0.94, respectively (Figure 9).

FIGURE 9

FIGURE 9. Comparison of the modeled and observed potential C:P for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential C:P. The abbreviations are the same as in Figure 1.

The RFM was more accurate than the other eight models at predicting the C:P_{a_10–20} (Figure 10; Tables 1, 2). The RFM and GBRM were more accurate than the other seven models at predicting the C:P_{a_0–10} and C:P_{a_20–30} (Figure 10; Tables 1, 2). The absolute values of relative bias between the observed C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30} and modeled C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30}, respectively, from the eXGBM were much greater than those from the other eight models (Table 1). There was an obvious phenomenon in which several observed C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30} values corresponded to one modeled C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30} value from the RRTM, ANNM, and CITM, respectively (Figure 10). Linear slopes and R² values between the observed C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30} and modeled C:P_{a_0–10}, C:P_{a_10–20}, or C:P_{a_20–30} from the GLRM were lower than 0.80 (Figure 10). Linear slopes between the observed C:P_{a_0–10} or C:P_{a_20–30} and modeled C:P_{a_0–10} or C:P_{a_20–30} from the MLRM and SVMM were also lower than 0.80, but those from the RFM and GBRM were ≥0.93 (Figure 10). The RMSEs between the observed C:P_a and modeled C:P_a from the RFM and GBRM were ≤11.76, but those from the other seven models were ≥12.43 (Table 2).

FIGURE 10

FIGURE 10. Comparison of the modeled and observed actual C:P for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual C:P. The abbreviations are the same as in Figure 1.

The accuracy of the N:P_{p_0–10}, N:P_{p_10–20}, and N:P_{p_20–30} also changed with different models (Figure 11; Tables 1, 2). The absolute values of relative bias between the observed N:P_{p_0–10}, N:P_{p_10–20}, or N:P_{p_20–30} and modeled N:P_{p_0–10}, N:P_{p_10–20}, or N:P_{p_20–30}, respectively, from the eXGBM were much greater than those from the other eight models (Table 1). There was an obvious phenomenon in which several observed N:P_{p_0–10}, N:P_{p_10–20}, or N:P_{p_20–30} values corresponded to one modeled N:P_{p_0–10}, N:P_{p_10–20}, or N:P_{p_20–30} value from the GLRM and CITM, respectively (Figure 11). The RMSEs between the observed N:P_{p_0–10} and modeled N:P_{p_0–10} from the RFM and GBRM were ≤0.64, but those from the other seven models were ≥0.82 (Table 2). The RMSEs between the observed N:P and modeled N:P from the RFM, GBRM, SVMM, and RRTM at 10–20 cm and 20–30 cm were ≤0.95 and ≤0.77, but those from the other five models were ≥1.13 and ≥1.00, respectively (Table 2). However, the linear slopes between the observed N:P_{p_10–20} and modeled N:P_{p_10–20} from the RFM, GBRM, SVMM, and RRTM were 1.00, 1.00, 0.98, and 1.10, respectively (Figure 11).

FIGURE 11

FIGURE 11. Comparison of modeled and observed potential N:P for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed potential N:P. The abbreviations are the same as in Figure 1.

The RFM was more accurate than the other eight models in predicting N:P_{a_10–20} (Figure 12; Tables 1, 2). The RFM and GBRM were more accurate than the other seven models at predicting N:P_{a_0–10} and N:P_{a_20–30} (Figure 12; Tables 1, 2). The absolute values of relative bias between the observed N:P_{a_0–10}, N:P_{a_10–20}, or N:P_{a_20–30} and modeled N:P_{a_0–10}, N:P_{a_10–20}, or N:P_{a_20–30}, respectively, from the eXGBM were much greater than the other eight models, respectively (Table 1). The absolute values of relative bias between the observed N:P_{a_0–10} and modeled N:P_{a_0–10} from the MLRM, ANNM, GLRM, and CITM were lower than those from the RFM and GBRM (Table 1). However, there was an obvious phenomenon in which several observed N:P_{a_0–10} values corresponded to one modeled N:P_{a_0–10} from the CITM (Figure 12). The RMSEs between the observed N:P_a and modeled N:P_a from the RFM and GBRM at 0–10 cm, 10–20 cm, and 20–30 cm were ≤1.18, ≤0.75, and 0.57, but those from the other seven models were ≥1.33, ≥1.40, and ≥0.84, respectively (Table 2). The modeled N:P_a from the RFM and GBRM explained ≥94%, 97%, and 95% of the variation in the observed N:P_a at 0–10 cm, 10–20 cm, and 20–30 cm, respectively, but those from the other seven models only explained ≤93%, ≤94%, and ≤93% of the variation in the observed N:P_a (Figure 12). The slopes between the observed N:P_a and modeled N:P_a from the RF and GBRM at 0–10 cm and 10–20 cm were ≥0.95, but those from the other seven models were ≤0.93 (Figure 12). The slopes between the observed N:P_{a_20–30} and modeled N:P_{a_20–30} from the RF and GBRM were 1.00 and 0.99, respectively (Figure 12). The absolute values of relative bias between the observed N:P_a and modeled N:P_a from the RFM and GBRM at 10–20 cm and 20–30 cm were ≤4.30%, but those from the other seven models were ≥8.43% (Table 1).

FIGURE 12

FIGURE 12. Comparison of the modeled and observed actual N:P for (A–C) RFM, (D–F) GBRM, (G–I) MLRM, (J–L) SVMM, (M–O) RRTM, (P–R) ANNM, (S–U) GLRM, (V–X) CITM, and (Y–AA) eXGBM. The solid lines are the linear regression lines between the modeled and observed actual N:P. The abbreviations are the same as in Figure 1.

4 Discussion

Overall, the ARad, AP and AT based on the RFM explained the most variation in SOC, TN, TP, C:N, C:P, and N:P, those based on the RRTM explained the second most variation, and those based on the MLRM explained the least variation under the non-grazing scenarios (Supplementary Tables S1, S3, S5). The ARad, AP, AT, and NDVI_max based on the RFM explained the most variation in SOC, TN, TP, C:N, C:P, and N:P, those based on the RRTM explained the second most variation, and those based on the MLRM explained the least variation under the free-grazing scenarios (Supplementary Tables S1, S3, S5). That is, the RFM had a greater ability to explain the six soil variables than the RRTM and MLRM, whereas the RRTM had a greater ability to explain the six soil variables than the MLRM. This finding supported some previous studies that demonstrated that the RFM had a greater ability to explain soil moisture and pH, plant species α-diversity, forage nutrition quality, and storage (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). Moreover, the prediction accuracy of RFM was also higher than that of RRTM and MLRM (Tables 1, 2). This finding also supported some previous studies that demonstrated that the RFM had a higher prediction accuracy of soil moisture and pH, plant species α-diversity, forage nutrition quality, and storage than the RRTM and MLRM in grasslands on the Tibetan Plateau (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). Therefore, the performance of the RFM was better than the RRTM and MLRM, at least for the grassland areas on the Xizang Plateau.

The R² values under the non-grazing scenarios were not always lower than those under the free-grazing scenarios when we constructed the models of the six soil variables (Supplementary Tables S1, S3, S5). Moreover, the model accuracies under the non-grazing scenarios were not always lower than those under the free-grazing scenarios (Tables 1, 2). This finding implied that the increase in the number of independent variables cannot always improve the accuracy of the models. This finding was similar to several previous studies conducted on (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023) and outside the Qinghai-Xizang Plateau (Veloso et al., 2022). These previous studies found that the numbers of the input variables were not always positively related to the model accuracies of soil moisture and pH, plant species α-diversity, forage nutrition quality, and storage (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023). Moreover, a previous study found that the R² value (<0.86) between the observed and modeled SOC pool in a previous study (Liu et al., 2023) was lower than that between the observed and modeled SOC in this study, but the input variables numbers (>=12) of the previous study (Liu et al., 2023) were much greater than those in this study. Therefore, an increase in the input variables of the model does not always lead to a better model of plant and soil variables, at least for the grasslands of the Xizang Plateau. Moreover, instead of focusing on increasing the number of input variables and how to obtain more accurate input variable data, it is better to focus on how to find the optimal model to better help solve related ecological and environmental problems.

Similar to some previous studies (Dai et al., 2023; Wang and Fu, 2023), the predicted accuracies of the six soil variables based on the RFM were dependent on soil depth. This finding may be due to the following mechanisms. First, soil moisture and pH can regulate their responses of SOC, TN, TP, C:N, C:P and N:P to external disturbance (e.g., warming) (Yu et al., 2019a). The predicted accuracies of the six soil variables should be related to those of soil moisture and soil pH. In addition, soil moisture and pH can generally change with soil depth. Second, solar radiation and precipitation first reach the surface of soils and then gradually reach the deep layer of soils through energy conduction or infiltration. The normalized difference vegetation index is calculated by the spectral information reflected by land surface, and the energy source of air temperature is mainly the long-wave radiation of the land surface. All of these may lead to closer relationships between the six topsoil variables and the four input variables than deeper soil variables.

The RFM and GBRM achieved a greater accuracy for the six soil variables than the other seven models. However, the tree numbers of the GBRM were much greater than those of the RFM for most cases (Supplementary Tables S1, S2). The model complexity can be generally positively correlated with the tree numbers, whereas the calculation speed can be generally negatively correlated with the tree numbers (Han et al., 2022; Tian and Fu, 2022). Therefore, the RFM should be the best model among the nine models when all things are considered in this study, which was in line with the hypothesis. This phenomenon was similar to some previous studies that demonstrated that the RFM performed better than the other models at modeling soil moisture and pH, plant species α-diversity, forage nutrition quality, and storage in grassland ecological systems on (Han et al., 2022; Tian and Fu, 2022; Dai et al., 2023; Wang and Fu, 2023) and outside the Qinghai-Xizang Plateau (Fernandez-Delgado et al., 2014; Nussbaum et al., 2018). All these findings further support the advantage of the RFM in modeling soil and plant variables in grasslands on the Qinghai-Xizang Plateau.

The prediction accuracy of the six variables based on the RFM was greater than those in some previous studies (Yang et al., 2008; Yang et al., 2009; Yang et al., 2010; Shangguan et al., 2013; Kou et al., 2019; Wang et al., 2020; Wang et al., 2021a; Liu et al., 2023). For example, the modeled TN could only explain 34%–41% of the variation in the observed TN, and the RMSEs and relative bias were 1.15–1.20 g kg⁻¹ and 49.17% in China, respectively (Zhou et al., 2020). Moreover, compared with some previous studies (Kou et al., 2019; Wang et al., 2021a; Liu et al., 2023), although the number of independent variables used in the models constructed in this study was small, the accuracy was not reduced. This phenomenon may be due to the introduction of more independent variables that may introduce new uncertainties, considering that any independent variable can have its own data quality. Therefore, the RFM of SOC, TN, TP, C:N, C:P and N:P at the three depths in this study can have relative accuracies and can be used for related studies (e.g., the spatial distribution of these six variables) in grassland areas on the Xizang Plateau.

It is well known that climate change and human activities together affect these soil variables (Yu et al., 2019a; Zhang and Fu, 2021; Zha et al., 2022), but their relative contributions are unclear. Many previous studies only modeled actual soil organic carbon, total nitrogen, total phosphorus, and their ratios (Yang et al., 2009; Kou et al., 2019), so it was difficult to disentangle the relative effects of climate change and human activity on these soil variables. However, in this study, the potential and actual random forest models of these soil variables were constructed respectively, so it was relatively easy to separate the relative contributions of climate change and human activities to these soil variables.

There are some uncertainties or limitations in this study. First, as the NDVI comes from optical remote sensing, it is often affected by the conditions of satellite itself (such as instrument failure), the angle between the satellite remote sensing and the surface, weather conditions, and land surface conditions. The Xizang Plateau became dimming, which coincided with the increase in precipitation during 2000–2020 (Huang and Fu, 2023). During the rainy season on the Xizang Plateau, it is often rainy or cloudy for several consecutive days, which may prevent the collection of near infrared and red band data of plants and the calculation of the NDVI. On the Xizang Plateau, grassland productivity tends to decline from southeast to northwest. The NDVI is easy to saturate in areas with high vegetation productivity, while it is affected by soil interference in areas with low vegetation productivity (Ma et al., 2022; Shen et al., 2022). Second, many grassland areas on the Xizang Plateau have steep slopes, and the soil samples were mainly collected in areas with relatively shallow slopes. This may affect the relationships between these soil variables and climate data or NDVI data. Third, although climate and NDVI data are based on the corresponding year, soil variables are not entirely derived from observations in the same year, and climate change and human activities may alter these soil variables (Zhang et al., 2015; Fu and Shen, 2017). This phenomenon may also interfere with the model accuracy. Fourth, although the climate data used in this study can be highly accurate, there is some uncertainty associated with them. Fifth, only random forest models with soil variables at depths of 0–10, 10–20, and 20–30 cm were built in this study; therefore, future studies may consider building optimal models of soil variables at greater depths. Sixth, the data in this study were all obtained from the Xizang Plateau; therefore, it may be necessary to further test the accuracy of the models constructed in this study when extrapolating them to other regions of the Qinghai-Xizang Plateau. We can consider additional soil sampling in other areas of the plateau to build the optimal model of soil variables in the entire Qinghai-Xizang Plateau region in the future. Finally, only random forest models of the contents of these soil variables were constructed in this study, and the densities of soil organic carbon, total nitrogen, and total phosphorus could better reflect the changes in these soil variables. Therefore, the construction of optimal models of soil organic carbon density, total nitrogen density, and total phosphorus density should be considered in the future.

5 Conclusion

This study is the first to model SOC, TN, TP, C:N, C:P, and N:P at depths of 0–10, 10–20, and 20–30 cm under non-grazing and free-grazing scenarios based on nine methods. We can build a database of soil carbon, nitrogen, and phosphorus and their ratios over the past few decades or for the future across the whole grasslands of the Xizang Plateau under fencing and free-grazing conditions using the RFM established in this study. This database can be used to solve some of ecological, environmental, and soil management problems in grassland ecosystems in the Xizang Plateau. For example, the relative effects of climate change and human activities on soil organic carbon during the past decades can be distinguished and quantified, thus providing data support for improving soil carbon sequestration and mitigating climate warming.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

SW: Writing–original draft, Writing–review and editing. HQ: Investigation, Writing–original draft. TL: Investigation, Writing–original draft. YQ: Investigation, Writing–original draft. GF: Conceptualization, Validation, Visualization, Writing–original draft, Writing–review and editing. XP: Writing–original draft. XZ: Writing–original draft.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was funded by Henan Zhongmu County Research Project (E3C1050101), Remote Sensing Big data Analytics Project (E3E2051401), the Beijing Chaoyang District Collaborative Innovation Project (E2DZ050100), the Chinese Academy of Sciences Youth Innovation Promotion Association (2020054), National Natural Science Foundation of China (31600432), Tibet Autonomous Region Science and Technology Project (XZ202301YD0012C, XZ202202YD0009C, XZ202201ZY0003N, XZ202101ZD0007G, and XZ202101ZD0003N), and Construction of Zhongba County Fixed Observation and Experiment Station of First Support System for Agriculture Green Development.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer XS declared a shared affiliation with the authors SW, HQ, TL, YQ, and GF to the handling editor.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feart.2023.1340020/full#supplementary-material

References

Bahram, M., Espenberg, M., Pärn, J., Lehtovirta-Morley, L., Anslan, S., Kasak, K., et al. (2022). Structure and function of the soil microbiome underlying N₂O emissions from global wetlands. Nat. Commu. 13, 1430. doi:10.1038/s41467-022-29161-3

Can normalized difference vegetation index and climate data be used to estimate soil carbon, nitrogen, and phosphorus and their ratios in the Xizang grasslands?

1 Introduction

2 Materials and methods

3 Results

3.1 Training results

3.2 Testing results

4 Discussion

5 Conclusion

Data availability statement

Author contributions

Funding

Conflict of interest

Publisher’s note

Supplementary material

References

95% of researchers rate our articles as excellent or good