Estimation of biomass in various components of Pinus koraiensis based on Bayesian methods

Liu, Hui; Dong, Xibin; Zhang, Ying; Qu, Hangfeng; Ren, Yunze; Zhang, Baoshan; Gao, Tong

doi:10.3389/ffgc.2024.1350888

ORIGINAL RESEARCH article

Front. For. Glob. Change, 17 June 2024

Sec. Forest Growth

Volume 7 - 2024 | https://doi.org/10.3389/ffgc.2024.1350888

Hui Liu¹

Xibin Dong¹*

Ying Zhang¹

Hangfeng Qu¹,²

Yunze Ren¹

Baoshan Zhang¹

Tong Gao¹

¹College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
²Harbin Institute of Forestry Machinery, National Forestry and Grassland Administration, Harbin, Heilongjiang, China

Introduction: Pinus koraiensis is a dominant tree species in northeastern China. Estimating its biomass is required for forest carbon stock monitoring and accounting.

Methods: This study investigates biomass estimation methods for P. koraiensis components. A Bayesian approach was used to synthesize the parameter distributions of 298 biomass models as prior information to estimate the trunk, branch, leaf, and root biomass of P. koraiensis. The results were compared with those obtained using a non-informative prior and the minimum least squares (MLS) method.

Results: The results indicated that the Bayesian approach outperformed the other methods regarding model fit and prediction error. In addition, the responses of different components to tree height varied. The models of trunk and root biomass exhibited a smaller response to tree height, whereas those of branches and leaves showed a larger response to tree height. The model parameters yield precise estimations.

Discussion: In sum, this study highlights the potential of the Bayesian methods in estimating P. koraiensis biomass and proposes further enhancements to improve estimation accuracy.

1 Introduction

Estimating tree biomass is essential to comprehend the material cycling and energy flow in forest ecosystems (Zeng and Hausmann, 2022). During the 75th session of the United Nations General Assembly in September 2020, China announced its commitment to peaking carbon emissions before 2030 and achieving carbon neutrality before 2060. This commitment underscores the significance of forest carbon sinks and the imperative to monitor, protect, and enhance terrestrial carbon stocks. It is grounded in the recognition that alterations in forest carbon stocks can impact atmospheric CO₂ concentrations (Kurz and Apps, 2006). Research on biomass estimation methods can enhance the accuracy of estimating forest carbon stocks and deepen our understanding and analysis of carbon cycling in forest ecosystems (Campbell et al., 2009; Stinson et al., 2011). However, because collecting biomass data directly is costly and time-consuming, biomass must be estimated accurately from models for carbon accounting and monitoring. Biomass models can be used with tree survey data, such as diameter and height measurements (Wagers et al., 2023; Zanvo et al., 2023).

Biomass estimation models have been established for Pinus koraiensis, a dominant tree species in Northeast China. Allometric growth models utilizing the diameter at breast height (DBH), tree height (H), and D²H have provided accurate biomass estimates (Wang, 2006; Dai et al., 2013). A major limitation of biomass models is that they cannot be universally applied to different species and locations due to the variability in allometric relationships (West et al., 1997; Enquist et al., 1998). The allometric relationships of trees are significantly influenced by environmental and competitive factors, which vary over time and across locations (Holbrook and Putz, 1989; King, 1991; Chambers et al., 2001). For instance, different light, soil fertility, and moisture conditions can affect tree growth patterns. Additionally, trees face competition from both conspecific and heterospecific plants, leading to variations in growth relationships (von Oheimb et al., 2011). Even within the same species, growth relationships can vary significantly across different locations, contradicting the universal scaling rules predicted by metabolic scaling theory for diverse species and biological communities (Li et al., 2005; Muller-Landau et al., 2006; Návar, 2009). Probability distributions can be used to overcome this limitation (Dong et al., 2014; Dong et al., 2015; Widagdo et al., 2021; Xie et al., 2022b). Specifically, the probability distribution of scaling coefficients can assess the range of variation in these coefficients across different locations, providing prior information for Bayesian inference. Moreover, combining this information with field data can effectively capture the variability of growth parameters under different environmental conditions, thereby establishing a more general and accurate biomass estimation model. Building upon these findings, we propose a method using these parameter distributions to formulate new biomass models and apply it to P. koraiensis, a tree species indigenous to Northeast China.

Bayesian statistics is employed to establish biomass models for different components of P. koraiensis. Non-informative and informative priors are used in the Bayesian framework. A total of 298 biomass models are synthesized for P. koraiensis components using data from the literature. The parameter distributions of these models are used as prior distributions to establish biomass models using Bayesian inference, an alternative approach in inferential statistics frequently used for assessing ecological models (Amir et al., 2022; Leach et al., 2022; Piccioni et al., 2022). Despite the ongoing debate over Bayesian and classical statistics in different scientific disciplines, research has revealed two notable advantages of Bayesian statistics. First, Bayesian statistics aims to learn from experience by incorporating prior knowledge about the data (Ghazoul and McAllister, 2003; Ellison, 2004). Second, Bayesian statistics aligns entirely with mathematical logic, while classical statistics demonstrates logicality solely in probabilistic statements regarding the long-term averages of repeatedly acquired sample data, rather than relying on hypotheses (Berger and Berry, 1988; Jaynes, 2003). Bayesian statistics has been used in forestry to estimate tree diameter (Green et al., 1994; Deng et al., 2023), tree growth (Yue et al., 2022), tree mortality (Xie et al., 2022a), and the biomass of other tree species (Zhang et al., 2013; Aabeyir et al., 2020; Asrat et al., 2020), to establish height-diameter models (Zhang et al., 2014) and volume models (Yoon et al., 2013), and to determine the spatial distribution of tree species (Engel et al., 2022). The goal of this study is to compile a dataset of allometric equations and parameters for P. koraiensis in the Northeast China region and derive the probability distribution of its parameters. In addition, we use data from the Lesser Khingan Mountains to evaluate the Bayesian framework for estimating the biomass of trunks, branches, leaves, and roots. Furthermore, we compare Bayesian methods with and without prior information to the least squares method for estimating the biomass of P. koraiensis.

2 Materials and methods

2.1 Study area

The study area was the Dongfanghong Forest Farm in the Lesser Khingan Mountains in Northeast China (Figure 1). The annual average temperature is 1.4°C, with the lowest average temperature occurring in early January (−40°C) and the highest temperature occurring in July (37°C). The area has a temperate continental humid monsoon climate, with an average annual precipitation of 661 mm, most of which falls in July and August. The predominant soil type is dark brown soil, with a few areas of valley meadow soil and marsh soil in forested areas. The approximate soil depth is 30 cm. The stand is a natural mixed secondary forest composed of coniferous and broad-leaved trees. There are six dominant tree species in this stand: P. koraiensis Sieb et Zucc, Picea koraiensis Nakai, Abies nephrolepis Maxim, Fraxinus mandshurica Rupr, Tilia amurensis Rupr, and Betula platyphylla Suk.

Figure 1. The geographical location of Pinus koraiensis. Areas labeled with red pentagrams belong to the Dongfanghong Forest, and the blue dots indicate where data from the literature was used.

2.2 Sampling design and biomass estimation

In the Dongfanghong Forest, a total area of 7 hm² was chosen in areas with similar site conditions, altitude, aspect, and slope. The stand density was 1,000 trees·hm⁻². Trees with a DBH greater than 5 cm were selected. The diameter range of the trees is listed in Supplementary Table 2. Systematic random sampling was conducted to harvest trees in this diameter range in 2022.

2.2.1 Biomass

In this study, a trunk is considered the primary supporting part of the branches from the ground to the top of the felled trees. The section with the largest circumference was considered part of the trunk to avoid ambiguity between branches and the trunk at the forks. The total fresh weight of trunks, branches, leaves, and roots was measured in the field. The trunk was cut into 1-m-long sections for measurement. Approximately 5 cm-thick discs were obtained from each section and weighed using spring scales. Samples of branches and leaves (approximately 50–100 g) were collected from average-sized branches and brought to the laboratory for moisture content analysis. The roots were excavated manually, weighed using a crane, and washed with a high-pressure water pump to remove all soil particles. The roots were classified and sampled based on diameter size into the following categories: stump (aboveground and underground parts), undifferentiated parts of coarse roots (diameter greater than 2 cm), and small roots (diameter less than 2 cm). Fine roots (diameter less than 5 mm) were intentionally excluded from this study. One disc was obtained from the middle of the stump to calculate the dry weight of the belowground parts. Discs with different diameters were collected from coarse roots, and three full-length roots with diameters less than 2 cm were obtained to represent the biomass of small roots. The sum of the biomass of the root sections represented the total belowground dry biomass, and the fresh-to-dry weight ratios were obtained for each section. Thirty-one trees were sampled, with DBH ranging from 5 to 35 cm and heights ranging from 4 to 22.6 m. The samples (roots, trunks, branches, and leaves) were dried in the laboratory using a high-temperature oven at 105°C until a stable weight was reached. The dry biomass was calculated by multiplying the wet weight of the component by the dry/fresh weight ratio. The total dry weights of trunks, branches, leaves, and roots are summarized in Table 1.
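The fresh-to-dry conversion described above is a single multiplication per component. The following Python sketch illustrates it; the function name and all weights are hypothetical and are not taken from the study's measurements.

```python
def dry_biomass(fresh_weight_kg, sample_fresh_g, sample_dry_g):
    """Dry biomass of a component = field fresh weight x dry/fresh ratio of its oven-dried subsample."""
    if sample_fresh_g <= 0:
        raise ValueError("subsample fresh weight must be positive")
    ratio = sample_dry_g / sample_fresh_g  # dry/fresh weight ratio
    return fresh_weight_kg * ratio

# Hypothetical branch component: 40 kg fresh in the field,
# and an 80 g subsample that weighs 44 g after oven drying.
branch_dry_kg = dry_biomass(40.0, 80.0, 44.0)
```

In this illustration the dry/fresh ratio is 0.55, so the component dry biomass is 55% of its fresh weight.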

Table 1. Summary statistics of the destructively sampled trees in the Lesser Khingan Mountains.

2.3 Parameter value collection

We collected existing literature (journals, books, and reports) from 1978 to 2022 on biomass equations for P. koraiensis in Northeast China. We used keywords with logical operators (P. koraiensis, biomass, allometry, relationships, equations, models, and functions) to search the National Library of China (National Digital Library of China and China Forestry Digital Library), online literature databases (Web of Science, China National Knowledge Infrastructure, and China Science and Technology Journal Database), and ecological data papers (Luo et al., 2020); the sources are listed in Supplementary Table 1. We conducted an in-depth analysis of the literature to obtain reliable biomass equations and used the following criteria:

1. The search scope was only for equations applicable to forest-grown trees and open-grown trees.

2. The data for establishing biomass equations were based on at least three sample trees that were harvested and weighed to determine the tree biomass and its components (such as trunk, branches, leaves, and roots), although the number of tree components depended on the research objectives.

3. The biomass equations considered in the study were examined.

Biomass equations meeting the above criteria were compiled and organized into the P. koraiensis Biomass Equation Dataset (Supplementary Table 1). It consisted of a general table and an equation table. The former contained background information on the equations, including geographical location (e.g., latitude, longitude, and altitude), climate [mean annual temperature (MAT) and mean annual precipitation (MAP)], and stand description (e.g., forest type, dominant tree species, stand origin, stand age, and tree spacing). The latter included details, such as tree components for developing the biomass equations, predictor variables, equation form, coefficients, goodness-of-fit statistics (e.g., correlation coefficient and determination coefficient), and applicability range (i.e., methods and ranges of predictor variable values).

2.4 Methods

Bayesian methods are statistical frameworks that use prior information on parameter values to derive probabilities. By modeling observed data and unobserved variables, the Bayesian approaches provide a cohesive framework for combining data models and external knowledge.

2.4.1 Biomass model

We modeled the dry weight (W, kg) of different tree components (trunks, branches, leaves, and roots) as a function of height (H, m) and DBH (D, cm). Furthermore, we used logarithmic transformations to address heteroscedasticity (Refer to Equations 1, 2).

$$\ln W = α + b \ln D + e \tag{1}$$

$$\ln W = α + b \ln (D^{2} H) + e \tag{2}$$

Where α = ln a and b are the parameters of the model (so that W = aD^b or W = a(D²H)^b on the original scale), and e denotes the error term, which follows a normal distribution with a mean of zero and a variance of σ². Classical methods to estimate the parameters include maximum likelihood estimation (MLE) and minimum least squares (MLS). The latter was used in this study for comparison with the proposed Bayesian approach. The optimal intercept and slope are obtained by minimizing the sum of squared residuals between the observed and predicted values.
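The least-squares step for the log-transformed models can be sketched in a few lines of Python. This is a minimal illustration using the closed-form solution for simple linear regression, not the authors' code; the function name and the noise-free example trees are hypothetical.

```python
import math

def fit_loglog(x, w):
    """Least-squares fit of ln W = alpha + b * ln x + e.

    x is D for Equation 1 or D**2 * H for Equation 2.
    Returns (alpha, b, sigma2), where sigma2 is the residual variance on n - 2 degrees of freedom.
    """
    lx = [math.log(v) for v in x]
    lw = [math.log(v) for v in w]
    n = len(lx)
    mx, mw = sum(lx) / n, sum(lw) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxw = sum((u - mx) * (v - mw) for u, v in zip(lx, lw))
    b = sxw / sxx            # slope that minimises the sum of squared residuals
    alpha = mw - b * mx      # intercept
    rss = sum((v - alpha - b * u) ** 2 for u, v in zip(lx, lw))
    return alpha, b, rss / (n - 2)

# Hypothetical noise-free trees generated from alpha = -3.0 and b = 2.4
dbh = [5.0, 10.0, 20.0, 35.0]
weight = [math.exp(-3.0) * d ** 2.4 for d in dbh]
alpha_hat, b_hat, sigma2_hat = fit_loglog(dbh, weight)
```

Because the example data contain no noise, the fit recovers the generating values of α and b up to floating-point error; with field data the residual variance would be positive.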

2.4.2 Bayesian rule

The Bayesian framework uses probability distributions to account for the uncertainty of the estimated parameters (Koricheva et al., 2013; LeBauer et al., 2013). Based on the observed data, θ has the following probability distribution (See Equation 3):

$$p(θ \mid y) = \frac{p(y \mid θ)\, p(θ)}{p(y)} \tag{3}$$

We focus on the posterior probability distribution (abbreviated as posterior) of θ given the observed data y, p(θ|y). The likelihood function p(y|θ) describes the distribution of y given the value of θ (Edwards, 1996). The prior probability distribution of the parameters, p(θ), is commonly referred to as the prior. It reflects the assumptions made about the parameters before the data are observed. A distinguishing characteristic of Bayesian methods is the treatment of parameters as random variables (Ellison, 2004; Li et al., 2012), in contrast to classical methods, which assume the parameters to be true and fixed (albeit unknown) quantities (De Valpine and Hastings, 2002). In this study, the statistical model describes the relationship between the biomass of the different components (trunks, branches, leaves, and roots), denoted as W, and the variables D and H (See Equations 4, 5):

$$\ln W \sim N_{1}(g_{1}(D : α, b),\ σ^{2}) \tag{4}$$

$$\ln W \sim N_{1}(g_{2}(D^{2} H : α, b),\ σ^{2}) \tag{5}$$
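The text does not define g₁ and g₂ explicitly. Assuming they denote the deterministic parts of Equations 1 and 2, the mean functions and the resulting likelihood for n independently sampled trees can be written as follows, where x_i stands for D_i or D_i²H_i and g for the corresponding mean function (x_i and g are shorthand introduced here only for illustration):

```latex
g_{1}(D : α, b) = α + b \ln D, \qquad g_{2}(D^{2}H : α, b) = α + b \ln (D^{2}H)

p(y \mid α, b, σ^{2}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 π σ^{2}}}
  \exp\left( - \frac{\left( \ln W_{i} - g(x_{i} : α, b) \right)^{2}}{2 σ^{2}} \right)
```

Under this reading, θ in Equation 3 collects the three unknowns (α, b, σ²).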

2.4.3 Prior distribution

The choice of prior distributions is critical in Bayesian methods (De Valpine and Hastings, 2002). However, many researchers have chosen uninformative normal (Gaussian) priors with large or infinite variances, disregarding any prior information about the parameters. Alternatively, if prior knowledge is available from external sources (e.g., parameters reported in the literature), this information can be utilized to construct informative prior distributions. In this study, we compared the predictions between models fit with an uninformative prior distribution versus an informative prior distribution. The uninformative normal (Gaussian) priors for parameters α and b are α ∼ N(0, 1,000) and b ∼ N(0, 1,000).

For the models employing informative priors, we assume that α and b follow a bivariate normal distribution N(μ,Σ). The mean vector μ represents the central tendency of the data for each variable. It is typically calculated as the average of the observed values. For a bivariate normal distribution involving two variables α and b (See Equation 6):

$$μ = \begin{pmatrix} μ_{α} \\ μ_{b} \end{pmatrix}, \qquad \begin{cases} μ_{α} = \frac{1}{n} \sum_{i = 1}^{n} α_{i} \\ μ_{b} = \frac{1}{n} \sum_{i = 1}^{n} b_{i} \end{cases} \tag{6}$$

Where μ_α and μ_b are the sample means of α and b, respectively.

The covariance matrix Σ captures the variance within each variable and the covariance between them. It is calculated based on the deviations of each variable from their respective means. For two variables α and b, the covariance matrix is Equation 7:

$$Σ = \begin{pmatrix} σ_{α}^{2} & σ_{α b} \\ σ_{α b} & σ_{b}^{2} \end{pmatrix}, \qquad \begin{cases} σ_{α}^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (α_{i} - μ_{α})^{2} \\ σ_{b}^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (b_{i} - μ_{b})^{2} \\ σ_{α b} = \frac{1}{n - 1} \sum_{i = 1}^{n} (α_{i} - μ_{α}) (b_{i} - μ_{b}) \end{cases} \tag{7}$$

Where $σ_{α}^{2}$ and $σ_{b}^{2}$ are the variances of α and b, and $σ_{α b}$ is the covariance between α and b. In this study, we chose not to employ a formal meta-analysis approach to construct the priors for our model. Many of the studies we synthesized did not report all necessary statistics comprehensively. This lack of complete data can lead to biases if not appropriately handled. While hierarchical meta-analysis models that incorporate missing data models can address these concerns (Koricheva et al., 2013; LeBauer et al., 2013), they require making conservative assumptions about the missing information. These assumptions, although helpful, may introduce uncertainty. Additionally, a hierarchical meta-analysis model is computationally intensive and requires substantial expertise and resources. Given our project’s scope and the availability of resources, we opted for a more direct approach using well-established empirical data from a foundational study (Gelman et al., 1995). A total of 298 biomass models were synthesized using data from the literature for the trunks, branches, leaves, and roots of P. koraiensis. The data are summarized in Supplementary Table 1. We computed correlation coefficients and performed the Shapiro–Wilk test and a bivariate normality test on the collected parameters using the cor, shapiro.test, and mvn functions in R, respectively.
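The construction of the informative prior from Equations 6 and 7 can be sketched as follows. The function name and the four coefficient pairs are hypothetical stand-ins for the literature values in Supplementary Table 1, which are not reproduced here.

```python
def prior_from_literature(alphas, bs):
    """Sample mean vector and covariance matrix (n - 1 denominator) of published (alpha, b) pairs."""
    n = len(alphas)
    mu_a, mu_b = sum(alphas) / n, sum(bs) / n
    var_a = sum((a - mu_a) ** 2 for a in alphas) / (n - 1)
    var_b = sum((b - mu_b) ** 2 for b in bs) / (n - 1)
    cov_ab = sum((a - mu_a) * (b - mu_b) for a, b in zip(alphas, bs)) / (n - 1)
    return (mu_a, mu_b), ((var_a, cov_ab), (cov_ab, var_b))

# Hypothetical coefficients from four published equations (not the study's data)
alphas = [-3.2, -2.9, -3.5, -3.0]
bs = [2.45, 2.30, 2.55, 2.38]
mu, Sigma = prior_from_literature(alphas, bs)
```

In this made-up example the off-diagonal element of Σ is negative, which mirrors the negative correlation between α and b that the study reports for the collected models.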

Additionally, we assumed that the errors followed a normal distribution, e ∼ N(0, σ²). Following Hadfield (2010), we placed an inverse gamma prior on the error variance, with the scale and shape parameters both equal to 0.0005.

2.4.4 Model fit and convergence assessment

The parameters in the linear Gaussian models were estimated using a Bayesian framework implemented in the R package MCMCglmm. Gibbs sampling (Chib and Greenberg, 1995) was employed to update the parameters iteratively. For each model, we generated one MCMC chain of 25,000 iterations to ensure convergence and accurate estimation of the posterior distribution. The initial 5,000 iterations were discarded as burn-in to eliminate potential bias from the initial state of the chain. Additionally, we retained every second value in the posterior chain to reduce autocorrelation between consecutive iterations.
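To make the sampling scheme concrete, the following Python sketch implements a textbook Gibbs sampler for the log-linear model with a bivariate normal prior on (α, b) and an inverse gamma prior on σ². It is an illustration under these assumptions, not a reproduction of the MCMCglmm internals; the function names, default arguments, and synthetic data are hypothetical.

```python
import math
import random

def inv2(m):
    """Inverse of a symmetric 2x2 matrix given as ((a, b), (b, d))."""
    (a, b), (_, d) = m
    det = a * d - b * b
    return ((d / det, -b / det), (-b / det, a / det))

def gibbs_loglinear(lx, lw, mu0, Sigma0, shape0=0.0005, scale0=0.0005,
                    n_iter=25000, burn_in=5000, thin=2, seed=1):
    """Gibbs sampler for ln W = alpha + b * lx + e, e ~ N(0, sigma2),
    with (alpha, b) ~ N2(mu0, Sigma0) and sigma2 ~ Inv-Gamma(shape0, scale0)."""
    rng = random.Random(seed)
    n = len(lx)
    # Sufficient statistics X'X and X'y for the design matrix [1, lx]
    sx, sxx = sum(lx), sum(u * u for u in lx)
    sy, sxy = sum(lw), sum(u * v for u, v in zip(lx, lw))
    P0 = inv2(Sigma0)  # prior precision matrix
    P0mu = (P0[0][0] * mu0[0] + P0[0][1] * mu0[1],
            P0[1][0] * mu0[0] + P0[1][1] * mu0[1])
    sigma2, draws = 1.0, []
    for it in range(n_iter):
        # Step 1: (alpha, b) | sigma2, data is bivariate normal N2(m, V)
        V = inv2(((P0[0][0] + n / sigma2, P0[0][1] + sx / sigma2),
                  (P0[1][0] + sx / sigma2, P0[1][1] + sxx / sigma2)))
        r = (P0mu[0] + sy / sigma2, P0mu[1] + sxy / sigma2)
        m = (V[0][0] * r[0] + V[0][1] * r[1], V[1][0] * r[0] + V[1][1] * r[1])
        l11 = math.sqrt(V[0][0])            # Cholesky factor of V
        l21 = V[0][1] / l11
        l22 = math.sqrt(V[1][1] - l21 * l21)
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        alpha, b = m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2
        # Step 2: sigma2 | alpha, b, data is inverse gamma
        rss = sum((v - alpha - b * u) ** 2 for u, v in zip(lx, lw))
        sigma2 = 1.0 / rng.gammavariate(shape0 + n / 2, 1.0 / (scale0 + rss / 2))
        # Discard burn-in, then keep every `thin`-th draw
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws.append((alpha, b, sigma2))
    return draws

# Synthetic example: 31 hypothetical trees generated from alpha = -3.0, b = 2.4
data_rng = random.Random(0)
lx = [math.log(data_rng.uniform(5, 35)) for _ in range(31)]
lw = [-3.0 + 2.4 * u + data_rng.gauss(0, 0.2) for u in lx]
draws = gibbs_loglinear(lx, lw, mu0=(0.0, 0.0),
                        Sigma0=((1000.0, 0.0), (0.0, 1000.0)))
b_mean = sum(d[1] for d in draws) / len(draws)
```

The call above uses the vague N(0, 1,000) priors; passing a literature-derived mean vector and covariance matrix as `mu0` and `Sigma0` would give the informative-prior variant. Updating α and b jointly in one block is a design choice that helps mixing, because the two parameters are strongly correlated in the posterior.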

To evaluate convergence, we visualized the posterior samples using trace plots and applied the Geweke (1991) convergence diagnostic to the model outputs. The results are shown in the Supplementary Information. All absolute z-scores obtained from Geweke’s diagnostic were smaller than 1.96, indicating satisfactory convergence (Supplementary Table 2 and Supplementary Images 1–4).
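The idea behind the Geweke check can be illustrated with a simplified z-score that compares the mean of the early part of a chain with the mean of its late part. The sketch below is an approximation: it assumes weak autocorrelation and uses plain sample variances in place of the spectral density estimates of the original diagnostic. The function name and both example chains are hypothetical.

```python
import math
import random

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: mean of the first 10% of the chain vs. mean of the last 50%.

    Assumption: sample variances replace the spectral density estimates
    used in the original diagnostic, so this is only a rough check.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]

    def mean_var(x):
        m = sum(x) / len(x)
        return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)

    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

rng = random.Random(42)
# A stationary chain and a chain whose mean drifts upward (both hypothetical)
stationary = [rng.gauss(2.4, 0.05) for _ in range(10000)]
drifting = [2.0 + 0.0001 * i + rng.gauss(0, 0.05) for i in range(10000)]
z_ok = geweke_z(stationary)
z_bad = geweke_z(drifting)
```

A chain that has reached its stationary distribution should give an absolute z-score below 1.96 in most runs, whereas the drifting chain produces a very large absolute value.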

2.4.5 Model evaluation

The evaluation metrics included the mean absolute deviation (MAD), mean deviation (MD), root mean square error (RMSE), model efficiency (MEF), and model bias ( $\bar{E}$ ). MEF represents the proportion of total variance explained by the model, considering the number of parameters and observations (Soares and Tomé, 2007; Hevia Cabal et al., 2013). A value of 1.0 indicates a perfect fit, while a value of 0.0 suggests that the model performs no better than the mean of the observations. Negative values indicate poor model performance (Soares and Tomé, 2007). Model bias measures the systematic deviation between the model predictions and observed data. For the remaining criteria, smaller values indicate better model performance (See Equations 8–12).

$$MD = \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})}{n} \tag{8}$$

$$MAD = \frac{\sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} |}{n} \tag{9}$$

$$RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n - 2}} \tag{10}$$

$$MEF = 1 - \frac{(n - 1) \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{(n - k) \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \tag{11}$$

$$\bar{E}\,\% = \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})}{\sum_{i = 1}^{n} \hat{y}_{i}} \times 100 \tag{12}$$

Where $y_{i}$ represents the observed value of the biomass of the i-th tree, $\hat{y}_{i}$ represents the predicted value, $\bar{y}$ is the mean value of the observed values, n represents the number of observed values, and k is the number of parameters.
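The five evaluation metrics can be computed together, as in the following sketch. The function name, the dictionary keys, and the four observed and predicted values are hypothetical; the RMSE is taken as the positive root with the n − 2 denominator.

```python
import math

def evaluate(obs, pred, k=2):
    """MD, MAD, RMSE, MEF and percentage bias for observed vs. predicted biomass.

    k is the number of model parameters (2 for the intercept and slope).
    """
    n = len(obs)
    res = [y - p for y, p in zip(obs, pred)]
    ybar = sum(obs) / n
    sse = sum(r * r for r in res)                 # residual sum of squares
    sst = sum((y - ybar) ** 2 for y in obs)       # total sum of squares
    return {
        "MD": sum(res) / n,
        "MAD": sum(abs(r) for r in res) / n,
        "RMSE": math.sqrt(sse / (n - 2)),         # positive root
        "MEF": 1 - (n - 1) * sse / ((n - k) * sst),
        "E_pct": 100 * sum(res) / sum(pred),
    }

# Hypothetical observed and predicted biomass values (kg)
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
metrics = evaluate(obs, pred)
```

In this example the positive and negative residuals cancel, so MD and the percentage bias are zero even though MAD and RMSE are not, which shows why the bias measures and the error measures are reported side by side.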

To compare posterior estimates derived from our proposed Bayesian models, we computed the absolute difference and absolute percentage difference in the means and standard deviations using the following Equations 13–16:

$$AD_{u} = | \hat{u}_{n} - \hat{u}_{i} | \tag{13}$$

$$APD_{u} = \frac{| \hat{u}_{n} - \hat{u}_{i} |}{\hat{u}_{n}} \tag{14}$$

$$AD_{σ} = | \hat{σ}_{n} - \hat{σ}_{i} | \tag{15}$$

$$APD_{σ} = \frac{| \hat{σ}_{n} - \hat{σ}_{i} |}{\hat{σ}_{n}} \tag{16}$$

Where ${\hat{u}}_{n}$ and ${\hat{u}}_{i}$ are the posterior means for the models employing a non-informative and an informative prior, respectively, while ${\hat{σ}}_{n}$ and ${\hat{σ}}_{i}$ are the standard deviations of the posterior distributions.
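A short sketch of these comparison measures follows. The function name and the posterior summaries are hypothetical. As an assumption that goes beyond the equations, the denominator of the mean-based ratio is taken in absolute value so that a negative intercept α does not produce a negative percentage difference.

```python
def prior_comparison(mean_n, mean_i, sd_n, sd_i):
    """Absolute and relative differences between posterior summaries.

    Subscript n: non-informative prior; subscript i: informative prior.
    Assumption: abs() in the denominator keeps the ratio non-negative.
    """
    return {
        "AD_u": abs(mean_n - mean_i),
        "APD_u": abs(mean_n - mean_i) / abs(mean_n),
        "AD_sd": abs(sd_n - sd_i),
        "APD_sd": abs(sd_n - sd_i) / sd_n,
    }

# Hypothetical posterior means and standard deviations of alpha
comparison = prior_comparison(mean_n=-3.10, mean_i=-3.00, sd_n=0.40, sd_i=0.25)
```

Multiplying the two ratios by 100 expresses them as percentages.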

3 Results

3.1 Prior distribution of the parameters

A total of 298 biomass models for the four components were compiled from the literature. Multiple models were available for some sites, and most models were derived from areas in northeastern China (Figure 1). The data indicate that the average values of the collected model parameters α and b range from −5.5274 to −2.7086 and from 0.6987 to 2.3875, respectively (Table 2). Within the same model, the average values of parameter α for both the trunk and the leaf exceed those of the branch and root. However, in terms of parameter b, the maximum value is associated with the root, while the minimum value is observed for the leaf. The estimates for parameters α and b in collected biomass models sharing the same predictor and response variables were normally distributed and negatively correlated (Figure 2 and Supplementary Table 2). Bivariate normality tests further confirmed that they followed a bivariate normal distribution (Supplementary Table 2). The posterior probability distributions obtained with the informative and non-informative priors were very similar (Table 3 and Figures 3, 4).

Table 2. The prior distribution of parameters in the component biomass models that were derived from the dataset.

Figure 2. Distribution of parameters α (x-axis) and b (y-axis) for component models. The black dots represent the estimated values in literature, while the dashed lines represent the prior bivariate normal distributions we used for inference in the models.

Table 3. The absolute difference and absolute percentage difference (%) in the means and standard deviations for models with different priors.

Figure 3. Posterior probability density of two parameters for the component biomass models lnW = α + blnD + e. The black and blue solid lines respectively represent the outputs for the model employing a non-informative and informative prior, while the orange dashed line corresponds to the informative prior.

Figure 4. Posterior probability density of two parameters for the component biomass models lnW = α + bln(D²H) + e. The black and blue solid lines respectively represent the outputs for the model employing a non-informative and informative prior, while the orange dashed line corresponds to the informative prior.

3.2 Model parameters

A comparison of the parameters from the same model using the same method showed that the α value of the trunk was the smallest, but the b value was the largest. However, the α and b values of the roots were higher than those of the branches and leaves. For models of the same component, the values of parameters α and b were similar and had similar ranges for the non-informative prior and MLS methods. The ranges of parameters α and b were larger for these two methods than for the informative prior method (see Table 4). The analysis of variance showed that the parameters differed significantly from zero in each model at the 95% confidence level. For the same component and the same method, both the estimated allometric coefficient and the estimated allometric exponent were higher for M1, the model without tree height as a predictor variable, than for M2, the model with tree height as a predictor variable.

Table 4. Estimates and 95% confidence interval (CI) of model parameters.

3.3 Model evaluation

The model evaluation metrics are shown in Figure 5. Bayesian methods with only prior information (M1 and M2) yielded trunk biomass higher than the predicted values. The differences between the predicted and observed values of trunk biomass were larger than those of the other biomass components for all methods. However, the deviations were within the 95% interval suggested by Huang and Wang (2003). The MEF was used to assess the degree of model fit. The variance explained by the trunk, branch, and leaf biomass models ranged from 71.40% to 96.75%. The values of all evaluation metrics were similar for the non-informative prior and MLS methods. Bayesian methods with an informative prior had the largest MEF and the smallest RMSE and MAD for the trunk, branch, and leaf biomass models without tree height. The same results were obtained from the trunk, branch, and leaf biomass models containing tree height. The Bayesian approach with an informative prior and the tree height predictor had the largest MEF among all trunk biomass models. Conversely, Bayesian methods with an informative prior but without the tree height predictor obtained the maximum MEF values among all branch and leaf biomass models. The MEFs for the root biomass models were 4.36%–11.26% higher for models with the tree height predictor than for those without it. The MADs and RMSEs of the root biomass models with the tree height predictor were also smaller than those of the models without it.

Figure 5. Evaluation indicators for the three approaches and the two models for the tree components. The x-axis labels use acronyms for the components, equations, and methods. For example, T1I represents the informative prior method for the trunk and the M1 equation (lnW = α + blnD + e).

4 Discussion

Parameter estimation is an important error source in biomass models and determines model applicability. The predicted trunk biomass was larger than the actual values when the DBH was less than 13 cm or greater than 25 cm (Figures 6, 7). The predicted branch biomass was lower than the actual values when the DBH was less than 10 cm or greater than 30 cm. Leaf biomass was underestimated when the DBH was greater than 10 cm and less than 25 cm. Similarly, root biomass predictions were underestimated when the DBH was greater than 10 cm and less than 16 cm. These trends indicate an overestimation of biomass for trunk, leaves, and roots at smaller and larger diameters and the opposite trend for branches. If the Bayesian prior does not contain information, the Bayesian credible interval is usually numerically consistent with the classical confidence interval (McCarthy, 2007; Zhang et al., 2013), which was confirmed in this study (see Table 3). A non-informative prior indicates that the data are crucial in the Bayesian theorem, and the prior probabilities of all plausible parameter values are similar. As a result, the posterior distribution has a similar form to the likelihood function. However, using a non-informative prior leads to a less precise posterior distribution, wider credible intervals, and worse predictive performance (see Table 3). In this study, allometric growth models for P. koraiensis were established using data from the published literature. It was found that the bivariate normal distribution accurately described the parameter distributions of the allometric growth model. The bivariate normal distribution is typically the prior distribution for estimating tree biomass using a Bayesian model. One of the advantages of Bayesian methods is their capability to incorporate prior information when updating the model. Thus, the samples and the parameters being estimated are considered random variables. Consequently, Bayesian methods generally outperform MLS (see Figure 5).

Figure 6. Plot of the data and predictions for each component biomass model lnW = α + blnD + e. Black dots represent the observed data points. Blue, red, and orange shaded areas denote the 95% credible or confidence intervals of the expected biomass using the three different parameter estimation approaches, while the lines correspond to the (posterior) predictive means. Note that both x and y axes are on a logarithmic scale, and that the blue and orange lines and shades mostly overlap with the red ones.

Figure 7. Plot of the data and predictions for each component biomass model lnW = α + bln(D²H) + e. Black dots represent the observed data points. Blue, red, and orange shaded areas denote the 95% credible or confidence intervals of the expected biomass using the three different parameter estimation approaches, while the lines correspond to the (posterior) predictive means. Note that both x and y axes are on a logarithmic scale, and that the blue and orange lines and shades mostly overlap with the red ones.

Different model types affect the efficiency, bias, and other numerical values of models. Various allometric biomass models have been employed to estimate forest biomass (Chen, 1981; Wang, 2006; Ma and Li, 2008; Dai et al., 2013; Dong et al., 2014; Widagdo et al., 2021; Xie et al., 2022b), particularly the models W ∼ aD^b and W ∼ a(D²H)^b. For example, the model with only the DBH as an explanatory variable (M1) explained 96.8% and 95.0% of the variation of branch and leaf biomass, respectively. Consistent with this, the DBH is widely used to estimate biomass (Baker et al., 2004; Chave et al., 2005; Henry et al., 2011). This variation can be attributed to the influence of ecological conditions and the tree age, which affect biomass (Picard et al., 2012). Therefore, the DBH is a critical parameter in allometric growth models of trees and is a primary indicator of above-ground biomass. The addition of tree height to the model slightly increased the MEF of trunk biomass from 79.8% to 83.8%. In contrast, Feldpausch et al. (2012) found that tree height was a significant parameter in estimating tree biomass. The slight increase in the MEF may be partly attributed to difficulties in accurately measuring tree height using Haga hypsometers in the field, especially when the treetop is obscured by other crowns.

For a given model, the estimation method affected both the parameter estimates and the MEF. Previous studies have estimated P. koraiensis biomass using these two models and the classical method (Wang, 2006; Xu et al., 2022). Although the models performed satisfactorily, with high R² values, their accuracy is limited when they are applied beyond the data range and site conditions for which they were fitted (Case and Hall, 2008; Sileshi, 2014). Incorporating parameter estimates from different geographical locations into tree biomass models increases variability, suggesting that probability distributions are better suited for parameterizing allometric growth models than the fixed values typically used in MLS (Figure 3). Hence, applying P. koraiensis biomass models widely at the stand level may overlook significant variation among stands. This study therefore proposed a Bayesian approach for modeling the biomass of P. koraiensis components. Zapata-Cuartas et al. (2012) found that the Bayesian and MLS methods gave almost identical RMSE values when sample sizes were large, whereas Bayesian methods had a smaller RMSE when sample sizes were small, indicating more efficient parameter estimation. In this study, the sample size for the trunk, branch, and leaf biomass models was 31, and Bayesian methods outperformed the MLS in terms of MEF, RMSE, and MAD.
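A simplified sketch shows why an informative prior helps most when the sample is small. It uses the closed-form posterior for a linear model with a Gaussian prior on the coefficients and a known residual variance. These are simplifying assumptions made here for illustration and do not reproduce the study's fitting procedure. The data are synthetic, and the prior mean and covariance are hypothetical stand-ins for values synthesized from published models.

```python
import numpy as np

def normal_posterior(X, y, prior_mean, prior_cov, sigma2):
    """Posterior mean and covariance of beta in y = X @ beta + e,
    e ~ N(0, sigma2), under the prior beta ~ N(prior_mean, prior_cov)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_cov

# Synthetic small sample (NOT the study's data).
rng = np.random.default_rng(1)
n, sigma2 = 10, 0.04
ln_D = np.log(rng.uniform(10.0, 40.0, size=n))
X = np.column_stack([np.ones(n), ln_D])
y = -2.5 + 2.4 * ln_D + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Least-squares estimate and its sampling covariance.
ols_mean, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_cov = sigma2 * np.linalg.inv(X.T @ X)

# Hypothetical informative prior standing in for published-model parameters.
prior_mean = np.array([-2.3, 2.3])
prior_cov = np.diag([0.5 ** 2, 0.2 ** 2])
inf_mean, inf_cov = normal_posterior(X, y, prior_mean, prior_cov, sigma2)

# Nearly flat prior: the posterior mean should coincide with least squares.
flat_mean, flat_cov = normal_posterior(X, y, np.zeros(2), 1e8 * np.eye(2), sigma2)

print("least squares    :", ols_mean, np.sqrt(np.diag(ols_cov)))
print("informative prior:", inf_mean, np.sqrt(np.diag(inf_cov)))
print("flat prior       :", flat_mean, np.sqrt(np.diag(flat_cov)))
```

Under these assumptions the posterior precision is the sum of the prior precision and the data precision, so the posterior standard deviations under the informative prior cannot exceed the least-squares ones. As n grows, the data term dominates and the two estimates converge, in line with the pattern reported by Zapata-Cuartas et al. (2012).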

5 Conclusion

This study used a Bayesian approach to fit and compare two commonly used models for estimating the biomass of P. koraiensis components. Informative priors were derived from the parameter distributions of 298 biomass models in the published literature and were used in the Bayesian models to estimate tree biomass. The Bayesian approach outperformed the MLS, offering a more reasonable and effective way to estimate the biomass of P. koraiensis components. Several metrics (MEF, $\bar{E}$, MD, MAD, and RMSE) indicated differences in the biomass models for different components when the tree height was included or excluded. The DBH and the tree height were the main predictor variables significantly affecting the variation in trunk and root biomass, whereas only the DBH affected the variation in branch and leaf biomass. The model parameters provided accurate estimation results.

However, the Bayesian methods used here have room for improvement. First, additional variables can be incorporated. There may be confounders that affect both the observed biomass and the predictors (i.e., D and H), and including them in the regression models would allow the effects of the predictors to be estimated more reliably. Second, a hierarchical Bayesian model can be established to provide more accurate prior information. Such a model would help pool information from trees of different subspecies or from different geographical areas, while enabling a better characterization of the differences among them (Dietze et al., 2008). By incorporating hyper-parameters, the hierarchical model also prevents the estimates from being overly affected by the prior information provided. This is particularly important because the biomass data used to derive the priors may be inconsistent with those collected in our study, which reduces the validity of direct extrapolation (Vieilledent et al., 2010).
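One possible form of such a hierarchical model is sketched below, for tree i in group j (for example, a stand or region). The grouping, the Gaussian forms, and the symbols $\mu$, $\tau$, $m$, and $s$ are assumptions introduced for illustration; they are not specified in this study.

$$
\begin{aligned}
\ln W_{ij} &= \alpha_j + b_j \ln D_{ij} + e_{ij}, & e_{ij} &\sim \mathcal{N}(0, \sigma^2),\\
\alpha_j &\sim \mathcal{N}(\mu_\alpha, \tau_\alpha^2), & b_j &\sim \mathcal{N}(\mu_b, \tau_b^2),\\
\mu_\alpha &\sim \mathcal{N}(m_\alpha, s_\alpha^2), & \mu_b &\sim \mathcal{N}(m_b, s_b^2).
\end{aligned}
$$

Here the hyper-parameters $\mu_\alpha$ and $\mu_b$ carry the literature-based information through $m$ and $s$, while $\tau_\alpha$ and $\tau_b$ control how far each group may depart from the shared mean. Priors on $\sigma$, $\tau_\alpha$, and $\tau_b$ would also need to be chosen and are left unspecified in this sketch.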

Data availability statement

The original contributions presented in this study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

HL: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Writing – original draft. XD: Conceptualization, Resources, Validation, Writing – review & editing. YZ: Data curation, Formal analysis, Investigation, Writing – review & editing. HQ: Investigation, Project administration, Writing – review & editing. YR: Investigation, Writing – original draft. BZ: Data curation, Writing – review & editing. TG: Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was funded by the Key Technologies Research and Development Program (CN) (2022YFD2201001) and the Heilongjiang Province Applied Technology Research and Development Program Project (GA19C006).

Acknowledgments

The authors thank Dongfanghong Forest Farm for their help in the data collection.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/ffgc.2024.1350888/full#supplementary-material

References

Aabeyir, R., Adu-Bredu, S., Agyare, W. A., and Weir, M. J. C. (2020). Allometric models for estimating aboveground biomass in the tropical woodlands of Ghana, West Africa. For. Ecosyst. 7, 1–23. doi: 10.1186/s40663-020-00250-3