Understanding factors influencing the estimated genetic variance and the distribution of breeding values

Nilforooshan, Mohammad Ali; Ruíz-Flores, Agustín

doi:10.3389/fgene.2022.1000228

ORIGINAL RESEARCH article

Front. Genet., 13 October 2022

Sec. Evolutionary and Population Genetics

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.1000228

Understanding factors influencing the estimated genetic variance and the distribution of breeding values

Mohammad Ali Nilforooshan¹*

Agustín Ruíz-Flores²

¹Livestock Improvement Corporation, Hamilton, New Zealand
²Departamento de Zootecnia, Universidad Autónoma Chapingo, Texcoco, Mexico

This study investigated the main factors influencing the genetic variance and the variance of breeding values (EBV). The first is the variance of genetic values in the base population, and the latter is the variance of genetic values in the population under evaluation. These variances are important as improper variances can lead to systematic bias. The inverse of the genetic relationship matrix (K⁻¹) and the phenotypic variance are the main factors influencing the genetic variance and heritability (h²). These factors and h² are also the main factors influencing the variance of EBVs. Pedigree- and genomic-based relationship matrices (A and G as K) and phenotypes on 599 wheat lines were used. Also, data were simulated, and a hybrid (genomic-pedigree) relationship matrix (H as K) and phenotypes were used. First, matrix K underwent a transformation (K* = wK + α11′ + βI), and the responses in the mean and variation of diag(K⁻¹) and offdiag(K⁻¹) elements, and genetic variance in the form of h² were recorded. Then, the original K was inverted, and matrix K⁻¹ underwent the same transformations as K, and the responses in the h² estimate and the variance of EBVs in the forms of correlation and regression coefficients with the EBVs estimated based on the original K⁻¹ were recorded. In response to weighting K by w, the estimated genetic variance changed by 1/w. We found that μ(diag(K)) − μ(offdiag(K)) influences the genetic variance. As such, α did not change the genetic variance, and increasing β increased the estimated genetic variance. Weighting K⁻¹ by w was equivalent to weighting K by 1/w. Using the weighted K⁻¹ together with its corresponding h², EBVs remained unchanged, which shows the importance of using variance components that are compatible with the K⁻¹. Increasing βI added to K⁻¹ increased the estimated genetic variance, and the effect of α11′ was minor. We found that larger variation of diag(K⁻¹) and higher concentration of offdiag(K⁻¹) around the mean (0) are responsible for lower h² estimate and variance of EBVs.

1 Introduction

Best linear unbiased prediction (BLUP, Henderson (1975a)) segregates genetic and environmental effects influencing phenotypes, for predicting breeding values. The statistical model, including fixed effects, non-genetic random effects, and random genetic effects, defines how this segregation should be done. Depending on the BLUP type, additive genetic relationships between individuals are modelled via A, G, or H matrices (A for the pedigree-based BLUP (PBLUP), G for genomic BLUP (GBLUP, VanRaden (2008)), and H for single-step genomic BLUP (ssGBLUP, Aguilar et al., 2010; Christensen and Lund, 2010), where H is a pedigree-genomic hybrid additive relationship matrix). The inverses of these matrices are used in BLUP. Denoting A, G, and H as K, BLUP (in its simplest form) is written as:

[\begin{matrix} X^{'} X & X^{'} Z \\ Z^{'} X & Z^{'} Z + K^{- 1} λ \end{matrix}] [\begin{matrix} \hat{b} \\ \hat{u} \end{matrix}] = [\begin{matrix} X^{'} y \\ Z^{'} y \end{matrix}], (1)

where X and Z are matrices relating phenotypes to fixed effects and individuals, respectively, y, $\hat{b}$ and $\hat{u}$ are the vectors of phenotypes, estimated fixed effects and estimated breeding values (EBV), λ is the residual to genetic variance ratio, equal to (1—h²)/h², and h² is the heritability of the trait. The greater the genetic variance, the lower the λ, and the wider the distribution of EBVs deviated from the mean. Matrix K and the phenotypic variance (within and across families) are other factors influencing the genetic variance and the variance of EBVs. The more related the individuals in a population, the less the genetic variation in that population. Genetic variance is a function of μ(diag(K)) − μ(offdiag(K)) = D_k (Searle, 1982; Speed and Balding, 2015; Legarra, 2016), and μ(K) is heavily ((1–1/n)%, where n is K’s dimension) influenced by μ(offdiag(K)). According to Legarra (2016): “the genetic variance is the variance of the genetic values of a set of individuals who constitute the reference, or base, population”, and it is a function of D_k. The higher the D_k value, the higher the genetic variance.

For the same population, trait, and model, different K may result in different estimates of genetic variance (Legarra, 2016), which can be practically confusing. Also, due to different genetic variances imposed by different K, genomic EBVs from GBLUP or ssGBLUP may show a different variance and distribution compared with those from PBLUP, which can be interpreted as bias in the validation of genomic evaluations (Nilforooshan et al., 2010). Tsuruta et al. (2019) studied bias in genomic EBV and reported the incompatibility between A and G, ignoring inbreeding coefficients in A⁻¹, strong selection on a trait (especially when the incompatibility between A and G is large), inaccurate estimates of unknown parent groups, and using outdated or improper genetic parameters as the main sources of bias in ssGBLUP evaluations. It is interesting to know which properties of K elements cause different estimates of genetic variance and distributions of EBVs for different K.

Whereas, G is built and inverted for GBLUP, A⁻¹ and H⁻¹ are directly obtained without forming A and H. Matrix G⁻¹ is needed as a part of H⁻¹ (Aguilar et al., 2010). In fact, forming and inverting A is computationally very intensive (Nilforooshan et al., 2021), even more intensive for H. Therefore, knowing about the relationships between the K⁻¹ properties and the genetic variance can be as important as the relationships between the K properties and the genetic variance. There is little known about the effect of K⁻¹ properties on the estimated genetic variance and the distribution of EBVs.

The aim of this study is to find distribution properties of K and K⁻¹ influencing the estimated h² and the distribution of EBVs. Matrix K influences the estimated h² and the distribution of EBVs via K⁻¹. It defines the way the phenotypic variation is distributed within and across relatives (e.g., full-sib and half-sib families), and the covariance made between phenotypes and genetic effects (equal to the genetic variance). However, the phenotypic variance, individuals on which phenotypes are recorded (via Z), and other blocks of the mixed model equations, such as those corresponding to additional random effects (if any) also play a role in the proportion of total variance allocated to additive genetic effects, and the distribution of EBVs.

2 Materials and methods

2.1 Data

2.1.1 Wheat data

The built-in data from R package BGLR (de los Campos and Perez Rodriguez, 2021) was used in this study. The data consist of a pedigree-based additive genetic relationship matrix, genotypes on 1,279 Diversity Array Technology markers, and phenotypes in four environments on 599 CIMMYT wheat lines. Twenty markers were deleted due to having a minor allele frequency less than 0.02. Genotypes and phenotypes (2-year average grain yield) were available for all the lines. Phenotypes in environment one were considered as a single trait to study. Phenotypes were available on all lines and had a μ ± sd of 0 ± 1.

2.1.2 Simulated data

Data were simulated to test the hypotheses on more than the wheat data. The R package pedSimulate (Nilforooshan, 2022c) was used for data simulation. The simulation began with a founder population (F0) of 100 males and 100 females randomly mated to each other to produce the next generation. Ten generations were simulated following F0, with no generation overlap. There were 3,144 individuals in the simulated pedigree. The mating ratio was 1:1, and the litter size was 4. Females were selected based on own phenotypes, and in each generation 50% of females were randomly mated to 50% of males. Genotypes were simulated on 5,000 markers. Phenotypes were set to missing for F0, and for 25% of males onwards. Genotypes were retained for generations 8 to 10 (700 individuals), and the rest of genotypes were set to missing. For generations 1 to 10, 10% of sires and 5% of dams were randomly set to missing. Phenotypes were available on 2,554 individuals and had a μ ± sd of 48.08 ± 6.48.

2.2 Methods

The statistical model was y = μ + Zu + e. There was no fixed effect other than the overall mean. Matrix G was formed using method 1 of VanRaden (2008): G = WW′/2∑p_l(1 − p_l), and then combined with A as 0.95G + 0.05A, where p_l is the marker allele frequency at locus l, and W is the centered and scaled genotype matrix. The wheat data underwent PBLUP and GBLUP, and the simulated data underwent ssGBLUP (i.e., using A⁻¹, G⁻¹ and H⁻¹ in BLUP (Eq. 1), respectively).

Variance components (genetic and residual variances) were estimated using ASReml statistical software (Gilmour et al., 2015). Breeding values were predicted using the R function solver in the data repository ([Dataset] Nilforooshan, 2022b) to form and solve the mixed model equations. The equation systems were small (600 and 3,145 equations for the wheat and the simulated data, respectively) and the mixed model equations were solved directly.

2.2.1 Properties of K

To study the relationships between the properties of K and the estimated h² and the distribution of EBVs, matrix K underwent the following transformation.

K^{*} = w K + α 11^{'} + β I (2)

Different combinations of w (0.9, 1, and 1.1), α (–0.05, 0, and 0.05), and β (–0.05, 0, and 0.05) were tested on K, and the influences on the distributions of diag(K⁻¹), offdiag(K⁻¹), and the estimated h² were studied. For PBLUP (K = A), α = –0.05 was not tested, as it was leading to negative offdiag(A) values. Similarly, α = 0 and β = –0.05 was not tested on A, as it was leading to diagonal values less than 1. One of the main characteristics of A as opposed to G (and H) is having a 1-tailed distribution with the minimum diagonal and off-diagonal values of 1 and 0, respectively.

Coefficients α and β re-base K and diag(K), respectively. Coefficient w weights K, which also changes μ(diag(K)) − μ(offdiag(K)), the variation in K elements, and the variation in diag(K) relative to the variation in offdiag(K).

2.2.2 Properties of K⁻¹

The original K (w = 1, α = 0, and β = 0) was inverted, and then K⁻¹ was transformed using Eq. 2 (replacing K with K⁻¹ in the equation). Different combinations of w, α, and β were tested on K⁻¹ (same as those tested on K), and the influences on the h² estimate, and the distribution of EBVs were studied. Distribution of EBVs were compared to those using the original K⁻¹ (w = 1, α = 0, and β = 0) by regressing the latter to the first, and estimating the Pearson correlation between the two sets of EBV. EBVs were predicted using the h² estimate applying the original K⁻¹ vs. ${(K^{- 1})}^{*}$ .

Similar and other transformations are applied to K and K⁻¹ in the context of ssGBLUP (e.g., Aguilar et al. (2010); Chen et al. (2011); Vitezica et al. (2011); Misztal et al. (2015); MiX99 Development Team (2016); Martini et al. (2018)). However, we emphasize that the transformation in this study (Eq. 2) was for studying the properties of K and K⁻¹, not fine-tuning them.

3 Results and discussion

For the wheat data, K corresponds to A and G, and for the simulated data, K corresponds to H.

3.1 Properties of K

In animal populations, matrices A and G have 1-tailed and 2-tailed distributions skewed toward right, with diagonal and off-diagonal elements centered around 1 and 0, respectively (Simeone et al., 2011; Nilforooshan, 2020). The wheat data showed unusual distributions of A and G compared to animal data. Also, genotypes for all the markers were 0s and 1s. Distributions of the diagonal and off-diagonal elements of A and G for the wheat data are presented in Supplementary Figure S1, and the distributions of the diagonal and off-diagonal elements of A, G and H for the simulated data are presented in Supplementary Figure S2. Diagonal elements of K had a larger mean and a smaller variation than the off-diagonal elements. Thus, w $>$ 1 increased the mean of diag(K) further than the mean of offdiag(K), and increased the variance of offdiag(K) further than the variance of diag(K).

Table 1 shows the effect of re-basing and re-scaling K via α and w, and re-basing diag(K) via β, on the distribution (μ ± sd) of diag(K⁻¹). The corresponding results for offdiag(K⁻¹) are presented in Table 2. Increasing w from 0.9 to 1.1, and β from –0.05 to 0.05 reduced both the mean and sd of diag(K⁻¹). The reductions by the increase of β were more noticeable than the reductions by the increase of w. The effect of α was marginal. For the wheat data, reduction of β to –0.05 had severe consequences on the properties of K⁻¹ as in some cases μ(diag(K⁻¹)) became negative or largely positive, and sd(diag(K⁻¹)) and sd(offdiag(K⁻¹)) increased tremendously. This numerical instability can be due to K getting close to singular by the reduction of μ(diag(K)) − μ(offdiag(K)). The consequences were less severe for w = 1.1 and more severe for w = 0.9. The reason is again due to the change of μ(diag(K)) − μ(offdiag(K)) because of w. Like A, an A⁻¹ in a good numerical condition is expected to have a minimum of diagonal values equal to 1 (for an individual with no known parent and progeny). Matrix A, which compared to G had 3.11 times μ(diag(K)) − μ(offdiag(K)), 0.611 times sd(diag(K)), and 2.893 times sd(offdiag(K)), showed μ(diag(K⁻¹)) and μ(offdiag(K⁻¹)) close to those for G, 1.750 times sd(diag(K⁻¹)), and 1.887 times sd(offdiag(K⁻¹)).

TABLE 1

TABLE 1. The effect of w, α, and β (wK + α11′ + βI) on the distribution of diag(K⁻¹).

TABLE 2

TABLE 2. The effect of w, α, and β (wK + α11′ + βI) on the distribution of offdiag(K⁻¹).

The effects of w, α, and β on μ(offdiag(K⁻¹)), sd(offdiag(K⁻¹)), and μ(diag(K⁻¹)) − μ(offdiag(K⁻¹)) were similar to those for the mean and sd of diag(K⁻¹)). For the simulated data, μ(offdiag(K⁻¹)) remained –0.001, regardless of w, α, and β. For α = 0, μ(diag(K⁻¹)) − μ(offdiag(K⁻¹)) was slightly higher compared with α = –0.05 and α = 0.05, indicating a better numerical condition for K⁻¹, since 11′ is completely singular.

Every covariance matrix K can be decomposed to SQS, where S² is a diagonal matrix of variances equal to diag(K), and Q is a correlation matrix. Adding α11′ + βI to K changes it to S*Q*S*, where ${(S^{*})}^{2} = S^{2} + (α + β) I$ . Positive α and negative β increase the absolute values in Q*. With the correlations getting closer to each other and to 1 or –1, the co-linearity in Q* increases and consequently K* becomes closer to singular. That would cause irregular and unstable distribution of the K⁻¹ elements. When K reaches singularity and becomes non-positive-definite, it turns non-invertible. Also, wK is equivalent to multiplying S by $\sqrt{w}$ .

The h² was estimated for the simulated data, before and after setting some pedigree information to missing (10% of sires and 5% of dams, for generations 1–10), and the estimates were 0.2910 and 0.2600, respectively. The reduction in the h² is expected as missing pedigree information makes A⁻¹ sparser and closer to an identity matrix. Table 3 shows the effects of w and β applied to K, on the h² estimate. There was no h² estimate for A, α = 0, and β = –0.05. The effect of α was minor and it did not change the h² estimate rounded to four decimal points. The h² estimate increased slightly by increasing β and largely by decreasing w. The estimated genetic variance changed 1/w times in response to w (results not shown). Multiplying K by w, is equivalent to multiplying the genetic variance by w. In response, the estimated genetic variance changed by 1/w. Similarly, increasing S to S* with a positive α + β is expected to increase the variances in K and reduce the estimated genetic variance and h². However, change of the correlation matrix Q to Q* worked in the opposite direction. Consequently, no h² change was observed by changing α, and the h² estimate was even increased by increasing β. The correlations (Q) are expected to increase by increasing α. On the other hand, a positive β is expected to bring correlations closer to 0, and a negative β is expected to deviate correlations from 0.

TABLE 3

TABLE 3. The effect of w and β (wK + α11′ + βI) on the heritability estimate.

The h² estimates were considerably lower for A than for G (for the wheat data). The results (Table 3) confirmed previous reports about the role of D_k or alternatively μ(diag(K)) − μ(offdiag(K)) in the genetic variance (Searle, 1982; Speed and Balding, 2015; Legarra, 2016). For A, μ(diag(K)) − μ(offdiag(K)) was considerably higher than for G. The higher genetic variance imposed by A compared with G was compensated by a lower estimate of genetic variance and h². Also, α did not make any change to μ(diag(K)) − μ(offdiag(K)), neither to the estimates of genetic variance and h². Given the magnitude of β in changing μ(diag(K)) − μ(offdiag(K)), it played a small role in changing the genetic variance and the h² estimate.

3.2 Properties of K⁻¹

We studied the distributions of A⁻¹ and G⁻¹ for their large h² difference, and the results are presented in Figure 1. Though the difference between the averages of diag(A⁻¹) and diag(G⁻¹) was small (Table 1), diag(A⁻¹) had a wider range (0.5–540.490 vs. 3.324–303.766) compared with diag(G⁻¹) (Figure 1). The averages of offdiag(A⁻¹) and offdiag(G⁻¹) were also similar (Table 2), but offdiag(A⁻¹) had a wider range (–500 to 2.936 vs. –172.679 to 26.056) compared with offdiag(G⁻¹) (Figure 1). Both A⁻¹ and G⁻¹ had off-diagonal elements concentrated around 0. Whereas, 97.96% of offdiag(A⁻¹) were between –0.02 and 0.02, it was 6.37% for offdiag(G⁻¹) (Figure 1).

FIGURE 1

FIGURE 1. Distribution of K⁻¹ elements for the wheat data.

Off-diagonal elements corresponding to max(diag(A⁻¹)) showed a lower mean and a higher sd compared with those for max(diag(G⁻¹)), –0.904 ± 20.477 vs. –0.500 ± 9.064. Diagonal elements corresponding to min(offdiag(K⁻¹)) were 590.490 and 500 for A⁻¹, and 303.766 and 243.530 for G⁻¹, all amongst the largest diagonal elements. Diagonal elements corresponding to max(offdiag(K⁻¹)) were moderate to large, 7.666 and 9.089 for A⁻¹, and 85.750 and 188.623 for G⁻¹.

To determine whether the lower estimates of genetic variance and h² for PBLUP are due to the large ranges of diag(A⁻¹) and offdiag(A⁻¹), or due to the high concentration of A⁻¹ elements (Figure 1), rows and columns of A⁻¹ that showed diagonal elements greater than max(diag(G⁻¹)) and off-diagonal elements less than min(offdiag(G⁻¹)) were discarded. Discarding 10 rows/columns of A⁻¹ and their corresponding phenotypes changed the range of diag(A⁻¹) to 0.5 and 259.224, and the range of offdiag(A⁻¹) to –166.667 and 2.936. The h² estimate changed from 0.3357 (Table 3) to 0.3474, which indicated that the high concentration of A⁻¹ elements is mainly responsible for the lower h² estimate for PBLUP compared with that for GBLUP.

The H⁻¹ (simulated data) had a range of –1.45 to 2.62 for the off-diagonal elements, and 1 to 6.90 for the diagonal elements. 96.9% of the off-diagonal elements were between –0.02 and 0.02. The distribution of diag(H⁻¹) is presented in Supplementary Figure S3. Six rows/columns of H⁻¹ corresponding to the largest diag(H⁻¹) (ranging from 6.45 to 6.90) were discarded to study the effect of a lower range and variation of diag(H⁻¹) on the h² estimate. That reduced max(diag(H⁻¹)) to 5.12. Also, μ(diag(H⁻¹)) − μ(offdiag(H⁻¹)) reduced from 2.808 to 2.801, but the range of offdiag(H⁻¹) did not change. The h² estimate increased from 0.2600 to 0.2637, which confirmed the results from the wheat data, where a lower variation in diag(K⁻¹) resulted in a higher h² estimate.

The same transformations applied to K (Eq. 2) were applied to the original K⁻¹ (i.e., inverted K with w = 1, α = 0, and β = 0), and the responses in the h² estimate and EBV distribution were measured. The effects of w, α, and β applied to K⁻¹ on the h² estimate are presented in Table 4. The h² estimate increased slightly by increasing w and largely by increasing β. The effect of α was insignificant. The h² estimates were similar for α = –0.05 and α = 0.05. For the simulated data, α = 0 resulted in a slightly higher h² estimate. For the wheat data and β = 0.05, α = 0 resulted in a slightly higher h² estimate, but it was the opposite when β = –0.05. For both K and K⁻¹, the increase of β resulted in a higher h² estimate, and as expected the effect of w applied to K⁻¹ on the h² estimate was the opposite to that of K. The constant α almost had no effect on the h² estimate when applied to K, and had a small effect on the h² estimate when applied to K⁻¹.

TABLE 4

TABLE 4. The effect of w, α, and β (wK⁻¹ + α11′ + βI) on the heritability estimate.

Correlation coefficients between EBV from different K⁻¹ and the original K⁻¹ (w = 1, α = 0, and β = 0) are presented in Table 5. The h² estimate from the original K⁻¹ was used for the EBV prediction in all the analyses, (1) to be able to distinguish between the effects of K⁻¹ and h² used in the mixed model equation, and (2) estimation of variance components is computationally intensive, and in routine genetic evaluations, usually, it is not affordable to update them regularly. Deviation of w from 1, and α and β from 0 reduced the EBV correlations. Differences in the correlations due to w and β were minor. Differences were also minor for α and the wheat data. However, deviation of α from 0 substantially reduced the correlations for the simulated data, which might be an indication of K⁻¹ further inflicted by singularity than for the wheat data.

TABLE 5

TABLE 5. The effect of w, α, and β (wK⁻¹ + α11′ + βI) on the correlation of predicted breeding values with those using the original K⁻¹ (w = 1, α = 0, and β = 0). The same heritability (based on the original K⁻¹) was used in all the analyses.

Regression coefficients of the EBVs using different K⁻¹ on the EBVs using the original K⁻¹ (w = 1, α = 0, and β = 0) are presented in Table 6. The h² estimate using the original K⁻¹ was used for the EBV prediction in all the analyses. Increasing w and β and deviation of α from 0 reduced the regression coefficient, as a result of inflated evaluations. Where the regression coefficient is greater than the correlation coefficient, it means that the variance of EBVs has increased, and vice versa. Generally, decreasing w and β resulted in increasing the variance of EBVs. Unless K⁻¹ has not inflicted by the singularity of α11′ (e.g., the simulated data), the role of α on changing the variance of EBVs was marginal.

TABLE 6

TABLE 6. The effect of w, α, and β (wK⁻¹ + α11′ + βI) on the regression coefficient of predicted breeding values on those using the original K⁻¹ (w = 1, α = 0, and β = 0). The same heritability (based on the original K⁻¹) was used in all the analyses.

For different w, α = 0, and β = 0, EBVs remained unchanged using the h² estimate from the same K⁻¹ rather than the original K⁻¹ ([Dataset] Nilforooshan, 2022b). This means that weighting K⁻¹ by a scalar, that scalar is captured by the variance component estimation, and using the estimated variance components from the same K⁻¹, EBVs remain unchanged. Matrix K⁻¹ can be decomposed to S⁻¹Q⁻¹S⁻¹. Multiplying w to K⁻¹ corresponds to dividing S by $\sqrt{w}$ . Adding α11′ + βI to K⁻¹ changes it to ${(S^{- 1})}^{*} {(Q^{- 1})}^{*} {(S^{- 1})}^{*}$ , where ${(S^{- 2})}^{*} = S^{- 2} + (α + β) I$ .

3.3 Factors defining the elements of K and K⁻¹

Different factors define the elements of A and G (similarly H, in which the genomic information in G is propagated to non-genotyped individuals). Diagonal elements of A are twice the probability of two random gametes from an individual carrying identical by descent alleles, and offdiag(A) are equal to the numerator of the coefficients of relationship between pairs of individuals (Wright, 1922). Diagonal elements of G increase as an individual’s homozygosity rate increases, further by homozygosity for rare alleles (VanRaden, 2008). Off-diagonal elements of G increase as the shared homozygosity rate (i.e., homozygosity at the same loci) between pairs of individuals increases, further by shared homozygosity for rare alleles. Following method 1 of VanRaden (2008), diagonal element of G for individual i, and the off-diagonal element of G between individuals i and j equal:

G_{i i} = \sum_{l = 1}^{n} {(M_{i l} - 2 p_{l} + 1)}^{2} / 2 \sum_{l = 1}^{n} p_{l} (1 - p_{l}), (3)

G_{i j} = \sum_{l = 1}^{n} (M_{i l} - 2 p_{l} + 1) (M_{j l} - 2 p_{l} + 1) / 2 \sum_{l = 1}^{n} p_{l} (1 - p_{l}), (4)

Where M_ij is the genotype (coded as –1, 0, 1) of individual i at locus l, and n is the total number of genotype markers.

Diagonal elements of A⁻¹ increase by the number of individual’s progeny, known mates, and known parents (Henderson, 1975b). Considering no parent-progeny mating for an easier explanation, positive offdiag(A⁻¹) are those between mates, which increase as the number of their progeny increases, and negative offdiag(A⁻¹) are those between parent and progeny. Individual i’s inbreeding coefficient adds to the elements of A⁻¹ corresponding to its progeny and to Aⁱⁱ (the diagonal element of A⁻¹ for individual i), and deducts from the elements corresponding to its mates (Nilforooshan, 2022a). Little is known about factors influencing elements of G⁻¹. We found a correlation of 0.091 between diag(A) and diag(A⁻¹), and –0.194 between diag(G) and diag(G⁻¹).

4 Conclusion

Different K and K⁻¹ result in different h² estimates and distributions of EBVs. We studied some distribution properties of K and K⁻¹ elements to learn about the properties of the two matrices influencing the h² estimate and the distribution of EBVs. Higher phenotypic variance and h² result in higher genetic variance and the variance of EBVs. Furthermore, the distribution of K and K⁻¹ elements are important as they directly and indirectly (via the estimated h²) influence the genetic variance and the variance of EBVs. Matrix K, which defines the genetic relatedness among individuals and the amount of inbreeding, underwent a combination of three transformations, each with three levels. Adding α11′ to K is equivalent to re-basing the base population to a former generation with a positive α (Chen et al., 2011; Vitezica et al., 2011), which is applied in the context of ssGBLUP for improving the compatibility between G and A. Note that 11′ is completely singin the ular, and an α largely deviated from 0 would bring K closer to singular. Similarly, adding βI to K with a negative β brings K closer to singular. Therefore, the conclusions of this study exclude odd observations that might occur because of K or K⁻¹ becoming singular or nearly singular. Though eigenvalues of K are direct indicators of K’s numerical condition, negative or high μ(diag(K⁻¹)), and/or very large sd(diag(K⁻¹)) and sd(offdiag(K⁻¹)) are indirect indicators of K’s ill numerical condition.

Depending on the sign of β, μ(diag(K)) − μ(offdiag(K)) or D_k is changed. A positive β means adding to inbreeding coefficients while keeping genetic covariances (i.e., offdiag(K)) unchanged. It also means adding unstructured variance to the structured genetic variance, which resulted in the increase of the genetic variance, the h² estimate, and the variance of EBVs. In inbred populations, offdiag(K) increase further than diag(K) by inbreeding. As a result of the reduced D_k, the genetic variance, and the variance of EBVs are expected to reduce. Weighting K by w, both the genetic variance and D_k are weighted. This weight is captured by the variance component estimation and in response to the genetic variance multiplied by w, 1/w of the genetic variance is estimated, which is equivalent to changing h² from 1/(1 + λ) to w/(w + λ). Similarly, weighting K⁻¹ by w, w times the genetic variance is estimated, which is equivalent to changing h² to 1/(1 + wλ). Applying the weighted K⁻¹ and its corresponding variance components in the form of h², EBVs remained unchanged. This shows the importance of using variance components corresponding to the K⁻¹ that is used in BLUP. Using improper or outdated variances is an important source of bias in genetic evaluations (Tsuruta et al., 2019). Variance component estimates need to be updated regularly, and the use of variance components estimated based on a type of K⁻¹ (e.g., A⁻¹) should be avoided in BLUP using other K⁻¹ (e.g., G⁻¹ and H⁻¹). This study showed very different h² estimates for the same population and trait, based on different K⁻¹ (between A⁻¹ and G⁻¹ for the wheat data, between transformed K, and between transformed K⁻¹).

In agreement with previous studies, we found D_k as a distribution parameter of K influencing the genetic variance. We studied μ(diag(K)) − μ(offdiag(K)) instead of D_k, however for large matrices like K (individual × individual), μ(K) is heavily defined by μ(offdiag(K)). Matrix K⁻¹ underwent the same transformations as matrix K. Coefficient α re-bases the matrix, coefficient β re-bases the diagonal elements, and coefficient w inversely weights the genetic variance and changes the mean and the variance of the matrix, further for the diagonal elements, since those have larger mean and variance. We found that, w and β applied to K⁻¹ influence the h² estimate and the distribution of EBVs. The effect of α was marginal for both K and K⁻¹, unless it inflicts them with singularity. We also found that lower variance of diag(K⁻¹) can result in higher h² and thus greater variance of EBVs. For the off-diagonal elements of K⁻¹, more important than the variation of the elements, was how they are concentrated around their mean. For example, the h² estimate by G⁻¹ was considerably greater than for A⁻¹, for the wheat data. Whereas the off-diagonal elements of G⁻¹ showed a smaller range and variation, those were more widely centered around 0 than the off-diagonal elements of A⁻¹.

Data availability statement

Publicly available datasets were analyzed in this study. The data can be found here: https://cran.r-project.org/package=BGLR, https://doi.org/10.17632/zd3xx54j62.3.

Author contributions

MN conceived the idea, wrote the programs, run the analyses, and wrote the manuscript. AR-F took an active role in writing and revising the manuscript.

Funding

This work was supported by the NZ Ministry for Primary Industries, SFF Futures Programme: Resilient Dairy-Innovative breeding for a sustainable dairy future [grant number PGP06-17006].

Conflict of interest

Author MN was employed by the company Livestock Improvement Corporation.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2022.1000228/full#supplementary-material

References

Aguilar, I., Misztal, I., Johnson, D. L., Legarra, A., Tsuruta, S., and Lawlor, T. J. (2010). Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of holstein final score. J. Dairy Sci. 93, 743–752. doi:10.3168/jds.2009-2730

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, C. Y., Misztal, I., Aguilar, I., Legarra, A., and Muir, W. M. (2011). Effect of different genomic relationship matrices on accuracy and scale. J. Anim. Sci. 89, 2673–2679. doi:10.2527/jas.2010-3555

PubMed Abstract | CrossRef Full Text | Google Scholar

Christensen, O. F., and Lund, M. S. (2010). Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 42, 2. doi:10.1186/1297-9686-42-2

PubMed Abstract | CrossRef Full Text | Google Scholar

de los Campos, G., and Perez Rodriguez, P. (2021). BGLR: Bayesian generalized linear regression. v1.1.0. [Dataset]. Available at: https://cran.r-project.org/package=BGLR (Accessed 07 01, 2022).

Google Scholar

Gilmour, A. R., Gogel, B. J., Cullis, B. R., Welham, S. J., and Thompson, R. (2015). ASReml user guide release 4.1. Hemel Hempstead, UK: VSN International Ltd.

Google Scholar

Henderson, C. R. (1975a). Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447. doi:10.2307/2529430

PubMed Abstract | CrossRef Full Text | Google Scholar

Henderson, C. R. (1975b). Rapid method for computing the inverse of a relationship matrix. J. Dairy Sci. 58, 1727–1730. doi:10.3168/jds.S0022-0302(75)84776-X

CrossRef Full Text | Google Scholar

Legarra, A. (2016). Comparing estimates of genetic variance across different relationship models. Theor. Popul. Biol. 107, 26–30. doi:10.1016/j.tpb.2015.08.005

PubMed Abstract | CrossRef Full Text | Google Scholar

Martini, J. W. R., Schrauf, M. F., Garcia-Baccino, C. A., Pimentel, E. C. G., Munilla, S., Rogberg-Muñoz, A., et al. (2018). The effect of the H⁻¹ scaling factors τ and ω on the structure of H in the single-step procedure. Genet. Sel. Evol. 50, 16. doi:10.1186/s12711-018-0386-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Misztal, I., Tsuruta, S., Lourenco, D., Aguilar, I., Legarra, A., and Vitezica, Z. G. (2015). Manual for BLUPF90 family of programs. Athens, GA, USA: University of Georgia.

Google Scholar

MiX99 Development Team (2016). MiX99: A software package for solving large mixed model equations. Jokioinen, Finland: Natural Resources Institute Finland (Luke). Release XI/2016.

Google Scholar

Nilforooshan, M. A., Zumbach, B., Jakobsen, J., Loberg, A., Jorjani, H., and Dürr, J. (2010). Validation of national genomic evaluations. Interbull Bull. 42, 56–61.

Google Scholar

Nilforooshan, M. A. (2020). Application of single-step gblup in New Zealand romney sheep. Anim. Prod. Sci. 60, 1136–1144. doi:10.1071/AN19315

CrossRef Full Text | Google Scholar

Nilforooshan, M. A., Garrick, D., and Harris, B. (2021). Alternative ways of computing the numerator relationship matrix. Front. Genet. 12, 655638. doi:10.3389/fgene.2021.655638

PubMed Abstract | CrossRef Full Text | Google Scholar

Nilforooshan, M. A. (2022a). A new computational approach to henderson’s method of computing the inverse of a numerator relationship matrix. Livest. Sci. 257, 104848. doi:10.1016/j.livsci.2022.104848

CrossRef Full Text | Google Scholar

Nilforooshan, M. A. (2022b). Code & Data – understanding factors influencing the estimated genetic variance and the distribution of breeding values, [Dataset]. Mendeley Data v3. doi:10.17632/zd3xx54j62.3

CrossRef Full Text | Google Scholar

Nilforooshan, M. A. (2022c). pedSimulate – an R package for simulating pedigree, genetic merit, phenotype, and genotype data. Rev. Bras. Zootec. 51, e20210131. doi:10.37496/rbz5120210131

CrossRef Full Text | Google Scholar

Searle, S. R. (1982). Matrix algebra useful for statistics. Hoboken, NJ: Wiley.

Google Scholar

Simeone, R., Misztal, I., Aguilar, I., and Legarra, A. (2011). Evaluation of the utility of diagonal elements of the genomic relationship matrix as a diagnostic tool to detect mislabelled genotyped animals in a broiler chicken population. J. Anim. Breed. Genet. 128, 386–393. doi:10.1111/j.1439-0388.2011.00926.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Speed, D., and Balding, D. J. (2015). Relatedness in the post-genomic era: is it still useful? Nat. Rev. Genet. 16, 33–44. doi:10.1038/nrg3821

PubMed Abstract | CrossRef Full Text | Google Scholar

Tsuruta, S., Lourenco, D. A. L., Masuda, Y., Misztal, I., and Lawlor, T. J. (2019). Controlling bias in genomic breeding values for young genotyped bulls. J. Dairy Sci. 102, 9956–9970. doi:10.3168/jds.2019-16789

PubMed Abstract | CrossRef Full Text | Google Scholar

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423. doi:10.3168/jds.2007-0980

PubMed Abstract | CrossRef Full Text | Google Scholar

Vitezica, Z. G., Aguilar, I., Misztal, I., and Legarra, A. (2011). Bias in genomic predictions for populations under selection. Genet. Res. 93, 357–366. doi:10.1017/S001667231100022X

PubMed Abstract | CrossRef Full Text | Google Scholar

Wright, S. (1922). Coefficients of inbreeding and relationship. Am. Nat. 56, 330–338. doi:10.1086/279872

CrossRef Full Text | Google Scholar

Keywords: BLUP, breeding values, diagonal elements, genetic variance, heritability, relationship matrix

Citation: Nilforooshan MA and Ruíz-Flores A (2022) Understanding factors influencing the estimated genetic variance and the distribution of breeding values. Front. Genet. 13:1000228. doi: 10.3389/fgene.2022.1000228

Received: 21 July 2022; Accepted: 26 September 2022;
Published: 13 October 2022.

Edited by:

José M. Álvarez-Castro, University of Santiago de Compostela, Spain

Reviewed by:

Shogo Tsuruta, University of Georgia, United States
Rostam Abdollahi-Arpanahi, University of Georgia, United States

Copyright © 2022 Nilforooshan and Ruíz-Flores. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Mohammad Ali Nilforooshan, bW9oYW1tYWQubmlsZm9yb29zaGFuQGxpYy5jby5ueg==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Understanding factors influencing the estimated genetic variance and the distribution of breeding values

1 Introduction