Single-step genomic BLUP with many metafounders

Kudinov, Andrei A.; Koivula, Minna; Aamand, Gert P.; Strandén, Ismo; Mäntysaari, Esa A.

doi:10.3389/fgene.2022.1012205

ORIGINAL RESEARCH article

Front. Genet., 21 November 2022

Sec. Livestock Genomics

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.1012205

This article is part of the Research Topic Insights in Livestock Genomics: 2022


Gert P. Aamand²

¹Natural Resources Institute Finland (Luke), Jokioinen, Finland
²Nordic Cattle Genetic Evaluation, Aarhus, Denmark

Single-step genomic BLUP (ssGBLUP) models for routine genomic prediction of breeding values are being developed intensively for many dairy cattle populations. Compatibility between the genomic (G) and the pedigree (A) relationship matrices remains an important challenge in ssGBLUP. The compatibility relates to the amount of missing pedigree information. There are two prevailing approaches to account for incomplete pedigree information: unknown parent groups (UPG) and metafounders (MF). Unknown parent groups have been used routinely in pedigree-based evaluations to account for differences in genetic level between groups of animals with missing parents. The MF approach is an extension of the UPG approach. It defines MF, which are related pseudo-individuals, and needs a Γ matrix of size equal to the number of MF to describe the relationships between them. The UPG and MF can be the same. However, the challenge in the MF approach is the estimation of Γ when there are many MF, as is typically needed in dairy cattle. In our study, we present an approach to fit the same number of MF as UPG in ssGBLUP with the Woodbury matrix identity (ssGTBLUP). We used 305-day milk, protein, and fat yield data from the DFS (Denmark, Finland, Sweden) Red Dairy cattle population. The pedigree had more than 6 million animals, of which 207,475 were genotyped. We constructed the preliminary gamma matrix (Γ_pre) with 29 MF, which was expanded to 148 MF by a covariance function (Γ₁₄₈). The quality of the extrapolation of the Γ_pre matrix was studied by comparing average off-diagonal elements between breed groups. On average, relationships among MF in Γ₁₄₈ were 1.8% higher than in Γ_pre. The use of Γ₁₄₈ increased the correlation between the G and A matrices by 0.13 and 0.11 for the diagonal and off-diagonal elements, respectively. [G]EBV were predicted using the ssGTBLUP and Pedigree-BLUP models with the MF and UPG. The prediction reliabilities were slightly higher for the ssGTBLUP model using MF than for the one using UPG. The ssGBLUP MF model showed less overprediction than the other models.

1 Introduction

Genomic prediction in dairy cattle started in 2009 for US Holsteins, Jersey, and Brown Swiss (Wiggans et al., 2017). Since then, most dairy populations publish genomic estimated breeding values (GEBV) using a multi-step approach (Masuda et al., 2022). The term “multi” stands for a cascade of steps used to obtain GEBV: calculation of pseudo-observations for genotyped proven bulls and cows, estimation of SNP effects, prediction of direct genomic values, and blending of genomic values with pedigree index (Wiggans et al., 2011). In contrast, a single-step genomic BLUP (ssGBLUP) model accounts for pedigree, phenotypic, and genomic data simultaneously to obtain GEBVs for all animals (Legarra et al., 2009; Aguilar et al., 2010; Christensen and Lund, 2010). Despite the preselection bias in the multi-step GEBV (Patry and Ducrocq, 2011) and the benefits of the single-step model (Legarra et al., 2014), the latter is used only for a few dairy populations (Mäntysaari et al., 2020; Misztal et al., 2020; Masuda et al., 2022). High computational load, compatibility challenges for the genomic and the pedigree relationship matrices, and improper accounting of unknown parents impede the wide implementation of the single-step approach (Mäntysaari et al., 2020). In dairy cattle, these problems can be expected to be amplified due to the many generations in pedigrees, intensive selection, and the vast exchange of breeding material between populations.

The original ssGBLUP requires the inverse matrices of A₂₂ and G in the inverted joint relationship matrix H⁻¹ (Aguilar et al., 2010; Christensen and Lund, 2010), where A₂₂ is the pedigree relationship matrix of the genotyped animals and G is the genomic relationship matrix. When the number of genotyped animals in the G matrix (n) exceeds the number of markers (m), direct inversion of the G matrix is not possible without regularization, such as adding a small value to the diagonal or adding a residual polygenic matrix (Mäntysaari et al., 2017). When n >> m, the single-step method becomes computationally challenging. Several computational approaches have been proposed for the computation of G⁻¹ to allow feasible application of ssGBLUP to large data sets (see the review by Misztal et al., 2020). For instance, the method called ssGTBLUP (Mäntysaari et al., 2017) uses the relationship matrix of genotyped animals (A₂₂) as the regularization matrix to avoid singularity, the Woodbury matrix identity for the G inverse, and a sparse representation of A₂₂⁻¹ to solve the computational challenges. In a data set with 178K genotyped animals, the ssGTBLUP model needed 33% of the memory and 55% of the wall-clock time used by the original ssGBLUP to obtain GEBV (Koivula et al., 2021a).

The difference in the average diagonal and off-diagonal elements of the A₂₂ and G matrices is known as a single-step compatibility issue (Vitezica et al., 2011). Balancing the matrices implies adjusting either the pedigree or the genomic relationship matrix to make the two more similar. The concept of the A adjustment was suggested by Christensen (2012) and further developed into the metafounder (MF) approach (Legarra et al., 2015). Metafounders are related, inbred pseudo-individuals that are used as unknown parents in the pedigree. Relationships between MF are described by a covariance matrix (Γ), which is used to build a relationship matrix A^Γ. Estimation of Γ can be based on estimates of base allele frequencies (AF) for each MF (Garcia-Baccino et al., 2017). An important assumption of the MF approach is that the G matrix is constructed with all AF equal to 0.5 (Legarra et al., 2015). Applicability of MF has been shown in livestock (Koivula et al., 2021b, 2022), sheep (Granado-Tajada et al., 2020), and pig (Xiang et al., 2017) data sets. The MF approach was also reported as a perfect choice for multi-breed evaluations when an accurate Γ can be computed (Poulsen et al., 2022).

The number of MF in the reported studies on ssGBLUP in the large dairy cattle breeds has nearly always been smaller than the number of unknown parent groups (UPG). Allocating a few MF by breed, or by breed and time, helps to achieve an accurate estimate of Γ because the MF are then evenly distributed across the genotyped animals (Kudinov et al., 2020; Masuda et al., 2021). When MF are used in ssGBLUP, it would be natural to use the same number of MF as there are UPG in the pedigree-based animal model (PBLUP). However, accurate estimation of a large unstructured Γ matrix is difficult, especially if some of the UPG have no descendants among the genotyped animals or if the genotyped individuals are several generations away.

The aim of this study was to propose an approach to construct Γ with the same number of MF as routinely defined UPG. The proposed approach was applied to the Red Dairy Cattle 305-day data and pedigree used for the milk production evaluation in Nordic countries (Denmark, Finland, and Sweden). Both PBLUP and ssGTBLUP models were used. The predictions used either UPG or MF in equal numbers. Thus, the predictive performance of four models was investigated.

2 Materials and methods

2.1 Data

Data were 305-day milk, protein, and fat yield records from three lactations of Nordic Red Dairy Cattle (RDC), Finnish Holstein (HOL), and Finncattle (FIC) cows. Records were from January 1988 to June 2021. The total numbers of records by trait were 9.45, 8.99, and 8.98 million for milk, protein, and fat, respectively (Table 1). The pedigree included 6.05 million cows and 118,363 bulls, of which 8,427 were RDC and 278 were FIC proven bulls. Genetic groups were defined as breed x country x five- or 10-year period for RDC, and as breed x five- or 10-year period for HOL, FIC, and other breeds. In total, there were 148 groups: 61 RDC, 45 HOL, 16 FIC, and 26 for the breed group OTHER. The group OTHER included 23 breeds, mostly beef cattle.


TABLE 1. Number of records by lactation, trait, and breed in 305-day Nordic (Denmark, Finland, Sweden) Red Dairy cattle production data.

Genomic data were used from 206,140 RDC animals (6,018 proven bulls and 85,142 cows with records) and 1,335 FIC animals (160 proven bulls and 845 cows with records). Before 2019, the bulls were genotyped with the Illumina Bovine SNP50 array and most cows with the Illumina Bovine LD array (Illumina, San Diego, CA, USA). Since 2019, both bulls and cows have been genotyped with the Eurogenomics EG MD array (https://www.eurogenomics.com/). Quality control and imputation of genotypes to 46,914 SNPs were performed by NAV (Nordic Cattle Genetic Evaluation, Denmark). Genomic markers were not filtered on minor allele frequency, and no edits were made concerning polymorphism across and within breeds. No HOL genotypes were included in the current study.

2.2 Statistical models

Four prediction models were investigated using a multi-trait multi-lactation model: single-step GTBLUP with UPG in H⁻¹ (ssUPG), single-step GTBLUP with MF (ssMF), pedigree-based BLUP with UPG in A⁻¹ (pUPG), and pedigree-based BLUP with MF (pMF). The traits were milk, protein, and fat yield in three lactations, i.e., nine traits in total. The linear mixed effects model was:

y = X b + Z u + e,

where y is the vector of phenotypes, X is the design matrix relating fixed effects to the phenotypes, b is the vector of fixed effects, Z is the design matrix relating the breeding values to the phenotypes, $u \sim N(0, A σ_{u}^{2})$ is the vector of random animal breeding values, and $e \sim N(0, I σ_{e}^{2})$ is the residual vector. Matrix A is the pedigree-based relationship matrix, $I$ is an identity matrix, and $σ_{u}^{2}$ and $σ_{e}^{2}$ are the genetic and residual variances, respectively. Fixed effects in b were calving year by season, calving age, herd by year, and calving age by breed. The calving age by breed effect consists of linear (α), quadratic (α²), and cubic (α³) regression coefficients of calving age multiplied by the pedigree-based breed proportions of an animal (Lidauer et al., 2015), so that the general level of a breed remained to be modeled by u. The regression covariate was centered to zero over all data at the mean calving age, as $α = (\text{calving age} - \overline{\text{calving age}}) / 365$.
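To make the structure of this model concrete, the following minimal single-trait sketch builds and solves Henderson's mixed model equations for y = Xb + Zu + e. It is only an illustration: the three-animal pedigree, the records, the function name `solve_mme`, and the variance ratio are hypothetical, and the study itself used a nine-trait model solved iteratively with MiX99 rather than by a direct solve.

```python
import numpy as np

def solve_mme(y, X, Z, A_inv, lam):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e.

    lam is the variance ratio sigma_e^2 / sigma_u^2 (assumed known here).
    Returns the fixed-effect solutions and the predicted breeding values.
    """
    XtZ = X.T @ Z
    lhs = np.block([[X.T @ X, XtZ],
                    [XtZ.T, Z.T @ Z + lam * A_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# Hypothetical toy data: animals 1 and 2 are unrelated founders, 3 is their progeny.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
y = np.array([10.0, 12.0, 11.5])   # one record per animal
X = np.ones((3, 1))                # overall mean as the only fixed effect
Z = np.eye(3)                      # each record belongs to one animal
lam = 2.0                          # illustrative variance ratio

b_hat, u_hat = solve_mme(y, X, Z, np.linalg.inv(A), lam)
print("fixed effect:", b_hat)
print("breeding values:", u_hat)
```

Replacing `A_inv` with H⁻¹ or (A^Γ)⁻¹ changes the relationship structure but not the form of the equations.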

2.2.1 Single-step GTBLUP

The mixed model equations (MME) of the original ssGBLUP model require the inverse of a joint relationship matrix H⁻¹ (Aguilar et al., 2010; Christensen and Lund, 2010):

$$H^{-1} = A^{-1} + \left(\begin{array}{cc} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{array}\right),$$

where A₂₂ is the part of A for the genotyped animals, and G is the genomic relationship matrix (VanRaden, 2008). A regularization matrix C = wA₂₂ was added to the marker-based matrix G, where w is the residual polygenic proportion, i.e., the genomic relationship matrix was G_{c_w} = (1-w)G + wA₂₂ (Mäntysaari et al., 2017). We used w = 0.30 to keep comparability with the studies by Koivula et al. (2021b, 2022). The G_{c_w} matrix was constructed with the assumption that the AF of all markers was equal to 0.5. Thus, G_{c_w} = (1-w)Z₁₀₁Z₁₀₁'/k + wA₂₂, where k = m/2 is the scaling factor, m is the number of markers, and Z₁₀₁ is the matrix of genotype codes with value 0 for the heterozygote and values -1 and +1 for the homozygotes. The inverse genomic relationship matrix can be expressed as (Mäntysaari et al., 2017) $G_{c_w}^{-1} = \frac{1}{w} A_{22}^{-1} - T_{w}^{'} T_{w}$, where $T_{w} = \frac{1}{w} L_{w}^{-1} Z_{101}^{'} \sqrt{\frac{2}{m}} A_{22}^{-1}$ and L_w is the Cholesky factor of $\frac{1}{w} Z_{101}^{'} A_{22}^{-1} Z_{101} \frac{2}{m} + \frac{1}{1 - w} I$.
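The Woodbury form of the inverse can be checked numerically on a toy example. The sketch below follows the expressions for T_w and L_w given above, with small dense matrices; the random genotype codes and the positive definite stand-in for A₂₂ are assumptions made purely for illustration, and a production implementation would rely on a sparse A₂₂⁻¹ rather than dense inverses.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, w = 6, 4, 0.30   # toy sizes: more animals than markers; w as in the text

# Hypothetical symmetric positive definite stand-in for A22.
M = rng.normal(size=(n, n))
A22 = M @ M.T / n + np.eye(n)

# Genotype codes: -1 / +1 for homozygotes, 0 for the heterozygote.
Z = rng.integers(-1, 2, size=(n, m)).astype(float)

k = m / 2.0
G_cw = (1.0 - w) * (Z @ Z.T) / k + w * A22   # regularized genomic matrix

# Woodbury-based inverse: only an m x m matrix is factorized.
A22_inv = np.linalg.inv(A22)
Zs = Z * np.sqrt(2.0 / m)                    # Z scaled by sqrt(2/m)
inner = (Zs.T @ A22_inv @ Zs) / w + np.eye(m) / (1.0 - w)
L_w = np.linalg.cholesky(inner)              # lower factor, inner = L_w L_w'
T_w = np.linalg.solve(L_w, Zs.T @ A22_inv) / w
G_cw_inv = A22_inv / w - T_w.T @ T_w

print(np.allclose(G_cw_inv @ G_cw, np.eye(n)))
```

The point of the construction is that the matrix to be factorized has the dimension of the number of markers, not the number of genotyped animals.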

2.2.2 Single-step GTBLUP with UPG

The joint relationship matrix augmented by UPG (Quaas and Pollak, 1981; Misztal et al., 2013; Matilainen et al., 2018) was computed as shown in Koivula et al. (2021a):

$$H^{-1} = A_{UPG}^{-1} + \left(\begin{array}{ccc} 0 & 0 & 0 \\ 0 & B_{11} & B_{12} \\ 0 & B_{21} & B_{22} \end{array}\right)$$

where

$$A_{UPG}^{-1} = \left(\begin{array}{ccc} A^{11} & A^{12} & -(A^{11} Q_{1} + A^{12} Q_{2}) \\ A^{21} & A^{22} & -(A^{21} Q_{1} + A^{22} Q_{2}) \\ -(Q_{1}^{'} A^{11} + Q_{2}^{'} A^{21}) & -(Q_{1}^{'} A^{12} + Q_{2}^{'} A^{22}) & Q^{'} A^{-1} Q \end{array}\right),$$

$$Q = \left(\begin{array}{c} Q_{1} \\ Q_{2} \end{array}\right),$$

and

$$B = \frac{1 - w}{w} \left(\begin{array}{cc} A_{22}^{-1} & -A_{22}^{-1} Q_{2} \\ -Q_{2}^{'} A_{22}^{-1} & Q_{2}^{'} A_{22}^{-1} Q_{2} \end{array}\right) - \left(\begin{array}{cc} T_{w}^{'} T_{w} & -T_{w}^{'} T_{w} Q_{2} \\ -Q_{2}^{'} T_{w}^{'} T_{w} & Q_{2}^{'} T_{w}^{'} T_{w} Q_{2} \end{array}\right)$$

The Q matrix contains the proportions of genes contributed by each UPG according to the pedigree information. The subscripts 1 and 2 in Q pertain to non-genotyped and genotyped animals, respectively, so that Q₂ conforms with A₂₂. Subscripts 1 and 2 in B pertain to genotyped animals and UPGs, respectively. The UPGs were modeled as random effects. Inbreeding coefficients were accounted for in both pedigree-based relationship matrices.
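As an illustration of what Q holds, the sketch below traces group contributions through a small pedigree: an unknown parent contributes half to its assigned group, and a known parent passes on half of its own row. The pedigree encoding (negative integers for groups) and the function name `upg_contributions` are hypothetical conventions chosen for this example only.

```python
import numpy as np

def upg_contributions(pedigree, n_groups):
    """Return Q (animals x groups): expected gene proportions from each UPG.

    pedigree: list of (sire, dam), ordered so that parents precede progeny.
    A parent code >= 0 is an animal index; a code < 0 denotes an unknown
    parent assigned to group (-code - 1). This encoding is an assumption.
    """
    Q = np.zeros((len(pedigree), n_groups))
    for i, parents in enumerate(pedigree):
        for p in parents:
            if p < 0:
                Q[i, -p - 1] += 0.5      # unknown parent: half from its group
            else:
                Q[i] += 0.5 * Q[p]       # known parent: half of the parent's row
    return Q

# Animal 0: parents unknown, assigned to groups 0 and 1.
# Animal 1: both parents unknown, assigned to group 0.
# Animal 2: progeny of animals 0 and 1.
ped = [(-1, -2), (-1, -1), (0, 1)]
Q = upg_contributions(ped, n_groups=2)
print(Q)
```

Each row of Q sums to one, because every gene of an animal traces back to some group.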

2.2.3 Single-step GTBLUP with MF

In the MF approach (Christensen, 2012; Legarra et al., 2015), the H⁻¹ matrix was replaced by:

$$(H^{Γ})^{-1} = (A^{Γ})^{-1} + \left(\begin{array}{cc} 0 & 0 \\ 0 & G_{c_w}^{-1} - (A_{22}^{Γ})^{-1} \end{array}\right),$$

where $G_{c_w} = (1-w) G + w A_{22}^{Γ}$, A^Γ is the pedigree relationship matrix formed with a Γ matrix, $A_{22}^{Γ}$ is the submatrix of A^Γ for the genotyped animals, and Γ is the variance-covariance matrix of the MF. Inbreeding coefficients estimated using the Γ matrix were used in computing the inverses $(A^{Γ})^{-1}$ and $(A_{22}^{Γ})^{-1}$.
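How Γ enters the pedigree relationships can be seen with a small dense tabular construction of A^Γ, following the rules of the MF approach (Legarra et al., 2015): MF are related through Γ, an animal's relationship to earlier individuals is the average of its parents' relationships, and its self-relationship is one plus half the relationship between its parents. The Γ values, the pedigree, and the function name `a_gamma` are hypothetical, and this dense version is only a sketch; real evaluations build the sparse inverse directly.

```python
import numpy as np

def a_gamma(pedigree, gamma):
    """Tabular construction of A^Gamma for a small pedigree.

    gamma: r x r matrix of relationships among metafounders (rows 0..r-1).
    pedigree: list of (sire, dam) for animals r, r+1, ..., parents first;
    a parent index < r is a metafounder (every animal has two "parents").
    """
    r = gamma.shape[0]
    n = r + len(pedigree)
    A = np.zeros((n, n))
    A[:r, :r] = gamma
    for i, (s, d) in enumerate(pedigree, start=r):
        A[i, :i] = 0.5 * (A[s, :i] + A[d, :i])   # relationships with earlier rows
        A[:i, i] = A[i, :i]
        A[i, i] = 1.0 + 0.5 * A[s, d]            # self-relationship
    return A

# Two hypothetical metafounders and three animals.
Gamma = np.array([[0.6, 0.5],
                  [0.5, 0.7]])
ped = [(0, 0),   # animal 2: both parents from metafounder 0
       (0, 1),   # animal 3: one parent from each metafounder
       (2, 3)]   # animal 4: progeny of animals 2 and 3
A_G = a_gamma(ped, Gamma)
print(np.round(A_G[2:, 2:], 3))
```

With Γ set to zero the same function returns the usual pedigree relationships with unrelated, non-inbred founders, which is why MF can be seen as a generalization of unknown parents.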

2.2.4 Pedigree BLUP

The pedigree-based models (pUPG and pMF) were similar to their corresponding single-step models, except that the genomic data were excluded from the prediction. In the pUPG model, UPGs were accounted for in A⁻¹ ( $A_{UPG}^{-1}$ ; Quaas and Pollak, 1981). In the pMF model, A⁻¹ was replaced by $(A^{Γ})^{-1}$.

2.5 Estimation of the Γ matrix

Let the number of MF be r, such that the Γ matrix is of size r by r. In the MF approach, the Γ matrix describes the variance-covariance structure of the MF. It can be estimated by $8\,\mathrm{Cov}(P)$, where P is an m by r matrix of estimated base population AF for each marker and MF (Legarra et al., 2015; Garcia-Baccino et al., 2017). In the studied data set, 44% of the UPG were not linked to the genotypes. Thus, using the 148 UPG as MF in the estimation of P was not feasible. To compute Γ for a large number of MF, the following general steps were used:

a) Estimate allele frequencies for a set of base groups;

b) From estimated allele frequencies, calculate the preliminary Γ matrix (Γ_pre) for the base groups;

c) Solve the matrix K in the covariance function Γ_pre = $Φ_{pre} K Φ_{pre}^{'}$ + E using Γ_pre and the model matrix Φ_pre. The matrix E is null if the row rank of $Φ_{pre}$ equals the dimension of Γ_pre; otherwise, E represents the least-squares errors of estimation.

d) Compute the Γ for the large number of groups as $Φ_{Γ} K Φ_{Γ}^{'}$.

The model matrices $Φ_{pre}$ and $Φ_{Γ}$ define a linear model by group and time for the set of base MF and for all MF, respectively.
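Steps a) and b) rest on the estimator 8 Cov(P). A minimal sketch of that calculation is given below, assuming that P is already available as a markers-by-groups array; the simulated allele frequencies and the function name are hypothetical, and the use of the ordinary sample covariance across markers (`np.cov`) is an assumption about the convention, not a statement of how the Bpop program computes it.

```python
import numpy as np

def gamma_from_allele_freqs(P):
    """Return 8 * Cov(P) for an (m markers x r groups) allele frequency matrix.

    The covariance is taken across markers, so the result is r x r.
    """
    P = np.asarray(P, dtype=float)
    return 8.0 * np.cov(P, rowvar=False)

# Simulated base allele frequencies for three hypothetical groups that share
# a common component, so that the groups come out as related.
rng = np.random.default_rng(0)
m = 5000
shared = rng.uniform(0.1, 0.9, size=m)
P = np.column_stack(
    [np.clip(shared + rng.normal(0.0, 0.05, size=m), 0.0, 1.0) for _ in range(3)]
)

Gamma_toy = gamma_from_allele_freqs(P)
print(np.round(Gamma_toy, 3))
```

Groups whose base allele frequencies are more alike across markers receive a larger off-diagonal value, which is the sense in which Γ describes relationships between MF.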

The technical detailed steps used to compute Γ for 148 groups (Γ₁₄₈) were:

a) The pedigree was pruned to include only one ancestor generation of genotyped animals, as in Kudinov et al. (2020). Truncation of the pedigree helped to achieve an equal distribution of the genomic information over UPGs. Missing parents in the truncated pedigree were replaced by 26 groups formed by breed, country, and time interval (Table 2). All HOL ancestors were assigned to the same group regardless of country and time. Estimation of base population AF for each of the groups (P_RDC) was performed using the GLS method (McPeek et al., 2004; Garcia-Baccino et al., 2017). The HOL group estimated from RDC genotypes was dropped from P_RDC. The HOL AF (P_HOL) were the same as used in Kudinov et al. (2020), calculated using Holstein genotypes (M. Koivula, personal communication). The joint P_{RDC_HOL} matrix of size 29 by 45,823 was created by merging compatible SNPs in P_RDC and P_HOL. The numbers of SNPs dropped from P_RDC and P_HOL were 1,091 and 519, respectively.

b) Three Γ matrices (Γ_{RDC_HOL}, Γ_RDC, and Γ_HOL) were computed using P_{RDC_HOL}, P_RDC, and P_HOL. A pre-Γ matrix (Γ_pre, Figure 1) was created by replacing the diagonal elements of Γ_{RDC_HOL} by diagonal elements of Γ_RDC and Γ_HOL at corresponding places. The diagonal values in Γ_pre were larger than in Γ_{RDC_HOL}.

c) The structure of Γ_pre was modeled with the covariance function $Φ_{pre} K Φ_{pre}^{'}$ (Kirkpatrick et al., 1990), where Φ_pre was a model matrix containing the standardized year of the MF (Appendix 1) and K was a matrix of covariance function coefficients. Years were standardized using the formula $\frac{2 (\mathrm{year}_{MF} - \mathrm{year}_{\min})}{\mathrm{year}_{\max} - \mathrm{year}_{\min}} - 1$, where year_MF is the year of the MF, and year_min and year_max are the first (1950) and the last (2021) year points among the 148 groups in the pedigree.


TABLE 2. Groups used to compute preliminary Γ matrix.


FIGURE 1. Symmetrical covariance matrix between 29 MFs (Γ_pre). The lower triangle presents the diagonal (MF self-relationships) and off-diagonal (between-MF relationships) elements of Γ_pre; the upper triangle is a heatmap plot of the off-diagonals.

Matrix K was estimated as (Tijani et al., 1999):

$$\hat{K} = (Φ_{pre}^{'} Φ_{pre})^{-1} Φ_{pre}^{'} Γ_{pre} Φ_{pre} (Φ_{pre}^{'} Φ_{pre})^{-1}$$

leading to estimate

$$\hat{K} = \left[\begin{array}{rrrrrrrrr}
0.0108 & 0.0093 & 0.0083 & -0.0042 & -0.0004 & 0.0003 & -0.0058 & -0.0054 & 0.0046 \\
 & 0.6563 & 0.6203 & 0.5551 & 0.5961 & 0.5850 & 0.5511 & 0.5434 & 0.5158 \\
 & & 0.6349 & 0.5621 & 0.6094 & 0.5795 & 0.5434 & 0.5415 & 0.5164 \\
 & & & 0.6098 & 0.5544 & 0.5644 & 0.5467 & 0.5778 & 0.5358 \\
 & & & & 0.6535 & 0.5678 & 0.5441 & 0.5375 & 0.5159 \\
 & & & & & 0.5899 & 0.5442 & 0.5458 & 0.5325 \\
\text{sym} & & & & & & 0.6666 & 0.5530 & 0.5121 \\
 & & & & & & & 0.6946 & 0.5205 \\
 & & & & & & & & 0.6053
\end{array}\right]$$

d) Finally, Γ₁₄₈ was estimated as $Φ_{148} K Φ_{148}^{'}$, where the matrix Φ₁₄₈ was designed in the same way as Φ_pre but using all the MF. The rank of the Γ₁₄₈ matrix is only 9. To avoid singularity of the A^Γ matrix, we reduced the off-diagonal values of Γ₁₄₈ by 2.5% and increased the diagonal values by 2.5%.
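Steps c) and d) can be sketched on a toy scale as follows. The year standardization, the least-squares estimate of K, and the 2.5% adjustment follow the formulas given above. The layout of the model matrix (one standardized-year column plus one indicator column per group) is an assumption made for this example, since the actual Φ is defined in Appendix 1, and the two groups, their years, and the "true" K are hypothetical.

```python
import numpy as np

def standardize_year(year, year_min=1950, year_max=2021):
    """Map years onto [-1, 1] with the formula given in the text."""
    return 2.0 * (np.asarray(year, dtype=float) - year_min) / (year_max - year_min) - 1.0

def model_matrix(group_idx, years, n_groups):
    """Assumed layout: column 0 = standardized year, then one indicator per group."""
    t = standardize_year(years)
    Phi = np.zeros((len(t), 1 + n_groups))
    Phi[:, 0] = t
    Phi[np.arange(len(t)), 1 + np.asarray(group_idx)] = 1.0
    return Phi

def fit_K(Phi_pre, Gamma_pre):
    """Least-squares K = (Phi'Phi)^-1 Phi' Gamma_pre Phi (Phi'Phi)^-1."""
    left = np.linalg.solve(Phi_pre.T @ Phi_pre, Phi_pre.T)
    return left @ Gamma_pre @ left.T

def extrapolate_gamma(Phi_all, K, shrink=0.025):
    """Phi K Phi' with off-diagonals reduced and diagonals increased by `shrink`."""
    G = Phi_all @ K @ Phi_all.T
    d = np.diag(G).copy()
    G = G * (1.0 - shrink)
    np.fill_diagonal(G, d * (1.0 + shrink))
    return G

# Toy base set: two groups observed at three years each (six base MF).
grp_pre = [0, 0, 0, 1, 1, 1]
yrs_pre = [1960, 1980, 2000, 1960, 1980, 2000]
Phi_pre = model_matrix(grp_pre, yrs_pre, n_groups=2)

K_true = np.array([[0.010, 0.005, -0.004],
                   [0.005, 0.650, 0.550],
                   [-0.004, 0.550, 0.600]])
Gamma_pre = Phi_pre @ K_true @ Phi_pre.T       # noise-free toy "observations"
K_hat = fit_K(Phi_pre, Gamma_pre)

# Extrapolate to five years per group (ten MF), including unobserved years.
grp_all = [0] * 5 + [1] * 5
yrs_all = [1950, 1970, 1990, 2010, 2021] * 2
Gamma_all = extrapolate_gamma(model_matrix(grp_all, yrs_all, 2), K_hat)
print(Gamma_all.shape, np.linalg.matrix_rank(Gamma_all))
```

Without the final adjustment the extrapolated matrix would have the rank of K (three in this toy case), which mirrors the rank-9 issue noted for Γ₁₄₈.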

2.6 Validation of model fit

Validation of the prediction models was done using a modified forward prediction (Mäntysaari et al., 2010). For the validation, a reduced phenotypic data set was constructed by removing records from the last 4 years of data, i.e., June 2017 to June 2021. Daughter yield deviations (DYD) for bulls and yield deviations (YD) for cows were computed from the full data set with the same model that was applied to the reduced data. Bias of evaluation was estimated by the linear regression coefficient (b₁) from the weighted regression of DYD/YD on the corresponding [G]EBV predicted with the reduced data. The weight of DYD for bull i was EDC_i/(EDC_i + λ_b), where λ_b is (4 - h²)/h², h² is the heritability of the trait, and EDC_i is the effective daughter contribution of bull i computed as in Taskinen et al. (2014). The weight for cow YD_j was computed as ERC_j/(ERC_j + λ_c), where λ_c is (1 - h²)/h² and ERC_j is the effective record contribution of cow j (Přibyl et al., 2013). Adjusted validation reliability was obtained by dividing the coefficient of determination from the regression model ( $R^{2}$ ) by the average weight of DYD ( $R_{EDC}^{2}$ ) and YD ( $R_{ERC}^{2}$ ) for bulls and cows, respectively. Average genetic trends were plotted using the trait-specific combined [G]EBVs computed as

$$0.30\,{[G]EBV}_{\text{parity 1}} + 0.25\,{[G]EBV}_{\text{parity 2}} + 0.45\,{[G]EBV}_{\text{parity 3}}$$

(https://nordicebv.info/wp-content/uploads/2021/10/NAV-routine-genetic-evaluation_EDITYSS-08102021.pdf).
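The validation statistics described above can be sketched as a weighted least-squares regression. The weights, the adjustment of R² by the average weight, and the parity weights follow the text; the function names, the simulated EDC, and the simulated [G]EBV and DYD values are hypothetical and serve only to show the calculation.

```python
import numpy as np

def bull_weight(edc, h2):
    """Weight of a bull's DYD: EDC / (EDC + (4 - h2) / h2)."""
    return edc / (edc + (4.0 - h2) / h2)

def cow_weight(erc, h2):
    """Weight of a cow's YD: ERC / (ERC + (1 - h2) / h2)."""
    return erc / (erc + (1.0 - h2) / h2)

def validation_stats(dev, ebv, weights):
    """Weighted regression of deviations on [G]EBV.

    Returns the slope b1, the weighted R2, and R2 divided by the mean weight.
    """
    dev, ebv, w = (np.asarray(a, dtype=float) for a in (dev, ebv, weights))
    X = np.column_stack([np.ones_like(ebv), ebv])
    WX = X * w[:, None]
    b0, b1 = np.linalg.solve(X.T @ WX, WX.T @ dev)
    resid = dev - (b0 + b1 * ebv)
    centered = dev - np.average(dev, weights=w)
    r2 = 1.0 - np.sum(w * resid**2) / np.sum(w * centered**2)
    return b1, r2, r2 / w.mean()

def combined_gebv(parity1, parity2, parity3):
    """Combine lactation-specific [G]EBV with weights 0.30, 0.25, and 0.45."""
    return 0.30 * parity1 + 0.25 * parity2 + 0.45 * parity3

# Simulated validation bulls (all values are made up for the illustration).
rng = np.random.default_rng(3)
gebv = rng.normal(0.0, 100.0, size=200)
dyd = 5.0 + 0.9 * gebv + rng.normal(0.0, 60.0, size=200)
w_bulls = bull_weight(rng.uniform(20.0, 200.0, size=200), h2=0.4)

b1, r2, r2_adj = validation_stats(dyd, gebv, w_bulls)
print(round(b1, 3), round(r2, 3), round(r2_adj, 3))
```

A slope below one indicates overprediction of the [G]EBV, which is how b₁ is read in the validation tables.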

2.7 Software

Pedigree truncation and estimation of the inbreeding coefficients were done using RelaX2 v.1.95 software. The AF were estimated using the Bpop v. 0.98 program (Strandén and Vuori, 2006; Strandén and Mäntysaari, 2020). The T matrix and its diagonal needed in the ssGTBLUP model were computed using the hgtinv v.0.83 program. The computation of [G]EBV predictions and the estimation of EDC/ERC used MiX99 software (Strandén and Lidauer, 1999). MiX99 uses preconditioned conjugate gradient (PCG) iteration. The PCG method was considered converged when the convergence criterion fell below 1e-6. The convergence criterion was defined as the Euclidean norm of the difference between the right-hand side (RHS) of the MME and the one predicted by the current solutions, relative to the norm of the RHS. The matrices $(A_{22})^{-1}$ and $(A_{22}^{Γ})^{-1}$ used by MiX99 and hginv were constructed using the given pedigree and inbreeding files and, in the case of MF, a file with the Γ⁻¹ matrix.
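The convergence criterion described above, the norm of the RHS minus the RHS predicted by the current solutions relative to the norm of the RHS, can be illustrated with a small PCG solver. This is a generic textbook sketch with a diagonal (Jacobi) preconditioner chosen for simplicity; it is not MiX99 code, and the preconditioner and the random test system are assumptions.

```python
import numpy as np

def pcg(A, b, tol=1e-6, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    Stops when ||b - A x|| / ||b|| < tol, where the residual b - A x is the
    one carried along by the CG recursion. Uses a Jacobi preconditioner.
    """
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()          # residual for the starting value x = 0
    m_inv = 1.0 / np.diag(A)            # inverse of the diagonal preconditioner
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    crit = np.linalg.norm(r) / b_norm
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        crit = np.linalg.norm(r) / b_norm
        if crit < tol:
            return x, it, crit
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter, crit

# Hypothetical well-conditioned test system.
rng = np.random.default_rng(7)
n = 50
M = rng.normal(size=(n, n))
A_sys = M @ M.T + n * np.eye(n)
b_sys = rng.normal(size=n)

x_sol, n_iter, crit = pcg(A_sys, b_sys)
print(n_iter, crit)
```

The number of iterations needed to reach the same criterion depends on the conditioning of the coefficient matrix, which is why iteration counts can differ between models with the same data.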

3 Results and discussion

3.1 Relationship matrices

Elements of Γ_pre ranged from 0.59 to 0.74 and from 0.51 to 0.69 for the diagonal and off-diagonal elements, respectively. The lowest and highest diagonal values (self-relationships; Legarra et al., 2015) were in groups HOL 1960 and RDC FIN 1990, respectively. In the Γ₁₄₈ matrix, diagonal elements ranged from 0.61 to 0.73 (Figure 2). The lowest and highest self-relationships were in the HOL SWE 1970 and OTHER 1960 groups, respectively. The off-diagonal elements of Γ₁₄₈ ranged from 0.48 to 0.69. The highest average relationships were observed between the FIN and SWE RDC groups, as expected. Relationships between HOL and RDC DNK were higher than with the other RDC groups due to the larger proportion of HOL sires in the RDC DNK pedigree. Similarly, the FIC groups were genetically closer to RDC FIN than to the other groups due to historical crossbreeding. Relationship coefficients between the RDC subgroups in our study ranged from 0.54 to 0.65, which was much higher than the range 0.09–0.18 reported between the biological types of the Montana cattle breed (Kluska et al., 2021). The average relationship between the RDC and HOL breeds (0.52) was close to that reported between the HOL and Jersey breeds (0.48; Legarra et al., 2015).


FIGURE 2. Heatmap of covariances between 148 MFs (Γ₁₄₈). The diagonal of the heatmap shows the self-relationships of the MFs; the off-diagonals show the relationships between MFs.

Because Γ₁₄₈ is extrapolated from Γ_pre, we expect the two matrices to be alike. The difference between them was assessed using the percentage deviation from the mean off-diagonal values in breed groups (Table 3). The average off-diagonal value of Γ₁₄₈ was 1.8% higher than that of Γ_pre. For instance, the average relationships between the RDC FIN and FIC groups were 0.54 and 0.56 in Γ_pre and Γ₁₄₈, respectively. Thus, the covariance function allows the Γ matrix to be extrapolated so that the MF approach can use the same number of MF as there are UPG.


TABLE 3. Deviation from average relationships between breed groups in Γ_pre (lower triangle) and Γ₁₄₈ (upper triangle).

Application of the Γ₁₄₈ matrix to the pedigree-based relationship matrix lifted the average diagonal elements of A₂₂ closer to G₀₅ (Figure 3). The smallest diagonal and off-diagonal values of A₂₂ increased by 0.25 (from 1 to 1.25) and by 0.50 (from 0 to 0.50), respectively, when Γ₁₄₈ was used as the basis for $A_{22}^{Γ}$ (Table 4). The increase was close to that in Kudinov et al. (2020): 0.27 and 0.48 for the diagonal and off-diagonal elements, respectively. The correlations of the diagonal and off-diagonal elements were higher between G₀₅ and $A_{22}^{Γ}$ (0.70 and 0.88) than between G₀₅ and A₂₂ (0.57 and 0.77). The overall magnitude of values in $A_{22}^{Γ}$ in our study was higher than that presented for HOL in Koivula et al. (2022). Average relationship coefficients of A₂₂, $A_{22}^{Γ}$, and G₀₅ in Koivula et al. (2022) increased steadily by animal's birth year. However, similar behavior in our study was observed only for the A₂₂ matrix. A slight decrease in the average relationship coefficient of G₀₅ and $A_{22}^{Γ}$ was observed after year 2000. This might be caused by the establishment of the joint Nordic RDC evaluation and the admixture of the breeds in the three populations. The total increases in the average relationships over the 40-year period were 2.81%, 0.97%, and 0.72% for A₂₂, $A_{22}^{Γ}$, and G₀₅, respectively. The MF approach is beneficial in an admixed population such as Nordic RDC, as it helps to balance the G₀₅ and A₂₂ matrices. However, $A_{22}^{Γ}$ and G₀₅ were not on exactly the same scale, as the mean diagonal and off-diagonal elements in $A_{22}^{Γ}$ were still somewhat lower than in G₀₅ (Table 4). Thus, some compatibility issues between the pedigree- and genomic-based relationship matrices remained.


FIGURE 3. Average diagonal elements of A₂₂ (black circles), A^Γ (blue triangles), and G₀₅ (red diamonds) by the birth year of a genotyped animal.


TABLE 4. Mean, minimum (Min), and maximum (Max) element values of A₂₂, $A_{22}^{Γ}$ and G₀₅ from diagonal and off-diagonal.

In addition to Γ_pre, Γ_{RDC_HOL} was tested as the source for estimating Γ_148* and the corresponding $A_{22}^{Γ *}$. The mean difference between Γ_{RDC_HOL} and Γ_pre was 0.03. Diagonal elements in $A_{22}^{Γ *}$ constructed using the extrapolated Γ_{RDC_HOL} were on average 0.02 lower than in the $A_{22}^{Γ}$ used for genomic prediction. Even though constructing Γ_pre with the Γ_RDC and Γ_HOL diagonals helped to lift A₂₂ closer to G₀₅, this step was not vital, and Γ_{RDC_HOL} could have been used as is.

Filtering of the SNPs by minor allele frequency (MAF) for the estimation of the Γ matrix was elaborated in our previous study (Kudinov et al., 2020) and indirectly performed in Legarra et al. (2015). In the current study, we avoided MAF filtering of the SNPs used to compute Γ_pre. That helped to compute an $A_{22}^{Γ}$ matrix closer to G₀₅, as the same set of markers was used to construct G₀₅. We observed that if selection of SNPs is used, it should be applied to both Γ and G₀₅, i.e., the same set of markers should be used consistently. It is reasonable to keep the sets of markers used in G₀₅ and Γ as compatible as possible.

The approach presented in our study allows fitting the same number of MF as UPG and defining MF for base population groups that are not linked to the genotypes. However, the approach requires several arbitrary steps that need to be customized for each population. For instance, the definition of the groups in Γ_pre will differ between populations. We defined the groups in Γ_pre by breed, country, and time. If any of the defined groups had less than 0.1% of the genotyped animals, it had to be combined with another group. In our study, the time variable of the base populations used to compute AF was defined as the last year of the time interval; alternatives are the mean, the median, or the first year. Because the year definition in each of the groups is used in the model matrix Φ and the resulting covariance function, changing the definition would be expected to decrease the average diagonal of $A_{22}^{Γ}$. The standardized year of the MF in Φ was computed with the same formula for all groups. However, this can be adjusted for a specific breed or country. Use of a covariance function in routine genomic prediction needs re-estimation of the Γ matrix when new genetic groups are defined, but not re-estimation of the base AF.

3.2 Model runs and validation

The ssMF and ssUPG models converged in 1,388 and 2,802 iterations, respectively. The wall-clock time per iteration was similar for the ssMF and ssUPG models. The MiX99 runtime for the ssMF model on an Intel Xeon 2.8 GHz machine with four cores was 11 h 19 min.

Table 5 presents bull validation results for the nine traits. For all traits, the highest prediction reliability was obtained by the ssMF model. The regression slopes ( $b_{1}$ ) obtained by ssMF were slightly higher than those obtained by ssUPG. Prediction reliabilities of the pUPG and pMF models were the same. However, the slopes ( $b_{1}$ ) with pMF were closer to one than with pUPG. In all traits and models, the quality of prediction decreased from lactation 1 to lactation 3. For the validation cows (Table 6), the ssMF model gave slightly better validation reliability in milk than the ssUPG model. For protein and fat, the same prediction reliability was estimated in both single-step models. The slope in ssMF was closer to one than in the other models. However, in the first parity milk trait, the slope was slightly above one in ssUPG and ssMF. The MF approach improved the quality of genomic prediction in the studied population, similarly to what was reported in Bradford et al. (2019), Masuda et al. (2021), and Koivula et al. (2022). The bias of prediction in the ssMF model was lower, as reported for dairy sheep (Macedo et al., 2020). Nonetheless, bias in our study remained significant in all single-step models.


TABLE 5. [G]EBV validation test regression coefficients (b₁) and weighted validation reliabilities ( $R_{E D C}^{2}$ ) for RDC validation bulls.


TABLE 6. [G]EBV validation test regression coefficients (b₁) and weighted validation reliabilities ( $R_{E R C}^{2}$ ) for RDC validation cows.

Genetic trends for the combined milk, fat, and protein GEBVs are presented for the genotyped bulls with at least 50 daughters and for all RDC cows in Figures 4 and 5, respectively. Average GEBV were centered using the mean GEBV of RDC cows born in 2007. The trends of both the genomic and the non-genomic models had a similar shape with UPG and with MF. The average GEBV levels were higher than the average EBV levels. A similar difference has been observed in other single-step studies (Ma et al., 2015; Silva et al., 2019; Koivula et al., 2022). Overprediction in single-step with MF was reduced in our study, similar to that reported in Masuda et al. (2021) and Koivula et al. (2022).


FIGURE 4. Average [genomic] breeding value of bulls by birth year in 305-d milk, protein, and fat yield (kg). Each bull had at least 50 daughters. Solid and dashed lines are from the model runs with full and reduced (minus four production years) data. Models are ssUPG, single-step GTBLUP with UPG accounted for in H⁻¹ (blue lines); ssMF, single-step GTBLUP with MF (black lines); pUPG, pedigree BLUP with UPG accounted for in A⁻¹ (green lines); and pMF, pedigree BLUP with MF (red lines).


FIGURE 5. Average [genomic] breeding value of cows by birth year in 305-d milk, protein, and fat yield (kg). Solid and dashed lines are from the model runs with full and reduced (minus four production years) data. Models are ssUPG, single-step GTBLUP with UPG accounted for in H⁻¹ (blue lines); ssMF, single-step GTBLUP with MF (black lines); pUPG, pedigree BLUP with UPG accounted for in A⁻¹ (green lines); and pMF, pedigree BLUP with MF (red lines).

Reduction of heritability by scaling the additive variance was suggested by Legarra et al. (2015) when MF are used for genomic prediction. Base populations in models with UPGs are assumed unrelated, which is contrary to MF. To solve that problem, the additive variance was suggested to be scaled by $1 + \mathrm{tr}(Γ) / (2 n) - 1^{'} Γ 1 / n^{2}$, where tr(Γ) is the sum of the diagonal elements of the Γ matrix, 1 is a vector of ones, and n is the dimension of Γ (Legarra et al., 2015). However, this is based on the assumption that the current population is a homogeneous mixture of all the base populations that the MF represent. In reality, base populations contribute unequally to the studied population, and thus we kept the same genetic variances in the UPG and MF models.
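For reference, the scaling factor discussed above is straightforward to evaluate for a given Γ. The sketch below implements the expression as written, with a hypothetical two-MF Γ as input; the scaling was not applied in this study, so the example only shows what the factor would look like.

```python
import numpy as np

def variance_scaling(gamma):
    """Scaling factor 1 + tr(Gamma)/(2n) - 1'Gamma 1 / n^2 (Legarra et al., 2015)."""
    g = np.asarray(gamma, dtype=float)
    n = g.shape[0]
    return 1.0 + np.trace(g) / (2.0 * n) - g.sum() / n**2

# Hypothetical Gamma for two metafounders.
Gamma_toy = np.array([[0.6, 0.5],
                      [0.5, 0.7]])
print(variance_scaling(Gamma_toy))
```

With Γ equal to zero the factor is one, which corresponds to unrelated, non-inbred base populations.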

4 Conclusion

We presented a method to utilize the same number of MF as UPG in single-step GBLUP. The covariance function allowed smooth extrapolation of the Γ matrix from 29 metafounders to the 148 in the pedigree of all animals. Use of Γ₁₄₈ increased the correlation between the elements of the pedigree and genomic relationship matrices. The Γ₁₄₈ matrix was tested in the ssGTBLUP approach and compared with UPG-based ssGTBLUP. Results showed a slight improvement in prediction reliability and a reduction in overprediction in the MF model over the UPG model.

Data availability statement

The data analyzed in this study were obtained from Finncattle Foundation (Finland), Finnish Breeder Association (FABA, Finland), Swedish Cattle Farmers Association (Växa), Landbrug and Fødevarer F.m.b.A (L and F), Nordic Cattle Genetic Evaluation (NAV, Denmark), Viking Genetics (Denmark), and the corresponding farmers. Requests to access these datasets should be directed to the Director of Nordic Cattle Genetic Evaluation, Gert P. Aamand, Z2FwQGxmLmRr.

Author contributions

AK executed the analysis and wrote the manuscript. EM did the study design. MK, GA, IS, and EM provided support during analysis execution, reviewed, and edited the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was part of the project “Genomic evaluations for Western Finncattle”, financed by Luke (Finland) and Finncattle Foundation (Finland).

Acknowledgments

We acknowledge Finncattle Foundation (Finland), Finnish Breeder Association (Finland), Nordic Cattle Genetic Evaluation (Denmark), and Viking Genetics (Denmark). We also thank the two reviewers who contributed to the production of this manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aguilar, I., Misztal, I., Johnson, D. L., Legarra, A., Tsuruta, S., and Lawlor, T. J. (2010). Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93, 743–752. doi:10.3168/jds.2009-2730

Bradford, H. L., Masuda, Y., VanRaden, P. M., Legarra, A., and Misztal, I. (2019). Modeling missing pedigree in single-step genomic BLUP. J. Dairy Sci. 102, 2336–2346. doi:10.3168/jds.2018-15434

Christensen, O. F. (2012). Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet. Sel. Evol. 44, 37. doi:10.1186/1297-9686-44-37

Christensen, O. F., and Lund, M. S. (2010). Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 42, 2. doi:10.1186/1297-9686-42-2

Garcia-Baccino, C. A., Legarra, A., Christensen, O. F., Misztal, I., Pocrnic, I., Vitezica, Z. G., et al. (2017). Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations. Genet. Sel. Evol. 49, 34. doi:10.1186/s12711-017-0309-2

Granado-Tajada, I., Legarra, A., and Ugarte, E. (2020). Exploring the inclusion of genomic information and metafounders in Latxa dairy sheep genetic evaluations. J. Dairy Sci. 103 (7), 6346–6353. doi:10.3168/jds.2019-18033

Kirkpatrick, M., Lofsvold, D., and Bulmer, M. (1990). Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124 (4), 979–993. doi:10.1093/genetics/124.4.979

Kluska, S., Masuda, Y., Ferraz, J. B. S., Tsuruta, S., Eler, J. P., Baldi, F., et al. (2021). Metafounders may reduce bias in composite cattle genomic predictions. Front. Genet. 12, 678587. doi:10.3389/fgene.2021.678587

Koivula, M., Strandén, I., Aamand, G. P., and Mäntysaari, E. A. (2022). Accounting for missing pedigree information with single-step random regression test-day models. Agriculture 12, 388. doi:10.3390/agriculture12030388

Koivula, M., Strandén, I., Aamand, G. P., and Mäntysaari, E. A. (2021b). Meta-model for genomic relationships of metafounders applied on large scale single-step random regression test-day model. Interbull Bull. 56, 76–81.

Koivula, M., Strandén, I., Aamand, G. P., and Mäntysaari, E. A. (2021a). Practical implementation of genetic groups in single-step genomic evaluations with Woodbury matrix identity-based genomic relationship inverse. J. Dairy Sci. 104 (9), 10049–10058. doi:10.3168/jds.2020-19821

Kudinov, A. A., Mäntysaari, E. A., Aamand, G. P., Uimari, P., and Strandén, I. (2020). Metafounder approach for single-step genomic evaluations of Red Dairy cattle. J. Dairy Sci. 103 (7), 6299–6310. doi:10.3168/jds.2019-17483

Legarra, A., Aguilar, I., and Misztal, I. (2009). A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 92, 4656–4663. doi:10.3168/jds.2009-2061

Legarra, A., Christensen, O. F., Aguilar, I., and Misztal, I. (2014). Single Step, a general approach for genomic selection. Livestock Sci. 166, 54–65. doi:10.1016/j.livsci.2014.04.029

Legarra, A., Christensen, O. F., Vitezica, Z. G., Aguilar, I., and Misztal, I. (2015). Ancestral relationships using metafounders: Finite ancestral populations and across population relationships. Genetics 200, 455–468. doi:10.1534/genetics.115.177014

Lidauer, M. H., Pösö, J., Pedersen, J., Lassen, J., Madsen, P., Mäntysaari, E. A., et al. (2015). Across-country test-day model evaluations for Holstein, Nordic Red cattle, and Jersey. J. Dairy Sci. 98, 1296–1309. doi:10.3168/jds.2014-8307

Ma, P., Lund, M. S., Nielsen, U. S., Aamand, G. P., and Su, G. (2015). Single-step genomic model improved reliability and reduced the bias of genomic predictions in Danish Jersey. J. Dairy Sci. 98, 9026–9034. doi:10.3168/jds.2015-9703

Macedo, F. L., Christensen, O. F., Astruc, J. M., Aguilar, I., Masuda, Y., and Legarra, A. (2020). Bias and accuracy of dairy sheep evaluations using BLUP and SSGBLUP with metafounders and unknown parent groups. Genet. Sel. Evol. 52, 47. doi:10.1186/s12711-020-00567-1

Mäntysaari, E. A., Liu, Z., and VanRaden, P. M. (2010). Interbull Bulletin 41, 17–22.

Mäntysaari, E. A., Koivula, M., and Strandén, I. (2020). Symposium review: Single-step genomic evaluations in dairy cattle. J. Dairy Sci. 103 (6), 5314–5326. doi:10.3168/jds.2019-17754

Mäntysaari, E. A., Evans, R., and Strandén, I. (2017). Efficient single-step genomic evaluation for a multibreed beef cattle population having many genotyped animals. J. Anim. Sci. 95, 4728–4737. doi:10.2527/jas2017.1912

Masuda, Y., Tsuruta, S., Bermann, M., Bradford, H. L., and Misztal, I. (2021). Comparison of models for missing pedigree in single-step genomic prediction. J. Anim. Sci. 99 (2), skab019. doi:10.1093/jas/skab019

Masuda, Y., VanRaden, P. M., Tsuruta, S., Lourenco, D. A. L., and Misztal, I. (2022). Invited review: Unknown-parent groups and metafounders in single-step genomic BLUP. J. Dairy Sci. 105 (2), 923–939. doi:10.3168/jds.2021-20293

Matilainen, K., Strandén, I., Aamand, G. P., and Mäntysaari, E. A. (2018). Single step genomic evaluation for female fertility in Nordic Red dairy cattle. J. Anim. Breed. Genet. 135, 337–348. doi:10.1111/jbg.12353

McPeek, M. S., Wu, X., and Ober, C. (2004). Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics 60, 359–367. doi:10.1111/j.0006-341X.2004.00180.x

Misztal, I., Lourenco, D. A. L., and Legarra, A. (2020). Current status of genomic evaluation. J. Anim. Sci. 98, skaa101. doi:10.1093/jas/skaa101

Misztal, I., Vitezica, Z. G., Legarra, A., Aguilar, I., and Swan, A. A. (2013). Unknown-parent groups in single-step genomic evaluation. J. Anim. Breed. Genet. 130, 252–258. doi:10.1111/jbg.12025

Patry, C., and Ducrocq, V. (2011). Evidence of biases in genetic evaluations due to genomic preselection in dairy cattle. J. Dairy Sci. 94, 1011–1020. doi:10.3168/jds.2010-3804

Poulsen, B. G., Ostersen, T., Nielsen, B., and Christensen, O. F. (2022). Predictive performances of animal models using different multibreed relationship matrices in systems with rotational crossbreeding. Genet. Sel. Evol. 54, 25. doi:10.1186/s12711-022-00714-w

Přibyl, J., Madsen, P., Bauer, J., Přibylová, J., Šimečková, M., Vostrý, L., et al. (2013). Contribution of domestic production records, Interbull estimated breeding values, and single nucleotide polymorphism genetic markers to the single-step genomic evaluation of milk production. J. Dairy Sci. 96 (3), 1865–1873. doi:10.3168/jds.2012-6157

Quaas, R. L., and Pollak, E. J. (1981). Modified equations for sire models with groups. J. Dairy Sci. 64, 1868–1872. doi:10.3168/jds.S0022-0302(81)82778-6

Silva, A. A., Silva, D. A., Silva, F. F., Costa, C. N., Lopes, P. S., Caetano, A. R., et al. (2019). Autoregressive single-step test-day model for genomic evaluations of Portuguese Holstein cattle. J. Dairy Sci. 102 (7), 6330–6339. doi:10.3168/jds.2018-15191

Strandén, I., and Lidauer, M. (1999). Solving large mixed linear models using preconditioned conjugate gradient iteration. J. Dairy Sci. 82, 2779–2787. doi:10.3168/jds.S0022-0302(99)75535-9

Strandén, I., and Mäntysaari, E. A. (2020). Bpop: An efficient program for estimating base population allele frequencies in single and multiple group structured populations. AFSci. 29 (3), 166–176. doi:10.23986/afsci.90955

Strandén, I., and Vuori, K. (2006). “RelaX2: Pedigree analysis program,” in Proceedings of the 8th World Congress on Genetics Applied to Livestock Production, 13-18 August 2006 (Belo Horizonte, MG, Brazil: Instituto Prociência), 27–30.

Taskinen, M., Mäntysaari, E. A., Aamand, G. P., and Strandén, I. (2014). “Comparison of breeding values from single-step and bivariate blending methods,” in Proceedings of the 10th World Congress on Genetics Applied to Livestock Production, August 2014 (Vancouver, BC, Canada: WCGALP), 17–22.

Tijani, A., Wiggans, G. R., Van Tassell, C. P., Philpot, J. C., and Gengler, N. (1999). Use of (co)variance functions to describe (co)variances for test day yield. J. Dairy Sci. 82 (1), 22610–22614. doi:10.3168/jds.S0022-0302(99)75228-8

VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423. doi:10.3168/jds.2007-0980

Vitezica, Z. G., Aguilar, I., Misztal, I., and Legarra, A. (2011). Bias in genomic predictions for populations under selection. Genet. Res. 93, 357–366. doi:10.1017/S001667231100022X

Wiggans, G. R., Cole, J. B., Hubbard, S. M., and Sonstegard, T. S. (2017). Genomic selection in dairy cattle: The USDA experience. Annu. Rev. Anim. Biosci. 5 (1), 309–327. doi:10.1146/annurev-animal-021815-111422

Wiggans, G. R., VanRaden, P. M., and Cooper, T. A. (2011). The genomic evaluation system in the United States: Past, present, future. J. Dairy Sci. 94 (6), 3202–3211. doi:10.3168/jds.2010-3866

Xiang, T., Christensen, O. F., and Legarra, A. (2017). Technical note: Genomic evaluation for crossbred performance in a single-step approach with metafounders. J. Anim. Sci. 95, 1472–1480. doi:10.2527/jas.2016.1155

Appendix 1: Representation of the Φ_pre matrix used in the formula $(Φ_{pre}' Φ_{pre})^{-1} Φ_{pre}' Γ_{pre} Φ_{pre} (Φ_{pre}' Φ_{pre})^{-1}$ to compute the K matrix.

$$
\begin{array}{c|ccccccccc}
\text{Birth year} & \text{STY}^{1} & \text{FIN} & \text{SWE} & \text{DNK} & \text{NOR} & \text{RDC} & \text{FIC} & \text{OTHER} & \text{HOL} \\
\hline
1970 & -0.437 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1980 & -0.155 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1990 & 0.127 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2000 & 0.409 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2010 & 0.690 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2021 & 0.972 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1970 & -0.437 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1980 & -0.155 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1990 & 0.127 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
2000 & 0.409 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
2010 & 0.690 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
2021 & 0.972 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1980 & -0.155 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1990 & 0.127 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2000 & 0.409 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2010 & 0.690 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2021 & 0.972 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2000 & 0.409 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
2021 & 0.972 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
2000 & 0.409 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
2021 & 0.972 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1990 & 0.127 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
2000 & 0.409 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
2021 & 0.972 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
2000 & 0.409 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1960 & -0.718 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1980 & -0.155 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
2000 & 0.409 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
2020 & 0.972 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
$$

¹ STY - birth year of the group standardized as $\frac{2 (year_{MF} - year_{\min})}{year_{\max} - year_{\min}} - 1$, where year_MF is the year of the MF, year_min = 1950 and year_max = 2021.
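The construction in Appendix 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names, the breed ordering constant and the way the (breed, birth year) labels are stored are all hypothetical. The last function assumes the usual covariance-function form, in which relationships for a new set of groups are obtained as Φ K Φ′; the Γ_pre values themselves are not reproduced here. Note that the standardization formula returns 1.000 for the year 2021, whereas the tabulated value 0.972 corresponds to the year 2020, so rows labelled 2021 are not expected to match the table exactly.

```python
import numpy as np

# Breed order follows the indicator columns of the appendix matrix.
BREEDS = ["FIN", "SWE", "DNK", "NOR", "RDC", "FIC", "OTHER", "HOL"]

# (breed, birth year) labels of the 29 preliminary MF, read off the appendix rows.
APPENDIX_GROUPS = (
    [("FIN", y) for y in (1970, 1980, 1990, 2000, 2010, 2021)]
    + [("SWE", y) for y in (1970, 1980, 1990, 2000, 2010, 2021)]
    + [("DNK", y) for y in (1980, 1990, 2000, 2010, 2021)]
    + [("NOR", y) for y in (2000, 2021)]
    + [("RDC", y) for y in (2000, 2021)]
    + [("FIC", y) for y in (1990, 2000, 2021)]
    + [("OTHER", 2000)]
    + [("HOL", y) for y in (1960, 1980, 2000, 2020)]
)


def standardize_year(year, year_min=1950, year_max=2021):
    """Map a birth year to [-1, 1] as in the appendix footnote."""
    return 2.0 * (year - year_min) / (year_max - year_min) - 1.0


def build_phi(groups):
    """Build Phi with one STY column followed by one indicator column per breed."""
    phi = np.zeros((len(groups), 1 + len(BREEDS)))
    for row, (breed, year) in enumerate(groups):
        phi[row, 0] = standardize_year(year)
        phi[row, 1 + BREEDS.index(breed)] = 1.0
    return phi


def covariance_function_coefficients(phi_pre, gamma_pre):
    """K = (Phi'Phi)^-1 Phi' Gamma_pre Phi (Phi'Phi)^-1."""
    xtx_inv = np.linalg.inv(phi_pre.T @ phi_pre)
    return xtx_inv @ phi_pre.T @ gamma_pre @ phi_pre @ xtx_inv


def extrapolate_gamma(phi_new, k_matrix):
    """Assumed covariance-function form: Gamma_new = Phi_new K Phi_new'."""
    return phi_new @ k_matrix @ phi_new.T


if __name__ == "__main__":
    phi_pre = build_phi(APPENDIX_GROUPS)
    print(phi_pre.shape)
```

In this layout every breed present in `groups` must have at least one row, otherwise Φ′Φ contains an all-zero column and cannot be inverted.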

Keywords: genetic groups, genomic evaluation, red dairy cattle, finncattle, co-variance function

Citation: Kudinov AA, Koivula M, Aamand GP, Strandén I and Mäntysaari EA (2022) Single-step genomic BLUP with many metafounders. Front. Genet. 13:1012205. doi: 10.3389/fgene.2022.1012205

Received: 05 August 2022; Accepted: 31 October 2022;
Published: 21 November 2022.

Edited by:

Martino Cassandro, University of Padua, Italy

Reviewed by:

Andres Legarra, INRAE Occitanie Toulouse, France
Ivan Pocrnić, University of Edinburgh, United Kingdom

Copyright © 2022 Kudinov, Koivula, Aamand, Strandén and Mäntysaari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andrei A. Kudinov, YW5kcmVpLmt1ZGlub3ZAbHVrZS5maQ==
