Multivariate GBLUP Improves Accuracy of Genomic Selection for Yield and Fruit Weight in Biparental Populations of Vaccinium macrocarpon Ait

Covarrubias-Pazaran, Giovanny; Schlautman, Brandon; Diaz-Garcia, Luis; Grygleski, Edward; Polashock, James; Johnson-Cicalese, Jennifer; Vorsa, Nicholi; Iorizzo, Massimo; Zalapa, Juan

doi:10.3389/fpls.2018.01310

ORIGINAL RESEARCH article

Front. Plant Sci., 12 September 2018

Sec. Plant Breeding

Volume 9 - 2018 | https://doi.org/10.3389/fpls.2018.01310


Giovanny Covarrubias-Pazaran¹*

Brandon Schlautman²

Luis Diaz-Garcia³,⁴

Edward Grygleski⁵

James Polashock⁶

Jennifer Johnson-Cicalese⁷

Nicholi Vorsa⁷

Massimo Iorizzo⁸

Juan Zalapa⁹*

¹Bayer CropScience NV, Innovation Center, Ghent, Belgium
²The Land Institute, Salina, KS, United States
³Department of Horticulture, University of Wisconsin Madison, Madison, WI, United States
⁴Instituto Nacional de Investigaciones, Forestales, Agrícolas y Pecuarias, Campo Experimental Pabellón, Aguascalientes, Mexico
⁵Valley Corporation, Tomah, WI, United States
⁶Genetic Improvement of Fruits and Vegetables Laboratory, USDA-ARS, Chatsworth, NJ, United States
⁷Blueberry and Cranberry Research and Extension Center, Rutgers University, Chatsworth, NJ, United States
⁸Department of Horticulture Sciences, Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, United States
⁹Vegetable Crops Research Unit, USDA-ARS, University of Wisconsin, Madison, WI, United States

The development of high-throughput genotyping has made genome-wide association (GWAS) and genomic selection (GS) applications possible for both model and non-model species. The exploitation of genome-assisted approaches could greatly benefit breeding efforts in American cranberry (Vaccinium macrocarpon) and other minor crops. Using biparental populations with different degrees of relatedness, we evaluated multiple GS methods for total yield (TY) and mean fruit weight (MFW). Specifically, we compared differences in predictive ability (PA) between univariate and multivariate genomic best linear unbiased predictors (GBLUP and MGBLUP, respectively). We found that MGBLUP provided higher PA than GBLUP in scenarios with medium genetic correlations (8–17% increase with cor_g~0.6) and high genetic correlations (25–156% increase with cor_g~0.9), but found no increase when the genetic correlation was low. In addition, we found that only a few hundred single nucleotide polymorphism (SNP) markers are needed to reach a plateau in PA for both traits in the biparental populations studied (in full linkage disequilibrium). We observed that higher resemblance among individuals in the training (TP) and validation (VP) populations provided greater PA. Although multivariate GS methods are available, genetic correlations and other factors need to be carefully considered when applying these methods for genetic improvement.

Introduction

A central goal of genetics is the identification of genotype-phenotype associations. Traditional quantitative trait loci (QTL) mapping and genome-wide association studies (GWAS) are the primary tools for achieving such a goal. Thousands of genetic variants associated with traits of agronomic importance in economically important crops have been identified in the last century (Ingvarsson and Street, 2011). However, unraveling the causal genes behind such QTLs has often not been accomplished due to the high costs involved. Fortunately, the identification of markers in linkage disequilibrium (LD) with agriculturally important causal variants has been enough to move the genomic information to breeding applications such as marker-assisted selection (MAS), marker-assisted backcrossing, and pyramiding of major disease resistance genes (Flint-Garcia et al., 2003; Holland, 2004; Jiang et al., 2004; Bertrand and Mackill, 2008). However, after decades of studies, the application and value of the QTL paradigm for plant improvement has been questioned due to its low success in deploying genetic markers for breeding quantitative traits (Bertrand and Mackill, 2008; Xu and Crouch, 2008).

Genomic selection (GS), introduced by Meuwissen et al. (2001), has become the next step in MAS methods and has been effectively used in plant and animal breeding programs for more than a decade (Hayes et al., 2009; Jannink et al., 2010). Currently, this methodology has been adopted in several species, and moderate to high prediction accuracies [based on cross-validation (CV)] have been reported in crops such as wheat (Triticum aestivum), oat (Avena sativa), maize (Zea mays), rice (Oryza sativa), rye (Secale cereale), and barley (Hordeum vulgare) (Asoro et al., 2011; Zhao et al., 2012; Lipka et al., 2014; Rutkoski et al., 2014; Wang et al., 2014; Sallam et al., 2015; Spindel et al., 2015). Fruit crops have adopted this technology more slowly, although major fruit crops such as apple (Malus × domestica) and kiwifruit (Actinidia deliciosa) have made great progress in implementing these technologies (Testolin, 2010; Kumar et al., 2012; Muranty et al., 2015). The slower adoption could be due to the limited availability of genomic resources and to concerns about the effectiveness of GS compared to classical methods, such as phenotypic recurrent selection, which have made important progress in fruit breeding for hundreds of years. Recently, next-generation sequencing (NGS) studies have reduced the gap between major and minor crops such as cranberry (Vaccinium macrocarpon Ait.; 2n = 2x) (Huang et al., 2009; Zalapa et al., 2012; Fajardo et al., 2014; Polashock et al., 2014; Schlautman et al., 2015; Covarrubias-Pazaran et al., 2016). Other fruit crops, including apple and kiwifruit, have used these methods to generate vast quantities of markers to propose and perform GS (Testolin, 2010; Kumar et al., 2012; Muranty et al., 2015). The efficiency of GS in selecting parents at shorter intervals (i.e., predictions early in the breeding pipeline) and the possibility of increasing selection intensity compared to classical approaches (i.e., the ability to predict untested individuals) hold great potential for fruit breeding (Riedelsheimer and Melchinger, 2013; Endelman et al., 2014).

Various factors including training population (TP) size, marker density, heritability, magnitude of the LD, trait architecture, resemblance between TP and the validation population (VP), and the interaction of these factors, appear to be the principal forces driving the prediction accuracies of GS (Lorenzana and Bernardo, 2009; Guo et al., 2012; Resende et al., 2012; Habier et al., 2013; Riedelsheimer et al., 2013; Lorenz and Smith, 2015; Muranty et al., 2015). In addition, a thorough characterization and modeling of environmental variances (Technow et al., 2015) and the covariance among multiple traits also appear to increase the accuracy of GS models.

One of the most recent ideas to increase the predictive ability of the GS models is the use of multivariate models. The use of multivariate mixed models in breeding was originally proposed in animal breeding to model the genetic correlation among traits, longitudinal data, and to model genotype by environment interactions (trajectory across multiple years or environments) in order to exploit the existent correlations in the data (Mrode, 2014; Lee and Van der Werf, 2016). The first application of mixed models for multi-trait evaluation was by Henderson and Quaas (1976). The gain in accuracy of multivariate models compared to univariate models depends largely on the difference between the genetic and residual correlations between the responses (Schaeffer, 1984; Thompson and Meyer, 1986). A positive impact of the multi-trait methodology is its capacity to increase the predictive ability on traits with low heritability when they are analyzed together with high heritability traits that are genetically correlated (Thompson and Meyer, 1986). Until the last decade, multivariate methods have been exploited in plant and animal breeding mainly in species with pedigree information available to model the relationships among individuals and traits in the mixed model framework (Mrode, 2014). With the advent of massive molecular marker datasets, genomic relationship matrices are replacing pedigree-based relationship matrices, opening new analysis options for crops with limited pedigree information (Endelman and Jannink, 2012).

Like other woody perennial species, cranberry genetic improvement has been limited by the long interval needed to produce a cultivar (Janick and Moore, 1975; Johnson-Cicalese et al., 2015). Furthermore, due to its recent domestication in the mid-1800s and the late start of breeding efforts in the 1920s, advances in cranberry genetics have been even slower than in other major fruit crops such as apple and peach. Therefore, cranberry could serve as a model for how NGS coupled with molecular-assisted breeding strategies, such as GS, could accelerate cultivar development in non-model or partially domesticated crop species (Zalapa et al., 2012, 2015). Within the past 5 years, NGS technologies have been used to increase the availability of genomic resources in cranberry from almost none to a set that now includes assembled organellar genomes (Fajardo et al., 2012, 2014), a draft nuclear genome and transcriptome (Polashock et al., 2014), multiple SSR-based genetic maps (Georgi et al., 2013; Schlautman et al., 2015), and most recently high-density genetic maps and a consensus map with thousands of SNPs (Covarrubias-Pazaran et al., 2016; Schlautman et al., 2017), as well as massive high-throughput phenotyping techniques (Diaz-Garcia et al., 2018). Currently, cranberry breeding relies heavily on the evaluation of medium to large biparental populations with the main goal of improving commercially useful traits such as fruit color, shape, and brix degrees, as well as disease resistance and yield. Cranberry breeding requires a hefty initial economic investment for field evaluation because flooding beds that mimic commercial growing conditions must be constructed to allow water harvesting. Construction of a one-acre cranberry bed to evaluate 500 genotypes will cost between $25,000 and $30,000 USD, not including maintenance and evaluation of the bed. Additionally, the release of a new cranberry variety has required more than 20 years on average. Thus, reducing the breeding cycle length by using genomic technologies, and reducing the high cost of evaluating biparental populations through selective phenotyping, are the main drivers of current research in cranberry breeding.

In this research, we used the genomic resources available in cranberry to test the usefulness of genomic selection and compare differences in PA for total yield (TY) and mean fruit weight (MFW) in cranberry. We used both univariate and multivariate genomic best linear unbiased predictor (GBLUP and MGBLUP, respectively) approaches together with traditional biparental populations commonly used in cranberry breeding. This research will allow us to understand the benefits of genomic prediction with related individuals (i.e., full-sib and half-sib individuals), with the aim of reducing the population sizes of families to be planted for field evaluation (which is the most expensive part of a cranberry breeding program) while also increasing the number of families evaluated in field trials. We also investigated two scenarios: a low or null genetic correlation scenario (in our data, the correlation between TY and MFW) and a high genetic correlation scenario (in our data, the correlation among multiple years). These two scenarios will allow us to investigate the usefulness of MGBLUP for improving PA in our current GS efforts.

Materials and Methods

Plant Material and Marker Information

We used three cranberry biparental populations designated CNJ02 (Mullica Queen × Crimson Queen; MQ × CQ, N = 148), CNJ04 (MQ × Stevens, N = 67), and GRYG [BGBLNL × (GH1x35), N = 351]. The parents of the three crosses are highly heterozygous genotypes frequently used in cranberry breeding programs. MQ and CQ are hybrids obtained after three generations of selection from wild materials, BGBLNL and GH1x35 are second-generation hybrids, and Stevens is a first-generation hybrid from two wild selected parents. The CNJ02 and CNJ04 populations are planted and maintained at the Rutgers University P.E. Marucci Center, Chatsworth, NJ. The GRYG population is planted and maintained at Valley Corporation, Tomah, Wisconsin. CNJ02 and CNJ04 are half-sib families and are not closely related to GRYG. Each genotype was clonally propagated and planted in the field using multiple cuttings in a defined 0.46 m² (5 ft²) square plot to mimic commercial conditions.

Genotypic information was obtained using the GBS protocol from Elshire et al. (2011) with modifications described in Schlautman et al. (2017). EcoT22I, which cuts the site 5′-ATGCA↓T-3′/ /3′-T↑ACGTA-5′, was selected for reducing genome complexity in this study based on GBS optimization results in cranberry to ensure good coverage for sequence tags in all populations [more details can be found in Covarrubias-Pazaran et al. (2016) and Schlautman et al. (2017)]. Resulting libraries were sequenced on the Illumina HiSeq 2000 sequencing platform (Illumina, San Diego, California).

Across the three biparental populations, a total of 7,389 SNP markers were polymorphic across the 12 linkage groups (LGs) in at least one population. Markers were positioned using the consensus genetic map (anchoring 6,074 markers) obtained and described by Schlautman et al. (2017). Only biallelic loci with minor allele frequency (MAF) > 0.05 were used in the analyses. According to the published genetic maps, the SNPs cover the entire linkage groups, and therefore both causal and non-causal regions were assumed to have markers. Genotypic data are available in Supplementary File 1.

Phenotype Collection

Repeated measures for total yield (TY) and mean fruit weight (MFW) were taken over a three-year period for 148 genotypes from the CNJ02 population (2011–2013) and 67 genotypes from CNJ04 (2012–2014); the GRYG population comprised 351 genotypes for which data were collected over a two-year period (2014–2015). TY was determined by harvesting and weighing all the fruit within a 0.09 m² (1 ft²) metallic square set in each cranberry plot [0.46 m² (5 ft²)] representing each genotype. Twenty-five fruit for each genotype were randomly selected and weighed to calculate MFW as described in Georgi et al. (2013) and Johnson-Cicalese et al. (2015).

Experimental Design and Mixed Modeling

All populations were planted together with 15 check plots (3 plots for each of 5 parents) positioned spatially across the flooding beds (commercial-condition fields). Additionally, to deal with the lack of replication in our experimental design, a two-step approach was used for the GS exercise for each population. First, a heterogeneous-variance univariate mixed model including all years of data was used to fit a model of the form y = Xβ + Zu + ε, where y was the response variable (TY or MFW), X and Z were incidence matrices for fixed and random effects, respectively, β was the vector of fixed effects associated with the environment (year-location combination), u was the vector of random effects associated with rows [r ~ (0, I $σ_{r}^{2}$)], range or columns [c ~ (0, I $σ_{c}^{2}$)], the two-dimensional spline [d ~ (0, I $σ_{d}^{2}$)], and genotypic effects [g ~ (0, I $σ_{g}^{2}$)] (no marker information was used at this point), and ε was the residual error, with ε ~ (0, I $σ_{e}^{2}$). The heterogeneous-variance model allowed a different variance component in each environment for genotype effects, as well as for the other random effects. This was achieved by using the diag() covariance structure functionality in the mmer2() function available in sommer; i.e., diag(ENV):genotype fits a random effect for genotypes with variance var(u_g) = G_e ⊗ A, where G_e is the variance-covariance matrix for genotypes among environments and A is a relationship matrix among genotypes:

var(u_{g}) = G_{e} \otimes A = \begin{bmatrix} σ_{e_{1}}^{2} & \dots & σ_{e_{1} e_{i}} \\ \vdots & \ddots & \vdots \\ σ_{e_{i} e_{1}} & \dots & σ_{e_{i}}^{2} \end{bmatrix} \otimes A

where A is in general a variance-covariance matrix among the levels of the random effect (i.e., genotypes evaluated, levels of blocks, etc.), which for this model was an identity matrix with dimension equal to the number of genotypes evaluated (the genomic relationship matrix was not used at this point), and σ_eiej was the covariance among the same genotypes in different environments, which was set to zero in the diagonal model. As a result, a different variance component can be estimated for each random effect in each environment, and by-environment genotype predictions can be obtained. Because the mapping populations were full-sib families with replication of alleles across genotypes in a uniformly managed cranberry bed, we made a spatial relationship assumption that large rows and columns of genotypes should resemble one another, which allowed us to fit row and column effects (Schlautman et al., 2015). In addition, we fitted two-dimensional splines to account for spatial trends with shapes characteristic of tensor products (Velazco et al., 2017). Residuals were examined using variograms to verify model fit. All spatial mixed models (two-dimensional splines) were fitted using the R package sommer (Covarrubias-Pazaran, 2016). Variance components were tested for difference from zero using likelihood ratio tests. A description of the phenotypic data, variance components, and heritabilities for this first modeling step can be found in Additional File 1.
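To make the diagonal structure concrete, the following minimal Python sketch builds G_e ⊗ A for a toy case with two environments and three genotypes. It is an illustration only and not the sommer implementation; the helper names (`kronecker`, `identity`) and the variance values are hypothetical.

```python
def kronecker(g, a):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [g[i][j] * a[k][l] for j in range(len(g[0])) for l in range(len(a[0]))]
        for i in range(len(g)) for k in range(len(a))
    ]


def identity(n):
    """n x n identity matrix (A when no marker information is used)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical genotype variances for two environments (not values from the paper).
env_variances = [2.0, 0.5]

# Diagonal G_e: one variance per environment, covariances between environments fixed at zero.
g_e = [[env_variances[i] if i == j else 0.0 for j in range(2)] for i in range(2)]

a = identity(3)            # three genotypes, treated as unrelated at this step
var_u = kronecker(g_e, a)  # 6 x 6 covariance of the stacked genotype effects

for row in var_u:
    print(row)
```

The result is block diagonal: the first block carries the genotype variance of environment 1, the second that of environment 2, and the off-diagonal blocks are zero, which is why each environment gets its own variance component and its own set of genotype predictions.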

From these models we obtained two types of predictions for the genotype effect, one across environments and another for each environment. The idea was to use the by-environment genotype prediction to fit a multivariate model using each environment genotype predictions as a response from the same trait (i.e., [y_MFW−2011, y_MFW−2012]) to mimic a natural high genetic correlation scenario, whereas the across-environment predictions for both traits were used to build the multivariate response that in our data mimics a low genetic correlation scenario given the low genetic correlation found among these traits (i.e., [y_MFW, y_TY]).

Data Filtering

In our experience, the use of data from environments with null or very small genomic heritability values (i.e., h $_{g}^{2}$ < 0.10) in multivariate models tends to cause computational issues or nonsensical genetic correlation estimates. Therefore, we calculated genomic heritabilities for each environment using the by-environment genotype predictions as the response and a single random effect for genotypes with the genomic relationship matrix. Specifically, we fitted a model of the form y = Xβ + Zu + ε, where y is the response variable (by-environment genotype prediction for TY or MFW), X and Z are incidence matrices for fixed and random effects, respectively, β is the vector of fixed effects (intercept only), and u is the vector of random genotype effects [g ~ (0, A_g $σ_{g}^{2}$)], where A_g is the additive genomic relationship matrix [A_g = MM'/(2 Σ p_i(1 − p_i))] (VanRaden, 2008). Genomic heritabilities, rather than generalized forms of heritability, were calculated given the greater ability of genomic heritability to provide insight into the PA of the data (Cullis et al., 2006; de los Campos et al., 2015). For each trait-year combination, the genomic heritability was calculated using the formula h $_{g}^{2}$ = $σ_{g}^{2}$ / ($σ_{g}^{2}$ + $σ_{e}^{2}$), where $σ_{g}^{2}$ is the genetic variance estimated with the marker-based relationship matrix and $σ_{e}^{2}$ is the residual variance. Standard errors for the heritabilities were computed using the delta method implemented in the pin function of the R package sommer (Covarrubias-Pazaran, 2016). Environments (year-location combinations) with h $_{g}^{2}$ lower than 0.10, or with an SE large enough that h $_{g}^{2}$ could not be distinguished from zero, were discarded from all subsequent analyses.
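As an illustration of the two quantities used in this filtering step, the sketch below computes an additive genomic relationship matrix and a genomic heritability in plain Python. It assumes markers coded −1, 0, 1 and centres each marker column by its allele frequency, as in VanRaden's (2008) first method; the centring step, the function names, and the toy data are assumptions for illustration and are not taken from the authors' scripts.

```python
def vanraden_grm(markers):
    """Additive genomic relationship matrix from markers coded -1, 0, 1.

    markers: list of rows (individuals), each a list of marker codes.
    Assumes every marker is polymorphic, so the scaling term is non-zero.
    """
    n, m = len(markers), len(markers[0])
    # Allele frequency per marker: (code + 1) is the count of the tracked allele.
    freqs = [sum(row[j] + 1 for row in markers) / (2.0 * n) for j in range(m)]
    # Centre each column; with -1/0/1 coding the expected value is 2p - 1.
    centred = [[row[j] - (2.0 * freqs[j] - 1.0) for j in range(m)] for row in markers]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [
        [sum(centred[i][k] * centred[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


def genomic_heritability(var_g, var_e):
    """h2_g = var_g / (var_g + var_e), using marker-based genetic variance."""
    return var_g / (var_g + var_e)


# Toy data: four individuals, three markers (hypothetical).
toy_markers = [
    [1, 0, -1],
    [0, 1, 0],
    [-1, 0, 1],
    [0, -1, 0],
]
grm = vanraden_grm(toy_markers)
h2 = genomic_heritability(var_g=1.0, var_e=3.0)  # hypothetical variance components
```

In practice the variance components come from fitting the mixed model with this matrix as the covariance of the genotype effects; the sketch only shows how the matrix and the ratio are formed.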

Genetic Correlation Across Years

Multivariate mixed models were used to assess the genetic correlation across years within populations. Following Maier et al. (2015), the multivariate mixed model implemented has the form:

\begin{matrix} y_{1} = X_{1} β_{1} + Z_{1} u_{1} + e_{1} \\ y_{2} = X_{2} β_{2} + Z_{2} u_{2} + e_{2} \\ ⋮ \\ y_{t} = X_{t} β_{t} + Z_{t} u_{t} + e_{t} \end{matrix}

where y_i is a vector of trait phenotypes, β_i is a vector of fixed effects, u_i is a vector of random effects for individuals, and e_i is a vector of residuals for trait “i” (i = 1, …, t). The random effects (u_1, …, u_t and e_1, …, e_t) are assumed to be normally distributed with mean zero. X and Z are incidence matrices for fixed and random effects, respectively. The distribution of the multivariate response and the phenotypic variance-covariance matrix (V) are:

\begin{array}{l} y = X β + Z u + ε, \quad y \sim MVN (X β, V) \\ y = \begin{bmatrix} y_{1} \\ \vdots \\ y_{t} \end{bmatrix}, \quad X = \begin{bmatrix} X_{1} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & X_{t} \end{bmatrix} \\ V = \begin{bmatrix} Z_{1} K σ_{u k_{1}}^{2} Z_{1}^{'} + Z_{1} R σ_{e_{1}}^{2} Z_{1}^{'} & \dots & Z_{1} K σ_{u k_{1t}} Z_{t}^{'} + Z_{1} R σ_{e_{1t}} Z_{t}^{'} \\ \vdots & \ddots & \vdots \\ Z_{t} K σ_{u k_{t1}} Z_{1}^{'} + Z_{t} R σ_{e_{t1}} Z_{1}^{'} & \dots & Z_{t} K σ_{u k_{t}}^{2} Z_{t}^{'} + Z_{t} R σ_{e_{t}}^{2} Z_{t}^{'} \end{bmatrix} \end{array}

where K is the relationship or covariance matrix for the kth random effect (u = 1,…,k), and R = I is an identity matrix for the residual term. The terms $σ_{u k_{i}}^{2}$ and $σ_{e_{i}}^{2}$ denote the genetic (or any of the kth random terms) and residual variance of trait “i,” respectively, and $σ_{u k_{ij}}$ and $σ_{e_{ij}}$ denote the genetic (or any of the kth random terms) and residual covariance between traits “i” and “j” (i = 1,…,t, and j = 1,…,t). For more details about the multivariate algorithm used in sommer, see Covarrubias-Pazaran (2018). The genetic correlation among years was calculated using the by-environment genotype predictions as the multivariate response.
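Once the genetic variances and covariances have been estimated, a genetic correlation between two responses is simply the covariance scaled by the two standard deviations. The short Python sketch below shows that conversion for a full genetic covariance matrix; the function name and the numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def covariance_to_correlation(cov):
    """Convert a genetic variance-covariance matrix into a correlation matrix.

    r_ij = cov_ij / sqrt(var_i * var_j), with variances on the diagonal of cov.
    """
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# Hypothetical genetic (co)variances for one trait measured in two years.
genetic_cov = [
    [4.0, 5.4],
    [5.4, 9.0],
]
genetic_cor = covariance_to_correlation(genetic_cov)
print(round(genetic_cor[0][1], 3))
```

The same conversion applies whether the two responses are the same trait in two years or two different traits measured across environments.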

Model Comparison

By-environment and across-environment genotype predictions were used for validating univariate and multivariate GS in each population independently. The following methods were compared: (1) genomic best linear unbiased predictor (GBLUP), which used the information from all markers coded in the additive relationship matrix, (2) GBLUP-AD, which included the additive and dominance relationships, (3) GBLUP-ADE, which included the additive, dominance, and epistatic relationships, and (4) multivariate GBLUP (MGBLUP), which exploits the covariance information among traits (or environments) at the level of genotypes and residuals. These models were fitted using the sommer package (Covarrubias-Pazaran, 2016).

The first comparison among all models was made environment by environment and trait by trait (i.e., comparison among models for MFW in environment Y2011, Y2012, etc.) for each population, using the by-environment genotype predictions as the response variable. The MGBLUP for this first comparison used as the multivariate response the same trait-environment response as the univariate models plus data from an additional environment (high genetic correlation in our data). A second comparison among models was made using across-environment genotype predictions for each trait. The MGBLUP for this second comparison used as the multivariate response the across-environment genotype predictions for both traits (low genetic correlation scenario in our data).

The models were fitted using all markers by creating the additive genomic relationship matrix A_g for prediction in a kinship-based model [A_g = MM'/(2 Σ p_i(1 − p_i))] (VanRaden, 2008), the dominance relationship matrix D_g [D_g = NN'/Σ 2p_iq_i(1 − p_iq_i)] (Su et al., 2012), and the additive-by-additive epistatic relationship matrix E_g (E_g = A#A, where # is the Hadamard product) (Su et al., 2012). Here, M is the marker matrix coded as −1, 0, 1 according to the number of reference alleles at a given biallelic marker for the A matrix computation, and N is the marker matrix coded as 0, 1 (0 for homozygous and 1 for heterozygous genotypes) for the D matrix computation. The model used has the typical mixed model form y = Xβ + Zu + ε, where y is the response variable, X and Z are incidence matrices for fixed and random effects, respectively, β is the vector of fixed effects (intercept only), and u is the vector of random effects associated with the genotypic effects with the corresponding relationship matrices. For the multivariate GBLUP model only the additive relationship matrix was used, and the model and distributions follow Covarrubias-Pazaran (2018). In total, 100 iterations of 5-fold CV were used to test the PA under the different models. Tables and figures comparing the different models were built using R (R Core Team, 2017).
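The validation scheme can be sketched independently of the prediction model. The Python sketch below generates repeated 5-fold partitions and scores a user-supplied predictor by the Pearson correlation between predicted and observed values in each validation fold, which is the usual way PA is measured in cross-validation and is assumed here. The function names and the `predict` callback are hypothetical placeholders for a fitted GBLUP or MGBLUP model; this is not the authors' script.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeated_kfold(n, k=5, repeats=100, seed=1):
    """Yield (training, validation) index lists for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            validation = order[fold::k]  # every k-th shuffled index
            held_out = set(validation)
            training = [i for i in order if i not in held_out]
            yield training, validation


def predictive_abilities(y, predict, k=5, repeats=100, seed=1):
    """Return one PA value per fold.

    predict(training, validation) is a hypothetical callback that fits a
    model on the training indices and returns predictions for the
    validation indices, in the same order.
    """
    results = []
    for training, validation in repeated_kfold(len(y), k, repeats, seed):
        predicted = predict(training, validation)
        observed = [y[i] for i in validation]
        results.append(pearson(predicted, observed))
    return results


# 100 repeats of 5-fold CV on a population of 148 genotypes gives 500 splits.
splits = list(repeated_kfold(148))
print(len(splits))  # 500
```

Keeping the partitioning separate from the model makes it straightforward to compare several models on exactly the same folds by reusing the same seed.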

Effect of Marker Density in Prediction

To examine the influence of the number of markers on PA, we fitted the univariate GBLUP model, constructing the genomic relationship matrix (A_g) with different numbers of markers equally spaced and covering the entire genome across the 12 LGs in cranberry (Lorenzana and Bernardo, 2009). The consensus map developed by Schlautman et al. (2017) was used to ensure a homogeneous marker distribution. We divided the entire linkage distance (~1,250 cM) into different numbers of bins (20, 50, 100, 250, 500, 750, and 1,000) to reach marker densities of one marker every 60, 24, 12, 4.8, 2.4, 1.6, and 1.2 cM, respectively. For example, in the first case we built the A matrix with 20 markers, one marker every 60 cM, and in the densest case with 1,000 markers, picking one marker about every 1.2 cM. The PA was obtained for both TY and MFW by averaging the results from 100 iterations of 5-fold CV, where the 5-fold strategy consisted of dividing the population into 5 groups and using 1 group as the VP and the rest as the TP (100 rounds of this strategy yield 500 data points). Results were recorded and plotted using R (R Core Team, 2015). This analysis was performed using across-environment genotype predictions for both traits.
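One way to obtain roughly equally spaced marker subsets is to split the map into bins and keep a single marker per bin. The Python sketch below does this under two assumptions that are not stated in the text: marker positions are given as cumulative centimorgan positions along the concatenated linkage groups, and the marker closest to each bin midpoint is the one retained. The function name and the toy positions are hypothetical.

```python
def thin_markers(positions, map_length, n_bins):
    """Keep at most one marker per bin, choosing the one nearest the bin midpoint.

    positions: cumulative map positions in cM (assumed concatenated across LGs).
    Returns the sorted indices of the retained markers.
    """
    width = map_length / n_bins
    best = {}  # bin index -> (distance to midpoint, marker index)
    for idx, pos in enumerate(positions):
        # Clamp so a marker sitting exactly at map_length falls in the last bin.
        b = min(int(pos // width), n_bins - 1)
        dist = abs(pos - (b + 0.5) * width)
        if b not in best or dist < best[b][0]:
            best[b] = (dist, idx)
    return sorted(idx for _, idx in best.values())


# Toy map of 120 cM with eight markers (hypothetical positions).
toy_positions = [0.0, 10.0, 28.0, 31.0, 59.0, 61.0, 95.0, 119.0]
kept = thin_markers(toy_positions, map_length=120.0, n_bins=2)
print(kept)  # [3, 6]
```

The retained indices can then be used to subset the columns of the marker matrix before building A_g, so that the relationship matrix is recomputed at each marker density.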

Effect of Training Population Relationship in Prediction

Following Lorenz and Smith (2015) the effect of resemblance between the TP and VP on the PA was examined in the three biparental populations. The three populations were chosen based on their degree of relationship. CNJ02 (MQ × CQ) were half-sibs with CNJ04 (MQ × Stevens). The GRYG population (BGBLNL95 × [GH1x35]) had little relationship with CNJ02 and CNJ04. Using the across-environment genotype predictions we fixed each population as the VP and the resemblance of the TP was varied using individuals with no relationship to the VP, related half-sib individuals (when available), and related full-sib individuals (within population). In total, 100 iterations of 5-fold CV were used to test the PA under the different scenarios.

Data Availability

Supplementary File 1 (SF1) contains the phenotypic and genotypic data. The R script for the analysis can be found in the Supplementary Files 2–5.

Results

Genomic Heritabilities

After the initial spatial modeling, we used the by-environment genotype predictions to calculate the genomic heritability for each environment and trait combination. We found higher genomic heritabilities for MFW than for TY. For example, in the GRYG population we found a genomic heritability of 0.22 for TY in 2014, whereas the same year gave a genomic heritability of 0.43 for MFW (Table 1). The same trend was found in the three populations across most years. Some years resulted in a very low genomic heritability (< 0.10 and not distinguishable from zero given the SE of the h $_{g}^{2}$). Such years of data were removed from subsequent analyses because, in our experience, genotype predictions with null or close-to-zero genomic heritability produce spurious predictions or nonsensical estimates of genetic correlation when used in the multivariate framework. The heritability was higher in GRYG than in CNJ02, and smallest in CNJ04. Removing the year-trait combinations with low heritability resulted in 2 years of data for GRYG and CNJ02, and 1 year of data for CNJ04, for both TY and MFW.


Table 1. Year-based genomic heritabilities (h²g estimate) and their standard errors (h²g SE) for three biparental populations (CNJ02, N = 148; CNJ04, N = 67; GRYG, N = 351) for the traits total yield (TY) and mean fruit weight (MFW).

Genetic Correlations

Given that repeated measures of TY and MFW were taken for the three biparental populations across different years (environments) in the 2011–2015 interval, genetic correlations between years within traits and genetic correlations between traits were obtained using multivariate mixed models (Table 2). We found a high genetic correlation between years for MFW in both the GRYG and CNJ02 populations (i.e., 0.93), which indicates good consistency of breeding values (BV) across years (Table 2). The genetic correlations between years for TY in both populations were smaller than for MFW, but still relatively high (i.e., 0.62–0.90; Table 2). On the other hand, the genetic correlation between TY and MFW using across-environment genotype predictions was close to zero. The standard errors of the genetic correlations indicate that for GRYG and CNJ02 the genetic correlations are not different from zero, whereas for CNJ04 the genetic correlation was different from zero but had a very high SE due to the small population size (N = 67).


Table 2. Genetic correlations between years within traits and among traits (rg estimate), and their standard errors (rg SE), in three biparental populations (CNJ02, N = 148; CNJ04, N = 67; GRYG, N = 351).