Genomic prediction based on a joint reference population for the Xinjiang Brown cattle

¹College of Animal Science, Xinjiang Agricultural University, Urumqi, China
²Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
³Shijiazhuang Molbreeding Biotechnology Co., Ltd., Shijiazhuang, China

Introduction: Xinjiang Brown cattle constitute the largest breed of cattle in Xinjiang. Therefore, it is crucial to establish a genomic evaluation system, especially for those with low levels of breed improvement.

Methods: This study aimed to establish a cross breed joint reference population by analyzing the genetic structure of 485 Xinjiang Brown cattle and 2,633 Chinese Holstein cattle (Illumina GeneSeek GGP bovine 150 K chip). The Bayes method single-step genome-wide best linear unbiased prediction was used to conduct a genomic evaluation of the joint reference population for the milk traits of Xinjiang Brown cattle. The reference population of Chinese Holstein cattle was randomly divided into groups to construct the joint reference population. By comparing the prediction accuracy, estimation bias, and inflation coefficient of the validation population, the optimal number of joint reference populations was determined.

Results and Discussion: The results indicated a distinct genetic structure difference between the two breeds of adult cows, and both breeds should be considered when constructing multi-breed joint reference and validation populations. The reliability range of genome prediction of milk traits in the joint reference population was 0.142–0.465. Initially, it was determined that the inclusion of 600 and 900 Chinese Holstein cattle in the joint reference population positively impacted the genomic prediction of Xinjiang Brown cattle to certain extent. It was feasible to incorporate the Chinese Holstein into Xinjiang Brown cattle population to form a joint reference population for multi-breed genomic evaluation. However, for different Xinjiang Brown cattle populations, a fixed number of Chinese Holstein cattle cannot be directly added during multi-breed genomic selection. Pre-evaluation analysis based on the genetic structure, kinship, and other factors of the current population is required to ensure the authenticity and reliability of genomic predictions and improve estimation accuracy.

1 Introduction

Xinjiang Brown cattle is a major breed supporting the development of the cattle industry in Xinjiang, it was the first breed of cattle used for milk and meat purposes after the founding of the People’s Republic of China. In 2023, the number of Xinjiang Brown cattle in stock reached 1.16 million; however, the level of breed improvement was low, with a performance measurement population of <10,000. Therefore, it is important to establish an efficient genomic evaluation system for Xinjiang Brown cattle to improve their genetic level. The application of genome selection technology has significantly enhanced the efficiency of genomic evaluation (Hayes et al., 2016). Because of the implementation of the genome selection for Chinese Holstein cattle in 2008, early and accurate selection of calves and young cattle has been achieved (George et al., 2017), leading to higher accuracy in genomic evaluation and more precise assessment of individual breeding value (Weigel et al., 2010; Dassonneville et al., 2011). In addition, due to early selection and higher accuracy, the rate of genetic progress has doubled (Weller et al., 2017), improving breeding profitability and significantly reducing breeding costs. Although genome selection has been successfully applied to Chinese Holstein cattle population, the low level of production performance measurement and small population size of Xinjiang Brown cattle have hindered the application of genome selection technology. To improve the reliability of genomic predictions, especially for smaller populations, many feasible methods have been proposed, including increasing marker density, constructing linkage disquilibrium (LD) with more markers and causal mutations, and simulation data analysis (de Roos et al., 2009). Simulation and real data analyses (BrØndum et al., 2015) have shown that genomic prediction can play an important role in different populations.

For genome selection, it is necessary to have a reference population with sufficient size and an appropriate genetic structure that simultaneously incorporates genomic and phenotypic information to accurately predict genome estimated breeding values (GEBVs) (Metta et al., 2004; Boichard et al., 2016). Genome selection has recently been widely used in dairy cattle breeding programs. However, its application is limited to populations with a small number of breeds. Establishing a sufficiently large reference population is the most limiting factor for the accurate estimation of SNP effects (Boichard et al., 2016). When conducting genome selection for small populations, the most direct approach to enhancing its reliability is to expand the reference population. Many countries have found effective solutions through international cooperation, leading to joint genomic evaluations (Lund et al., 2011). By connecting France, Germany, Austria, Italy, Slovenia, Switzerland, and the United States of America to the InterGenomics consortium operated by the Interbull Center (Zumbach et al., 2010; Jorjani et al., 2012), genome-wide joint evaluations have been conducted for Brown Swiss bulls and Simmental cattle in Germany and Austria (Edel et al., 2011). Research has shown that by combining different populations of the same breed or related breeds in the reference population, more effective information can be obtained for estimating marker effects. Therefore, more accurate breeding predictions can be obtained from genomic predictions. Accuracy is improved when three related dairy cattle populations, Danish Red, Swedish Red, and Finnish Ayrshire, are combined into a single reference population (Zhou et al., 2014). When four European Holstein populations were combined into a reference population, the reliability increased by 10% (Lund et al., 2011). By combining six Brown Swiss populations, the reliability increased from 6% to 45% (Jorjani et al., 2012). However, multi-breed genomic evaluation of Xinjiang Brown cattle has not yet been conducted, limiting the optimized utilization of genomic selection technology in their genomic evaluation.

Based on the research foundation for domestic and international multi-breed joint genomic evaluation (Pryce et al., 2011; Lund et al., 2014; Steyn et al., 2019; Xu et al., 2019; Palombo et al., 2021), we proposed to integrate Xinjiang Brown cattle and Chinese Holstein cattle to construct a joint reference population for genome selection. In order to expand the Xinjiang brown cattle genome selection reference group, so as to apply multi-breed genome selection in Xinjiang brown cattle population to improve the prediction reliability. This study aimed to analyze the genetic structures of Xinjiang Brown and Chinese Holstein cattle to establish a multi-breed joint reference population. Using a dual-trait single-step genome-wide best linear unbiased prediction (ssGBLUP) approach, we established a genomic evaluation system for the primary lactation traits of Xinjiang Brown cattle, leveraging the joint reference population of Xinjiang Brown and Chinese Holstein cattle. This improves the accuracy of genomic selection for Xinjiang Brown cattle, creating a core breeding herd of genetically superior dairy Xinjiang Brown cows. Consequently, the genetic improvement of Xinjiang Brown cattle population will be expedited, leading to enhanced genetic levels across the breed.

2 Materials and methods

2.1 Sample collection and DNA extraction

A total of 1,729 blood samples were collected from the tail vein of Xinjiang Brown cattle and added to 10 mL EDTA anticoagulant tubes. The samples were then aliquoted into 1.5 mL centrifuge tubes and stored at −20°C. In addition, 66 frozen semen samples were collected from Xinjiang Brown and Brown Swiss bulls used for the artificial insemination of Xinjiang Brown cattle after 1983.

DNA was extracted from the above samples, and the concentration and purity of the obtained genomic DNA were measured using a NanoDrop 2000c spectrophotometer. The OD260/OD280 ratio was 1.7–1.9, indicating good DNA quality. After assessing DNA concentration, purity, and integrity, the samples were stored at −20°C (Ma, 2015).

2.2 Sample screening and chip detection

Phenotypically complete Xinjiang Brown cattle were screened from various Xinjiang Brown cattle farms for chip detection. After screening, 403 cows and 82 bulls from four core farms in Xinjiang region were selected. Moreover, we included 174 Xinjiang Brown cows from Xinjiang Uygur Autonomous Region State-owned Urumqi breeding farm, 50 Xinjiang Brown cows from Xinjiang Tianshan Animal Husbandry Bioengineering Co., Ltd. breeding farm, 130 Xinjiang Brown cows from the Tacheng Agriculture and Animal Husbandry Technology Co., Ltd., 49 Xinjiang Brown cows from Yili New Brown breeding farm, 71 bulls and 11 Brown Swiss bulls from Xinjiang Tianshan Animal Husbandry Bioengineering Co., Ltd. Bull breeding station. Chip data for Chinese Holstein cows were obtained from 2,633 animals in Beijing, distributed across 18 farms in the region. All of these animals were detected using the Illumina GeneSeek GGP bovine 150 K chip.

2.3 Chip imputation and quality control

A total of 139,376 and 138,892 SNP markers were detected using Xinjiang Brown and Chinese Holstein cattle chip assays, respectively. These data were imputed using Beagle 4.1 software, which infers haplotypes present in the population based on the principle of linkage disequilibrium. To ensure the accuracy of imputation, quality control measures were applied to the chip data.

The quality control criteria were as follows: individuals with a genotyping call rate of <90% were excluded. Only SNPs on chromosomes 1–30 were retained, with an individual genotype missing rate of <10%. SNPs with a minor allele frequency of >0.01 and a Hardy–Weinberg equilibrium p-value >1 × 10⁻⁶ were also included. After quality control using the PLINK software, the SNP genotypes were converted to a 0, 1, and 2 format. Finally, 118,622 and 123,268 SNP markers on the autosomal chromosomes of Xinjiang Brown and Chinese Holstein cattle were retained, respectively. Because the number of SNP markers differed between the two breeds after quality control, an intersection of the SNP markers was taken, which resulted in 118,021 common SNP markers for both breeds (Figure 1).

Figure 1

Figure 1. Venn diagram of GeneSeek GGP Bovine 150 k after quality control in Xinjiang Brown cattle and Chinese Holstein cattle.

2.4 Genetic structure analysis

2.4.1 Linkage disequilibrium analysis

The.map and. ped files for both breeds were converted to. vcf format using PLINK. The PopLDdecay software was then used to analyze and plot LD decay graphs (https://github.com/BGI-shenzhen/PopLDdecay) (Zhang et al., 2019). The LD metric, r², was calculated for the four populations (Hill, 1974). The mean r² value was computed at various marker distances of 1 Kb to demonstrate the degree of LD decay across different populations.

2.4.2 Population structure analysis

To infer ancestral populations based on the allele frequencies of descendant individuals, an unsupervised algorithm was employed (Consortium et al., 2009). In this study, genome-wide SNP data were used to calculate the population structure for ancestral admixture components with K values of 2–4 using admixture (Alexander et al., 2009). Visualization of the population structure was performed using the R package “pophelper”.

Analysis was conducted using the FastTree software (http://www.microbesonline.org/fasttree/), with the maximum likelihood method adopted for estimation. The Jukes–Cantor + CAT model was used as the default model for nucleotide phylogeny. The credibility of the phylogenetic tree branches was tested using 1,000 bootstrap replicates. Finally, the FigTree software was used for visualization.

2.4.3 Principal component analysis

The Gmatrix package in R was used to calculate the genomic kinship matrix (G-matrix) for Xinjiang Brown and Chinese Holstein cattle. Subsequently, principal component analysis (PCA) was performed using the G-matrix. The first three eigenvectors (PCA1, PCA2, and PCA3) were extracted and used as the horizontal and vertical coordinates for plotting. The contribution rates of the principal components were calculated on the basis of the percentage of eigenvalues. Finally, visualization was performed using the R language.

2.5 Multi-breed genomic evaluation using a joint reference population

2.5.1 Phenotypic data processing

The data for Xinjiang Brown cattle include production performance measurement records from 1983 to 2018 and DHI measurement records from 2010 to 2017. The data for Chinese Holstein cattle include DHI measurement records from 2001 to 2019. Milk-related traits, including 305-day milk yield (305dMY), milk fat yield (MFY), milk protein yield (MPY), and somatic cell score (SCS), were obtained through collation (Table 1). There were 7,516 and 93,717 milk trait measurements recorded for Xinjiang Brown and Chinese Holstein cattle, respectively.

Table 1

Table 1. The standards for data filtering.

The pedigree file used to analyze Xinjiang Brown cattle had 16,795 cattle, including 676 breeding bulls. Among these bulls, one had a maximum of 619 offspring, whereas 221 had only one offspring. Among the female adult cattle, 583 had only one offspring, whereas 1,623 had two or more offspring, with a maximum of 12 offspring per individual.

For the Chinese Holstein cattle, the pedigree file used for the analysis contained 6,54,390 individuals, including 11,243 breeding bulls. Among these bulls, one had a maximum of 7,884 offspring, whereas 4,695 had only one offspring. Among the female adult cattle, 1,63,781 had only one offspring, whereas 1,11,912 had two or more offspring, with a maximum of 12 offspring per individual (Table 2).

Table 2

Table 2. Data statistics.

2.5.2 Genotype data

Genotype data for 403 female Xinjiang Brown cattle, 71 male Xinjiang Brown cattle, and 11 male Brown Swiss cattle was considered. In addition, 2,100 Chinese Holstein cattle were randomly selected (According to PCA and Admixture results, PLINK software was used to remove the chip data of Chinese Holstein cows that was inconsistent with the large population of Chinese Holstein cows).

2.5.3 Statistical analysis

In this study, the ssGBLUP method was used to construct the H-matrix based on the pedigree and genomic information from Xinjiang Brown and Chinese Holstein cattle. The two-trait model Bayesian approach was used to estimate the variance components and breeding values for each trait.

To investigate the suitable integral ratio of the Chinese Holstein cattle in the joint reference population, a random gradient grouping approach was applied. The population was gradually accumulated in increments of 300 individuals to construct the joint reference population. A control group was established by excluding the phenotypic and genomic information of the Chinese Holstein cattle (Table 3).

Table 3

Table 3. Gradient grouping of joint reference group.

Because of the significant differences in milk production traits between Xinjiang Brown and Chinese Holstein cattle (Zhang et al., 2021), a dual-trait animal model was constructed. In this model, each biological trait was treated individually in the two populations, accounting for potential scale inconsistencies that may arise during breeding value estimation due to standardization across different breeds. The milk production traits (305dMY, MFY, MPY, and SCS) of Xinjiang Brown and Chinese Holstein cattle were considered to be two separate traits. A dual-trait linear model was used to estimate the variance components for milk production traits based on the genomic-pedigree combined relationship matrix, H-matrix. The model is described as follows:

$[\begin{array}{c} y_{1} \\ y_{2} \end{array}] = [\begin{array}{c} X_{1} & 0 \\ 0 & X_{2} \end{array}] [\begin{array}{c} β_{1} \\ β_{2} \end{array}] + [\begin{array}{c} Z_{1} & 0 \\ 0 & Z_{2} \end{array}] [\begin{array}{c} a_{1} \\ a_{2} \end{array}] + [\begin{array}{c} e_{1} \\ e_{2} \end{array}]$

In the formula, $y_{1}$ represents the observation value vector of a certain milk trait for each of Xinjiang Brown cattle, and $y_{2}$ represents Chinese Holstein cattle. $β_{1}$ and $β_{2}$ represent the fixed effect vectors for the same milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively, including farm effect, calving year effect, calving season effect, and parity effect. The farm effect was divided into 20 levels based on the phenotypic data sources of the farms where the cattle were raised. The calving years of Xinjiang Brown cattle were divided into seven levels based on phenotypic records: 1985–1995, 1996–2000, 2001–2005, 2006–2008, 2009–2011, 2012–2014, and 2015–2018. The calving years of the Chinese Holstein cattle were divided into six levels: 2001–2005, 2006–2008, 2009–2011, 2012–2014, 2015–2018, and 2019. For the parity effect, Xinjiang Brown and Chinese Holstein cattle were classified into six levels: 1, 2, 3, 4, 5, and 6 (including those with more than six calves). The calving season effect of Xinjiang Brown cattle was divided based on the unique climatic conditions of Xinjiang. According to the method of temperature intervals, April and May were considered spring, June, July, and August were considered summer, September was considered autumn, and January, February, March, October, November, and December were considered winter. Conversely, the calving season effect of Chinese Holstein cattle was determined based on the climatic characteristics of Beijing. Following the same method of temperature intervals, March, April, and May were considered spring, June, July, and August were considered summer, September, October, and November were considered autumn, and December, January, and February were considered winter. $a_{1}$ and $a_{2}$ represent the individual additive genetic effect vectors for a certain milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively. $e_{1}$ and $e_{2}$ represent the random residual effect vectors for the same milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively. $X_{i}$ and $Z_{i}$ represent the incidence matrices for the fixed effects and individual random additive genetic effects of the i-th trait, respectively.

Assume $[\begin{array}{c} α_{1} \\ α_{2} \end{array}] \sim N (0, H \otimes [\begin{array}{c} σ_{α_{1}}^{2} & σ_{α_{1} α_{2}} \\ σ_{α_{2} α_{1}} & σ_{α_{2}}^{2} \end{array}])$ , $[\begin{array}{c} e_{1} \\ e_{2} \end{array}] \sim N (0, I \otimes [\begin{array}{c} σ_{e_{1}}^{2} & 0 \\ 0 & σ_{e_{2}}^{2} \end{array}])$ , In the aforementioned formula, H represents the combined genomic-pedigree relationship matrix. $σ_{α_{i}}^{2}$ denotes the additive genetic variance for the i-th breed, while $σ_{α_{1} α_{2}}$ represents the covariance between breeds. $σ_{e_{i}}^{2}$ stands for the residual variance of the i-th breed. Given that Xinjiang Brown and Chinese Holstein cattle are reared in separate populations, there is no residual covariance between the two groups.

The genetic variance–covariance structure of the ssGBLUP additive genetic effect model is represented by $a \sim N (0, H σ_{a}^{2})$ , where $σ_{a}^{2}$ denotes the additive genetic variance. H, the pedigree–genome relationship matrix, represents a combination of the pedigree-based additive genetic relationship matrix (A matrix) and the genome-based kinship matrix (G-matrix) (Aguilar et al., 2010; Christensen and Lund, 2010).

The formula used to compute H is as follows:

$H = [\begin{array}{c} A_{11} - A_{12} A_{22}^{- 1} A_{21} + A_{12} A_{22}^{- 1} {G A}_{21} & A_{12} A_{22}^{- 1} G \\ G A_{22}^{- 1} A_{21} & G \end{array}]$

Subscripts 1 and 2 in A represent the non-genotyped and genotyped animals in the population, respectively. G represents the genetic relationship matrix. The calculation formula is as follows: $G = \frac{M M^{'}}{2 \sum_{k = 1}^{m} p_{k} (1 - p_{k})}$ . M represents the association matrix for SNP effects, where the elements $(0 - 2 p_{j})$ , $(1 - 2 p_{j})$ , and $(2 - 2 p_{j})$ represent homozygous 11, heterozygous 12 or 21, and homozygous 22 genotypes, respectively. $p_{j}$ represents the minor allele frequency of the jth SNP, and m represents the number of markers. $p_{k}$ denotes the allele frequency of the Kth SNP. Therefore, the H⁻¹ formula is given by $H^{- 1} = A^{- 1} + [\begin{array}{c} 0 & 0 \\ 0 & G^{- 1} + A_{22}^{- 1} \end{array}]$ : where $A^{- 1}$ represents the inverse matrix of all pedigree relationships, $G^{- 1}$ represents the inverse matrix of genomic kinship relationships, and $A_{22}^{- 1}$ represents the inverse matrix of the pedigree relationships for the sequenced individuals. The Bayes method was calculated using the GIBBS1F90 module in the BLUPF90 software along with the Bayes–Gibbs sampling method. In the Bayes method, the total chain length of the samples was 100,000, the burn-in chain length was 10,000, and the thinning interval was 50. The Geweke diagnostic method in POSTGIBBSF90 was used to check the convergence of the Gibbs chain (Zhang et al., 2022).

2.5.4 Calculation of heritability

The calculation formula of heritability is as follows:

$h^{2} = \frac{σ_{a}^{2}}{σ_{a}^{2} + σ_{e}^{2}}$

The formula for the standard error of heritability is shown below:

${S E}^{2} (h^{2}) = [\frac{σ_{a}^{2}}{σ_{p}^{2}}] [\frac{V a r (σ_{a}^{2})}{{(σ_{a}^{2})}^{2}} + \frac{V a r (σ_{p}^{2})}{{(σ_{p}^{2})}^{2}} - \frac{C o v (σ_{a}^{2}, σ_{p}^{2})}{σ_{a}^{2} σ_{p}^{2}}]$

where $h^{2}$ is heritability, ${S E}^{2} (h^{2}) is the standard error of heritability, σ_{a}^{2}$ is the additive genetic variance, $σ_{e}^{2}$ is the residual variance. $σ_{p}^{2}$ is the overall phenotypic variance, $σ_{p}^{2} = σ_{a}^{2} + σ_{e}^{2}$ .

2.5.5 Verification of the reliability of genomic breeding values

To verify the accuracy of the estimated genomic breeding values for the joint reference populations, 50 offspring individuals born in the past 4 years from 485 genotyped Xinjiang Brown cattle served as a validation group. Genomic predictions were performed in two groups: with and without excluding the phenotypic data of the validation group. This resulted in 16 sets of genetic parameters and genomic estimated breeding values for each milk trait. By comparing the prediction accuracy, estimation bias, and inflation coefficient of the validation group, the optimal number of joint reference populations was determined.

To calculate the prediction accuracy of the genomic estimated breeding values, the correlation coefficient between the genomic breeding values calculated with the phenotypic data of the validation group and those calculated without these data was used to measure the accuracy of estimating genomic breeding values for different joint reference populations. The formula is as follows: $R_{GEBV} = Cor ({TBV}^{*}, GEBV)$ , where $R^{2}$ represents reliability or the square of accuracy.

Meanwhile, the regression of ${TBV}^{*}$ on $GEBV$ is calculated using the formula ${TBV}^{*} = b_{0} + b_{1} GEBV$ , where the regression coefficient $b_{1}$ is the inflation coefficient, and the intercept $b_{0}$ is the estimation bias (Legarra and Reverter, 2019).

$b_{1} = \frac{c o v (G E B V, y)}{v a r (y)} = \frac{v a r (G E B V)}{v a r (y)} < 1$

$b_{1} = \frac{V a r (G E B V)}{V a r (y)} = \frac{V a r (G E B V) V a r (a)}{V a r (a) V a r (y)} = r_{G E B V}^{2} r_{y}^{2} ≪ 1$

In this context, $b_{1}$ represents the regression coefficient, which has the following implications: when $b_{1} < 1$ , GEBV is inflated, indicating that Var (GEBV) is too large. This means that in the genomic breeding values, the good ones are even better, and the bad ones are even worse. Conversely, when $b_{1} > 1$ , GEBV is deflated, suggesting that Var (GEBV) is too small. This indicates that the genomic breeding values are smaller than the true values and are contracted toward the middle.

3 Results

3.1 Genetic structure

3.1.1 Linkage disequilibrium analysis

Linkage disequilibrium analysis was conducted between the two breeds by calculating the linkage disequilibrium coefficients for the two loci and plotting the LD decay graph (Figure 2). The graph shows that the average LD coefficients for Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows at a genomic distance of 50 kb were approximately 0.2, 0.25, 0.3, and 0.35, respectively, indicating a gradual increase. Noteworthy, the decay rates of the LD coefficients vary among different populations. Among the breeds, Brown Swiss bulls exhibited the slowest LD decay at 0–40 kb, whereas the Chinese Holstein cows exhibited the fastest LD decay. However, in the range of 40–300 kb, Xinjiang Brown cows exhibited the fastest LD decay, with a decay rate order of Xinjiang Brown cows > Chinese Holstein cows > Xinjiang Brown bulls > Brown Swiss bulls.

Figure 2

Figure 2. LD decay of Xinjiang Brown cattle and Chinese Holstein cow. BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

3.1.2 Population structure analysis

To further investigate the genetic components of Xinjiang Brown and Chinese Holstein cows, population structure and phylogenetic tree analyses were conducted. As shown in Figure 3, when the number of ancestral populations K = 2, there was a clear distinction in the genetic structure among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows. However, the genetic structure within each group differed insignificantly. As shown in Figure 4, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls were clustered, whereas the Chinese Holstein cows were clustered separately. In addition, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls appear at the end of a certain branch within the Chinese Holstein population.

Figure 3

Figure 3. Analysis chart of population structure in Xinjiang Brown cattle and Chinese Holstein cow. BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

Figure 4

Figure 4. Phylogenetic tree of Xinjiang Brown cattle and Chinese Holstein cow. BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

3.1.3 Genetic relatedness between Xinjiang Brown and Chinese Holstein cows

Using the SNP genotyping information from 403 Xinjiang Brown cows, 71 Xinjiang Brown bulls, 11 Brown Swiss bulls, and 2,633 Chinese Holstein cows, a G-matrix was constructed. Figure 5 was then generated on the basis of the actual genetic relatedness among individuals in the G-matrix. Figure 5 shows that the kinship coefficients among Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls populations were approximately 0.5, which is significantly higher than those among individuals within the Chinese Holstein cow population. The kinship coefficients between Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows populations tend toward 0.

Figure 5

Figure 5. Genomic relationship matrix of Xinjiang Brown cattle and Chinese Holstein cow. BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

3.1.4 PCA

PCA was performed using the genomic kinship relationship matrix (G-matrix) among individuals from the two breeds (Figures 2–7). The results revealed that the first principal component (PC1, accounting for 4.75%) separated Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows into distinct groups. Specifically, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls were closely clustered. The second principal component (PC2, accounting for 1.76%) could not distinguish between Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls; however, it separated the Chinese Holstein cows into two distinct groups. The third principal component (PC3, accounting for 1.22%) further distinguished the Chinese Holstein cows into two groups.

Figure 6

Figure 6. Principal Component Analysis of Xinjiang Brown Cattle and Chinese Holstein cow.

Figure 7

Figure 7. Gradient grouping of PCA analysis in joint reference group.

3.2 Multi-breed genomic evaluation using a joint reference population

3.2.1 Descriptive statistics of the dairy traits of Xinjiang brown and Chinese Holstein cattle

Table 4 lists the statistics, including sample size, minimum value, maximum value, mean, standard deviation, and coefficient of variation of the observed dairy traits of Xinjiang Brown and Chinese Holstein cattle. 305dMY, MFY, MPY, and SCS between Xinjiang Brown and Chinese Holstein cattle differed significantly.

Table 4

Table 4. Description of milk traits in Xinjiang Brown Cattle and Chinese Holstein Cattle.

3.2.2 Random grouping of the joint reference group

The chip data of 2,633 Chinese Holstein cows were screened to eliminate individuals with distant kinship within the Chinese Holstein cattle population, leaving 2,271 genotyped Chinese Holstein cows. Among these, 2,100 cows were randomly selected as the total population of Chinese Holstein cows to be included in the joint reference group. Random equal-sized groupings were then performed on 2,100 genotyped Chinese Holstein cows, resulting in subsets with different numbers of cows. As shown in Figure 7, the distribution of the subsets in the total population was relatively scattered for the Chinese Holstein cows added to the joint reference group.

3.2.3 Estimation of genetic parameters of dairy traits in the joint reference group

As shown in Table 5, before the inclusion of the Chinese Holstein population, the heritability of 305dMY was 0.204 without excluding the phenotypic data of 50 genotype Xinjiang Brown cows, which decreased to 0.203 after data exclusion. When varying numbers of Chinese Holstein cows were incorporated into the joint reference group, the heritability of 305dMY in Xinjiang Brown cows was 0.137–0.249 without excluding the phenotypic data of the 50 genotype cows. However, after excluding these data, the heritability of 305dMY in Xinjiang Brown cows was adjusted to 0.169–0.254.

Table 5

Table 5. Genetic parameter estimation of 305dMY in joint reference population based on ssGBLUP.

As shown in Table 6, without the inclusion of Chinese Holstein cows, the heritability of MFY, was 0.07 when the phenotypic data of 50 genotype Xinjiang Brown cows were included, and it was 0.073 when the phenotypic data were excluded. When different numbers of Chinese Holstein cows were added to the reference population, the MFY, heritability of Xinjiang Brown cows was 0.073–0.086 when the phenotypic data of 50 genotype Xinjiang Brown cows were included, and it was 0.057–0.085 when the phenotypic data were excluded.

Table 6

Table 6. Genetic parameter estimation of MFY in joint reference population based on ssGBLUP.

As shown in Table 7, without the inclusion of the Chinese Holstein cows, the heritability of MPY, was 0.143 when the phenotypic data of 50 genotyped Xinjiang Brown cows were included, and it was 0.145 when the phenotypic data were excluded. When different numbers of Chinese Holstein cows were added to the reference population, the MPY, heritability of Xinjiang Brown cows was 0.123–0.158 when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included, and it was 0.0142–0.174 when the phenotypic data were excluded.

Table 7

Table 7. Genetic parameter estimation of MPY in joint reference population based on ssGBLUP.

Table 8 shows that without the inclusion of the Chinese Holstein cow population, the heritability of SCS was 0.042 when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included and 0.043 when the phenotypic data were excluded. After adding different numbers of the Chinese Holstein cows to the joint reference population, the SCS heritability for Xinjiang Brown cows was 0.02–0.062 when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included. When the phenotypic data were excluded, the SCS heritability for Xinjiang Brown cows was 0.015–0.081.

Table 8

Table 8. Genetic parameter estimation of SCS in joint reference population based on ssGBLUP.

3.2.4 Verification of the genomic breeding value reliability

As shown in Table 9, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for 305dMY of Xinjiang Brown cows was 0.142–0.340, with a regression coefficient of 0.129–0.312. The reliability of the genomic breeding values for the validation population was −0.033–0.087, with a regression coefficient of −0.064–0.056.

Table 9

Table 9. Genetic parameter estimation of 305dMY in joint reference population based on ssGBLUP.

As shown in Table 10, when different numbers of the Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for MFY, of Xinjiang Brown cows was 0.263–0.424, with a regression coefficient of 0.296–0.437. The reliability of the genomic breeding values for the validation population was −0.149–0.138, with a regression coefficient of −0.192–0.137.

Table 10

Table 10. Genetic parameter estimation of MFY in joint reference population based on ssGBLUP.

As shown in Table 11, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for MPY, of Xinjiang Brown cows was 0.28–0.465, with a regression coefficient of 0.277–0.504. The reliability of the genomic breeding values for the validation population was −0.259–0.203, with a regression coefficient of −0.213–0.032.

Table 11

Table 11. Genetic parameter estimation of MPY in joint reference population based on ssGBLUP.

As shown in Table 12, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for SCS, of Xinjiang Brown cows was 0.190–0.448, with a regression coefficient of 0.207–0.502. The reliability of the genomic breeding values for the validation population was −0.145–0.062, with a regression coefficient of −0.223–0.075.

Table 12

Table 12. Genetic parameter estimation of SCS in joint reference population based on ssGBLUP.

4 Discussion

4.1 Genetic structure analysis

LD is a metric that quantifies whether genotype variations in two SNP markers are relatively consistent and whether they are correlated (Park, 2012). If two loci with adjacent alleles are correlated, certain genotypes tend to be co-inherited, resulting in a higher frequency of certain haplotypes than expected. This pattern can be visually represented using an LD decay plot (Amaral et al., 2008). Because of the correlated inheritance of the two loci, the decay rate of the LD coefficient decreases with increasing generations and recombination events. Genetic background also influences it. Domestication selection reduces genetic diversity within a population, reinforcing the correlation or linkage between SNP loci (Farnir et al., 2000). Consequently, populations with higher degrees of domestication exhibit stronger selection intensities (Odani et al., 2006), resulting in slower LD decay rates. The higher selection intensity in breeding bulls likely reduced the effective population size, thereby affecting LD in these groups. Meanwhile, the LD decay patterns in Xinjiang Brown and Chinese Holstein cows exhibit potential similarities, favoring the construction of a combined reference population. Overall, the LD decayed fastest in Xinjiang Brown cows, indicating lower levels of selection than the other three groups. This suggests a high level of genetic diversity in Xinjiang Brown cows, which harbor rich genetic resources with potential for development and use. These findings provide a scientific basis for the conservation, exploitation, and use of genetic diversity in Xinjiang Brown cattle.

Genetic structure analysis can elucidate phylogenetic relationships and genetic distances among different populations (Whelan and Goldman, 2001). Under the influence of natural and artificial selection, populations exhibiting pronounced genetic differences are evident. When K = 2, there is a distinct genetic structure differentiation among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows. This is related to the breeding strategies employed during the intense selection process of Xinjiang Brown and Chinese Holstein cattle, where most female progenitors in the early stages of population breeding originated from local Chinese yellow cattle (Liu, 2013).

The breeding of Xinjiang Brown and Chinese Holstein cattle involves the introduction of foreign breeds for crossbreeding to improve and enhance the local yellow cattle population in China. Subsequently, through crossbreeding fixation and selective breeding, these breeds have been further developed and stabilized. The genetic background of Xinjiang Brown cattle is traceable to the original crossbreeding improvement in 1951, when the maternal breed was Kazakh cattle (Ma, 2015). Conversely, the genetic background of the Chinese Holstein cattle dates back to 1840. From 1840 to 1948, the Chinese Holstein cattle underwent more than a century of introduction and early stages of crossbreeding improvement. During this period, China initially introduced various dairy breeds, including the Holstein cattle, Jersey cattle, Ayrshire cattle, Brown Swiss cattle, and Shorthorn cattle (Liu, 2013). The distinct genetic structure observed among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows indicates a significant genetic distance between these two major groups. This significantly affects the genetic structure of multi-breed joint reference genomes established later, subsequently affecting the accuracy of genomic predictions.

Figure 5 shows that the coefficient of kinship among individuals within the populations of Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls was relatively high. However, the coefficient of kinship between these groups and the Chinese Holstein cow population tended to be close to 0. This result suggests that when conducting multi-breed genomic selection, it is not advisable to use only one breed as the reference population and the other as the selection population. Instead, it is necessary to fully consider the relationship between the selection and reference populations. When the kinship between the two breeds involved in multi-breed genomic selection is weak or extremely weak, it is necessary to include a certain number of individuals from the same breed in the reference population to ensure high reliability in the estimation of genomic breeding values.

The PCA results identified two major clusters: one containing Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows, whereas the other mainly contained Chinese Holstein cows. This suggests a relatively distant genetic relationship between these two clusters. Clustering of Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls is highly concentrated, indicating a close genetic distance among these groups. When considering only the clustering of Chinese Holstein cows, a small subset can be distinguished from the larger population, which exhibits a distant genetic relationship with Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls and with most Chinese Holstein cows. In the subsequent multi-breed genetic evaluations, it is recommended to exclude chip and phenotypic data from this subset of Chinese Holstein cows and their respective farms. This approach can help reduce the interference of genetic structure differences while estimating SNP effects. These findings suggest a low genetic linkage between Xinjiang Brown and Chinese Holstein cattle.

4.2 Genetic parameter analysis of the joint reference population

The number of genotyped Chinese Holstein cattle in the joint reference population significantly affects the estimation of variance components. Therefore, it is necessary to consider the number of genotyped animals from different populations in the joint reference population during multi-breed genetic evaluations. The most important factors that affect the reliability of genomic breeding value estimation are the proportion of genetic variance explained by SNPs and trait heritability (Steyn et al., 2019; van Grevenhof et al., 2019). These genetic parameters are directly related to the size and structure of the training population as well as the range, quality, and quantity of phenotypic and genomic information available for individuals in the training population. To ensure accurate genomic breeding value estimation, it is important to minimize the relationship between genotyped individuals within the training population and maximize the relationship between the training and prediction populations. Because ssGBLUP generates genomic breeding values for cows, it is particularly useful for cows with only parental average information. The single-step genomic evaluation combines information from all countries, considering potential duplicate counting of the same information, thereby ensuring more accurate estimation of genomic breeding values. Adding carefully selected cows to the training population can expand the population, improve its structure and relationship with the prediction population, and reduce selection bias. However, in this study, a large proportion of the chip data came from cows, with relatively fewer bulls for validation, probably being a reason for the poor prediction performance.

The ssGBLUP method can simultaneously analyze phenotypic, genomic, and pedigree information from both genotype and non-genotype animals by integrating external information. This method is particularly convenient when using foreign paternal genetic material. For example, during the Interbull evaluation of Brown Swiss bulls, a country can obtain genomic information from multiple countries and MACE information. Combining external MACE information with ssGBLUP can complement paternal information from different countries and provide pseudophenotypic information for foreign paternal lines with no or few offsprings. This research result suggests that we can conduct cross-country genetic evaluations with the Brown Swiss bull origin introduced during the breeding of Xinjiang Brown cattle, which could improve the reliability of genomic predictions for Xinjiang Brown cattle.

4.3 Reliability of genetic evaluation in joint reference populations

In theory, a model that assumes the closest distribution of SNP effects to their true distribution can achieve the highest reliability in genomic prediction. The GBLUP model assumes that all SNP effects follow the same normal distribution and compresses the effects of all SNPs to the same degree because different models have different assumptions about the distribution of SNP effects (Villar-hernÁndez et al., 2021). Several methods have been proposed to improve the accuracy of genomic prediction in small populations of dairy cattle (Marjanovic et al., 2021), and one effective approach is to use joint reference populations by combining reference data from different populations (Steyn et al., 2019; van Grevenhof et al., 2019). This method has reported significant benefits in genomic prediction for North American Holstein, European Holstein, Chinese Holstein, and Brown Swiss populations (Vanderick et al., 2017; van den berg et al., 2016). However, the accuracy of genomic prediction is dependent on the relationship between candidate and reference animals, requiring the reference population to be sufficiently close to the target population (Xu et al., 2019). Therefore, for multi-breed joint genetic evaluation between Xinjiang Brown and Chinese Holstein cattle, considering only the added number of animals is insufficient. Further in-depth analysis of important influencing factors, including assumptions about SNP effects (van den berg et al., 2019) and the weights of the A- and G-matrices in the H-matrix (Karaman et al., 2018; Botelho et al., 2021), is required to improve the accuracy and unbiasedness of predictions (Botelho et al., 2021).

Including cows in the genotyping reference population is necessary because of the limited number of cows with reliable phenotypic information available for predicting offspring traits. To improve the genomic breeding values of the population, it is necessary to include a certain number of validated bulls with reliable phenotypic information in the reference population (Vanraden et al., 2020). Previous studies have reported that the inclusion of cows in the validated bull reference population can improve the accuracy of genomic prediction (Ding et al., 2013). Although the phenotypic information for cows is less accurate than that for bulls with offspring validation, additional information can still be significant (Cole et al., 2021), considering the large number of cows available as reference animals.

5 Conclusion

The genetic structure of mature Xinjiang Brown and Chinese Holstein cows is different, and the individual kinship between these two populations is relatively distant. This increases the impact of genetic structure and kinship on the reliability of genomic breeding value estimation. Through comparisons of parameters, including heritability, breeding value reliability, and unbiasedness, it was initially determined that including 600 and 900 Chinese Holstein cows in the joint reference population positively impacted the genomic prediction of Xinjiang Brown cattle to some extent. In multi-breed genome selection, it is necessary to pre-evaluate the genetic structure and genetic relationship of the population. It is feasible to combine the Chinese Holstein cattle population into the Xinjiang brown cattle population to form a joint reference group for cross-breed genetic assessment. It can provide theoretical guidance for applied genomic genetic assessment and multi-breed genomic genetic assessment of Xinjiang brown cattle, and also provide reference for genome selection of other dual-use cattle and small population breeds.

Data availability statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Ethics statement

The animal studies were approved by Experimental animal Welfare Ethics Committee, Xinjiang Agricultural University. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent was obtained from the owners for the participation of their animals in this study.

Author contributions

MZ: Investigation, Writing–original draft, Writing–review and editing. LX: Data curation, Writing–review and editing. HiL: Writing–original draft. HnL: Methodology, Writing–original draft. JZ: Formal Analysis, Investigation, Writing–review and editing. DW: Investigation, Writing–review and editing. XZ: Investigation, Writing–review and editing. XH: Conceptualization, Resources, Supervision, Writing–review and editing. YW: Conceptualization, Writing–original draft, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was funded by the Xinjiang Uygur Autonomous Region Science and technology Aid Xinjiang Project (Grant No. 2018E02052), The Xinjiang Uygur Autonomous Region University Scientific Research Project (Grant No. XJEDU 2017I005) and National Modern Agriculture Industrial System Project (Grant No. CARS-36).

Acknowledgments

We greatly thank the Xinjiang Tianshan Animal Husbandry Bio-Engineering Co., Ltd., Xinjiang Uygur Autonomous Region local state-owned Urumqi cattle farm, Yili Xinjiang Brown cattle farm, Tacheng Agriculture and Animal Husbandry Technology Co., Ltd., China for data and samples collection.

Conflict of interest

Author JZ was employed by Shijiazhuang Molbreeding Biotechnology Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aguilar, I., Misztal, I., Johnson, D. L., Legarra, A., Tsuruta, S., and Lawlor, T. J. (2010). Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93 (2), 743–752. doi:10.3168/jds.2009-2730

PubMed Abstract | CrossRef Full Text | Google Scholar

Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19 (9), 1655–1664. doi:10.1101/gr.094052.109

PubMed Abstract | CrossRef Full Text | Google Scholar

Amaral, A. J., Megens, H., Crooijmans, R. P. M. A., Heuven, H. C. M., and Groenen, M. A. M. (2008). Linkage disequilibrium decay and haplotype block structure in the pig. Genetics 179 (1), 569–579. doi:10.1534/genetics.107.084277

PubMed Abstract | CrossRef Full Text | Google Scholar

Boichard, D., Ducrocq, V., Croiseau, P., and Fritz, S. (2016). Genomic selection in domestic animals: principles, applications and perspectives. Comptes Rendus Biol. 339 (7-8), 274–277. doi:10.1016/j.crvi.2016.04.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Botelho, M. E., Lopes, M. S., Mathur, P. K., Knol, E. F., Guimarães, S. E. F., Marques, D. B. D., et al. (2021). Applying an association weight matrix in weighted genomic prediction of boar taint compounds. J. Anim. Breed. Genet. 138 (4), 442–453. doi:10.1111/jbg.12528

PubMed Abstract | CrossRef Full Text | Google Scholar

Brøndum, R. F., Su, G., Janss, L., Sahana, G., Guldbrandtsen, B., Boichard, D., et al. (2015). Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J. Dairy Sci. 98 (6), 4107–4116. doi:10.3168/jds.2014-9005

PubMed Abstract | CrossRef Full Text | Google Scholar

Christensen, O. F., and Lund, M. S. (2010). Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. Paris. 42 (1), 2. doi:10.1186/1297-9686-42-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Cole, J. B., Dürr, J. W., and Nicolazzi, E. L. (2021). Invited review: the future of selection decisions and breeding programs: what are we breeding for, and who decides? J. Dairy Sci. 104 (5), 5111–5124. doi:10.3168/jds.2020-19777

PubMed Abstract | CrossRef Full Text | Google Scholar

Consortium, T. B. H., Gibbs, R. A. A., Taylor, J. F. A., Van Tassell, C. P., Barendse, W., Eversole, K. A., et al. (2009). Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science 324 (5926), 528–532. doi:10.1126/science.1167936

PubMed Abstract | CrossRef Full Text | Google Scholar

Dassonneville, R., Brndum, R. F., Druet, T., Fritz, S., Guillaume, F., Guldbrandtsen, B., et al. (2011). Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations. J. Dairy Sci. 94 (7), 3679–3686. doi:10.3168/jds.2011-4299

PubMed Abstract | CrossRef Full Text | Google Scholar

de Roos, A. P. W., Hayes, B. J., and Goddard, M. E. (2009). Reliability of genomic predictions across multiple populations. Genetics 183 (4), 1545–1553. doi:10.1534/genetics.109.104935

PubMed Abstract | CrossRef Full Text | Google Scholar

Ding, X., Zhang, Z., Li, X., Wang, S., Wu, X., Sun, D., et al. (2013). Accuracy of genomic prediction for milk production traits in the Chinese Holstein population using a reference population consisting of cows. J. Dairy Sci. 96 (8), 5315–5323. doi:10.3168/jds.2012-6194

PubMed Abstract | CrossRef Full Text | Google Scholar

Edel, C., Schwarzenbacher, H., Hamann, H., Neuner, S., Emmerling, R., and Götz, K. U. (2011). The German-Austrian genomic evaluation system for Fleckvieh (Siemmental) cattle[J]. Interbull Bull. (44), 152–156.

Google Scholar

Farnir, F., Coppieters, W., Arranz, J., Berzi, P., Cambisano, N., Grisart, B., et al. (2000). Extensive genome-wide linkage disequilibrium in cattle. Genome Res. 10 (2), 220–227. doi:10.1101/gr.10.2.220

PubMed Abstract | CrossRef Full Text | Google Scholar

George, R., Wiggans, J. B. C. S., Hubbard, S. M., and Sonstegard, T. S. (2017). Genomic selection in dairy cattle: the USDA experience. Annu. Rev. animal Biosci. 5, 309–327. doi:10.1146/annurev-animal-021815-111422

PubMed Abstract | CrossRef Full Text | Google Scholar

Hayes, B., Meuwissen, T., and Goddard, M. (2016). Genomic selection: a paradigm shift in animal breeding. Anim. Front. 6 (1), 6–14. doi:10.2527/af.2016-0002

CrossRef Full Text | Google Scholar

Hill, W. G. (1974). Estimation of linkage disequilibrium in randomly mating populations. Heredity 33 (2), 229–239. doi:10.1038/hdy.1974.89

PubMed Abstract | CrossRef Full Text | Google Scholar

Jorjani, H., Jakobsen, J., Hjerpe, E., Palucci, V., and Dürr, J. W. (2012). Status of genomic evaluation in the Brown Swiss populations. Interbull (46), 46–54.

Google Scholar

Karaman, E., Lund, M. S., Anche, M. T., Janss, L., and Su, G. (2018). Genomic prediction using multi-trait weighted GBLUP accounting for heterogeneous variances and covariances across the genome. G3 (Bethesda, Md.) 8 (11), 3549–3558. doi:10.1534/g3.118.200673

PubMed Abstract | CrossRef Full Text | Google Scholar

Legarra, A., and Reverter, A. (2019). Correction to: semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet. Sel. Evol. 51 (1), 69. doi:10.1186/s12711-019-0511-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, G. C. (2013). Special history of China's dairy industry. Beijing: China Agriculture Press.

Google Scholar

Lund, M. S., Roos, A. P. W. D., Vries, A. G. D., Druet, T., Ducrocq, V., Fritz, S., et al. (2011). A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet. Sel. Evol. Paris. 43 (1), 43. doi:10.1186/1297-9686-43-43

PubMed Abstract | CrossRef Full Text | Google Scholar

Lund, M. S., Su, G., Janss, L., Guldbrandtsen, B., and Brøndum, R. F. (2014). Genomic evaluation of cattle in a multi-breed context. Livest. Sci. 166, 101–110. doi:10.1016/j.livsci.2014.05.008

CrossRef Full Text | Google Scholar

Ma, Y. (2015). Genetic variations of chinese holstein bullsin fak gene regulation zone identification andits associated with semen quality analysis (Shandong: Shandong Normal University).

Google Scholar

Marjanovic, J., Hulsegge, B., and Calus, M. P. L. (2021). Relatedness between numerically small Dutch Red dairy cattle populations and possibilities for multibreed genomic prediction. J. Dairy Sci. 104 (4), 4498–4506. doi:10.3168/jds.2020-19573

PubMed Abstract | CrossRef Full Text | Google Scholar

Metta, M., Kanginakudru, S., Gudiseva, N., and Nagaraju, J. (2004). Genetic characterization of the Indian cattle breeds, Ongole and Deoni (Bos indicus), using microsatellite markers - a preliminary study. BMC Genet. 5 (1), 16. doi:10.1186/1471-2156-5-16

PubMed Abstract | CrossRef Full Text | Google Scholar

Odani, M., Narita, A., Watanabe, T., Yokouchi, K., Sugimoto, Y., Fujita, T., et al. (2006). Genome-wide linkage disequilibrium in two Japanese beef cattle breeds. Anim. Genet. 37 (2), 139–144. doi:10.1111/j.1365-2052.2005.01400.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Palombo, V., Pegolo, S., Conte, G., Cesarani, A., Macciotta, N. P. P., Stefanon, B., et al. (2021). Genomic prediction for latent variables related to milk fatty acid composition in Holstein, Simmental and Brown Swiss dairy cattle breeds. J. Anim. Breed. Genet. 138 (3), 389–402. doi:10.1111/jbg.12532

PubMed Abstract | CrossRef Full Text | Google Scholar

Park, L. (2012). Linkage disequilibrium decay and past population history in the human genome. PloS one 7 (10), e46603. doi:10.1371/journal.pone.0046603

PubMed Abstract | CrossRef Full Text | Google Scholar

Pryce, J. E., Gredler, B., Bolormaa, S., Bowman, P. J., Egger-Danner, C., Fuerst, C., et al. (2011). Short communication: genomic selection using a multi-breed, across-country reference population. J. Dairy Sci. 94 (5), 2625–2630. doi:10.3168/jds.2010-3719

PubMed Abstract | CrossRef Full Text | Google Scholar

Steyn, Y., Lourenco, D. A. L., and Misztal, I. (2019). Genomic predictions in purebreds with a multibreed genomic relationship matrix1. J. Animal Sci. 97 (11), 4418–4427. doi:10.1093/jas/skz296

PubMed Abstract | CrossRef Full Text | Google Scholar

van den Berg, I., Boichard, D., and Lund, M. S. (2016). Comparing power and precision of within-breed and multibreed genome-wide association studies of production traits using whole-genome sequence data for 5 French and Danish dairy cattle breeds. J. Dairy Sci. 99 (11), 8932–8945. doi:10.3168/jds.2016-11073

PubMed Abstract | CrossRef Full Text | Google Scholar

van den Berg, I., Meuwissen, T. H. E., Macleod, I. M., and Goddard, M. E. (2019). Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J. Dairy Sci. 102 (4), 3155–3174. doi:10.3168/jds.2018-15231

PubMed Abstract | CrossRef Full Text | Google Scholar

Vanderick, S., Gillon, A., Glorieux, G., Mayeres, P., Mota, R., and Gengler, N. (2017). Usefulness of multi-breed models in genetic evaluation of direct and maternal calving ease in Holstein and Belgian Blue Walloon purebreds and crossbreds. Livest. Sci. 198 (1), 129–137. doi:10.1016/j.livsci.2017.02.019

CrossRef Full Text | Google Scholar

van Grevenhof, E. M., Vandenplas, J., and Calus, M. P. L. (2019). Genomic prediction for crossbred performance using metafounders. J. animal Sci. 97 (2), 548–558. doi:10.1093/jas/sky433

CrossRef Full Text | Google Scholar

Vanraden, P. M., Tooker, M. E., Chud, T. C. S., Norman, H. D., Megonigal, J. H., Haagen, I. W., et al. (2020). Genomic predictions for crossbred dairy cattle. J. Dairy Sci. 103 (2), 1620–1631. doi:10.3168/jds.2019-16634

PubMed Abstract | CrossRef Full Text | Google Scholar

Villar-Hernández, B. D. J., Pérez-Elizalde, S., Martini, J. W. R., Toledo, F., Perez-Rodriguez, P., Krause, M., et al. (2021). Application of multi-trait Bayesian decision theory for parental genomic selection. G3 (Bethesda) 11 (2), jkab012. doi:10.1093/g3journal/jkab012

PubMed Abstract | CrossRef Full Text | Google Scholar

Weigel, K. A. A., de Los Campos, G. A., Vazquez, A. I. A., Rosa, G. J. M., Gianola, D., and Van Tassell, C. P. (2010). Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J. Dairy Sci. 93 (11), 5423–5435. doi:10.3168/jds.2010-3149

PubMed Abstract | CrossRef Full Text | Google Scholar

Weller, J. I., Ezra, E., and Ron, M. (2017). Invited review: a perspective on the future of genomic selection in dairy cattle. J. Dairy Sci. 100 (11), 8633–8644. doi:10.3168/jds.2017-12879

PubMed Abstract | CrossRef Full Text | Google Scholar

Whelan, S., and Goldman, N. (2001). A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18 (5), 691–699. doi:10.1093/oxfordjournals.molbev.a003851

PubMed Abstract | CrossRef Full Text | Google Scholar

Xu, L., Wang, Z., Zhu, B., Liu, Y., Li, H., Bordbar, F., et al. (2019). Theoretical evaluation of multi-breed genomic prediction in Chinese indigenous cattle. Anim. (Basel) 9 (10), 789. doi:10.3390/ani9100789

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, M. H., You, Z. C., Wei, C., Li, J. F., Ge, J. J., Chen, G. L., et al. (2021). Analysis of influencing factors on production performance of Chinese Holstein cattle and Xinjiang brown cattle in Xinjiang region. Chin. J. Animal Sci. 57 (5), 246–252. doi:10.19556/j.0258-7033.20201113-04

CrossRef Full Text | Google Scholar

Zhang, C., Dong, S., Xu, J., He, W. M., and Yang, T. L. (2019). PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35 (10), 1786–1788. doi:10.1093/bioinformatics/bty875

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, M., Luo, H., Xu, L., Shi, Y., Zhou, J., Wang, D., et al. (2022). Genomic selection for milk production traits in Xinjiang Brown cattle. Animals 12 (2), 136. doi:10.3390/ani12020136

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, L., Heringstad, B., Su, G., Guldbrandtsen, B., Meuwissen, T. H. E., Svendsen, M., et al. (2014). Genomic predictions based on a joint reference population for the Nordic Red cattle breeds. J. Dairy Sci. 97 (7), 4485–4496. doi:10.3168/jds.2013-7580

PubMed Abstract | CrossRef Full Text | Google Scholar

Zumbach, B., Jorjani, H., and Dürr, J. (2010). Brown Swiss genomic evaluation. Interbull Bull. (42), 44–45.

Google Scholar

Keywords: Xinjiang Brown cattle, multi-breed, genomic prediction, bayes, single-step GBLUP

Citation: Zhang M, Xu L, Lu H, Luo H, Zhou J, Wang D, Zhang X, Huang X and Wang Y (2024) Genomic prediction based on a joint reference population for the Xinjiang Brown cattle. Front. Genet. 15:1394636. doi: 10.3389/fgene.2024.1394636

Received: 01 March 2024; Accepted: 10 April 2024;
Published: 26 April 2024.

Edited by:

Zhihong Liu, Inner Mongolia Agricultural University, China

Reviewed by:

Mao Yongjiang, Yangzhou University, China
Lingyang Xu, Chinese Academy of Agricultural Sciences, China

Copyright © 2024 Zhang, Xu, Lu, Luo, Zhou, Wang, Zhang, Huang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xixia Huang, au-huangxixia@163.com; Yachun Wang, wangyachun@cau.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Genomic prediction based on a joint reference population for the Xinjiang Brown cattle

1 Introduction

2 Materials and methods

2.1 Sample collection and DNA extraction

2.2 Sample screening and chip detection

2.3 Chip imputation and quality control

2.4 Genetic structure analysis

2.4.1 Linkage disequilibrium analysis

2.4.2 Population structure analysis

2.4.3 Principal component analysis

2.5 Multi-breed genomic evaluation using a joint reference population

2.5.1 Phenotypic data processing

2.5.2 Genotype data

2.5.3 Statistical analysis

2.5.4 Calculation of heritability

2.5.5 Verification of the reliability of genomic breeding values

3 Results

3.1 Genetic structure

3.1.1 Linkage disequilibrium analysis

3.1.2 Population structure analysis

3.1.3 Genetic relatedness between Xinjiang Brown and Chinese Holstein cows

3.1.4 PCA

3.2 Multi-breed genomic evaluation using a joint reference population

3.2.1 Descriptive statistics of the dairy traits of Xinjiang brown and Chinese Holstein cattle

3.2.2 Random grouping of the joint reference group

3.2.3 Estimation of genetic parameters of dairy traits in the joint reference group

3.2.4 Verification of the genomic breeding value reliability

4 Discussion

4.1 Genetic structure analysis

4.2 Genetic parameter analysis of the joint reference population

4.3 Reliability of genetic evaluation in joint reference populations

5 Conclusion

Data availability statement

Ethics statement

Author contributions

Funding

Acknowledgments

Conflict of interest

Publisher’s note

References

95% of researchers rate our articles as excellent or good