Transgressive Potential Prediction and Optimal Cross Design of Seed Protein Content in the Northeast China Soybean Population Based on Full Exploration of the QTL-Allele System

Feng, Weidan; Fu, Lianshun; Fu, Mengmeng; Sang, Ziqian; Wang, Yanping; Wang, Lei; Ren, Haixiang; Du, Weiguang; Hao, Xiaoshuai; Sun, Lei; Zhang, Jiaoping; Wang, Wubin; Xing, Guangnan; He, Jianbo; Gai, Junyi

doi:10.3389/fpls.2022.896549

ORIGINAL RESEARCH article

Front. Plant Sci., 12 July 2022

Sec. Plant Breeding

Volume 13 - 2022 | https://doi.org/10.3389/fpls.2022.896549

Weidan Feng¹,²†

Lianshun Fu³†

Mengmeng Fu¹†

Ziqian Sang¹

Yanping Wang⁴

Lei Wang¹,²

Haixiang Ren⁴

Weiguang Du⁴

Xiaoshuai Hao¹,²

Lei Sun¹,²

Jiaoping Zhang¹,²

Wubin Wang¹,²

Guangnan Xing¹,²

Jianbo He¹,²*

Junyi Gai¹,²,⁵*

¹Soybean Research Institute/MARA National Center for Soybean Improvement/MARA Key Laboratory of Biology and Genetic Improvement of Soybean (General), Nanjing Agricultural University, Nanjing, China
²State Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
³Tieling Academy of Agricultural Sciences, Tieling, China
⁴Mudanjiang Research and Development Center for Soybean/Mudanjiang Experiment Station of the National Center for Soybean Improvement, Mudanjiang Branch of Heilongjiang Academy of Agricultural Sciences, Mudanjiang, China
⁵Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, China

Northeast China is a major soybean production region in China. A representative sample of the Northeast China soybean germplasm population (NECSGP), composed of 361 accessions, was evaluated for seed protein content (SPC) in Tieling, Northeast China. The SPC varied greatly, ranging from 36.60 to 46.07% with a mean of 40.77%, but it was lower than that of the Chinese soybean landrace population (mean 43.10%, ranging from 37.51 to 50.46%). The mean SPC increased slightly from 40.32–40.97% in the old maturity groups (MGs; MG III + II + I) to 40.93–41.58% in the new MGs (MG 0 + 00 + 000). The restricted two-stage multi-locus genome-wide association study (RTM-GWAS) with 15,501 SNP linkage-disequilibrium block (SNPLDB) markers identified 73 SPC quantitative trait loci (QTLs) with 273 alleles, explaining 71.70% of the phenotypic variation; 28 of these QTLs were new. The evolutionary changes of QTL-allele structures from old MGs to new MGs were analyzed: 97.79% of the alleles in the new MGs were inherited from the old MGs and 2.21% were new. The emergence of a small number of new positive alleles and possible recombination between alleles might explain the slight SPC increase in the new MGs. The prediction of recombination potential for SPC in all possible crosses indicated that the mean predicted SPC over all crosses was 43.29% (+2.52 percentage points over the population mean) and the maximum was 50.00% (+9.23 percentage points), with a maximum transgressive potential of 3.93 percentage points, suggesting that SPC breeding potential does exist in the NECSGP. A total of 120 candidate genes were annotated and functionally classified into 13 categories, indicating that SPC is a complex trait conferred by a gene network.

Introduction

Soybean [Glycine max (L.) Merr.], which originated in ancient central China, is a traditional crop rich in seed protein (seed protein content, SPC, ~40%) and oil (~20%) (Zhang et al., 2015a). It was disseminated to the Liao River valleys in Northeast China (NEC) more than 2,000 years ago and has expanded to the whole of NEC in recent centuries. NEC is currently the major production area for soybean and a major source of soybean commodities for soy food processing, including tofu products and protein isolates for human food and animal feed in China (Warrington et al., 2015). However, the SPC of commercial soybeans in NEC is about 39 to 42%, less than in central and southern China (about 40 to 45%). Food processing companies demand increased SPC in commercial soybean production, especially in NEC. To improve soybean SPC, the first step is to investigate the phenotypic and genetic variation of the soybean germplasm to estimate whether there is genetic potential available to be utilized. Liu et al. (2020) found that the NEC soybean germplasm population (NECSGP) was derived from the original population from central China, with several newly derived and introduced accessions added during the past century. The NECSGP clustered genetically with accessions from North and South America and was the major germplasm source of the soybeans in the Americas, where ~85% of the world's soybeans are currently produced (Fu et al., 2020a). Thus, exploring the genetic basis of the SPC in NEC soybean germplasm is of great significance not only for NEC soybean production but also for global soybean production.

SPC is a quantitative trait controlled by many genes and is also affected by the environment (Hwang et al., 2014). There were 248 SPC QTLs (quantitative trait loci) reported at SoyBase (https://soybase.org). These SPC QTLs were detected by using linkage mapping procedures (Zhang et al., 2015a) and are mainly located on chromosomes 4, 5, 7, 8, 14, 15, 18, 19, and 20. Karikari et al. (2019) identified 25 SPC QTLs in a linkage mapping study under a single-locus model using a recombinant inbred line (RIL) population derived from Linhefenqingdou and Meng8206, in which qPro-7-1 was detected simultaneously in three environments, with an average phenotypic variance (PV) of 19.01%. Among these QTLs, 10 QTLs were newly detected and the PV of 12 QTLs were all greater than 10%, with the lowest PV of 8.97%. Teng et al. (2017) identified 8 SPC QTLs in 12 environments using the RIL population derived from Dongnong46×L-100, in which qPR-2, qPR-3, qPR-5, qPR-7, and qPR-8 were detected simultaneously in 6, 8, 7, 6, 7 environments, respectively. The candidate gene Glyma.20g085100 underlying the major SPC QTL on chromosome 20 was mapped and cloned (Fliege et al., 2022). The haplotype variation at this major QTL in wild and domesticated soybean was also explored using a germplasm population consisting of 985 accessions (Marsh et al., 2022).

QTL detection based on linkage mapping usually involves only two parental lines, such as the RIL population, where the genetic variation and mapping resolution are quite limited. Association mapping based on natural germplasm populations provides a powerful method for genome-wide QTL detection. By using association mapping in a large germplasm population consisting of 12,116 cultivated soybean accessions, Bandillo et al. (2015) detected 19 SNPs associated with SPC mainly on chromosome 15 (3.82 – 3.96 Mb) and chromosome 20 (29.59 – 31.97 Mb). Sonah et al. (2015) reported that eight regions were significantly associated with SPC based on 139 soybean accessions. The region on chromosome 8 between 45.5 and 46.9 Mb had the largest number of significantly associated SNPs, while there was only one associated SNP on chromosome 19 (50.4 Mb) and chromosome 20 (10.0 Mb). Zhang et al. (2017) reported that 15 loci were associated with SPC, with their phenotypic contribution ranging from 17.4 to 29.2%, and the candidate gene Glyma.13g123500 was highly expressed during seed development.

However, the previous association mapping studies were mainly based on single-locus model analysis, in which each genome-wide marker was tested independently for its association with a quantitative trait. The Bonferroni-adjusted threshold was applied to correct for multiple testing (Sul et al., 2018; Tam et al., 2019). This stringent threshold in the single-locus model largely reduces false positives but leads to many false negatives (Benjamini and Yekutieli, 2001). Furthermore, bi-allelic SNP markers are usually used in association mapping. Therefore, the multiple alleles of a QTL that widely exist in germplasm populations cannot be detected directly (Nachman, 2001; Yang et al., 2012). He et al. (2017) proposed the restricted two-stage multi-locus model genome-wide association analysis (RTM-GWAS) method to thoroughly detect QTLs and their multiple alleles. This procedure has the following merits: (i) It uses SNP linkage disequilibrium blocks (SNPLDBs) with multiple haplotypes as markers to match the multiple-allele characteristic of natural populations. (ii) It uses a two-stage GWAS for efficient association analysis, that is, a first-stage GWAS under the single-locus model for preselecting markers and a second-stage multi-locus stepwise regression for identifying QTLs and alleles, with the trait heritability (h²) as the upper limit of the total QTL contribution to reduce both false positives and false negatives. (iii) It uses the normal p-value without excessive Bonferroni correction, because all the detected QTLs are tested jointly under the multi-locus model. (iv) It uses plot-based phenotype data, so that the error is minimized through experimental design and the precision of QTL identification is raised (He and Gai, 2020; Liu et al., 2021). Therefore, RTM-GWAS can provide high QTL detection power and efficiency. Based on the results, the QTL-allele matrix is further established as a compact form of the genetic structure of the population and of its individual accessions.
This procedure has been demonstrated for its effectiveness in a series of soybean germplasm studies and even bi-parental population studies, such as on 100-seed weight (Zhang et al., 2015b), seed isoflavone content (Meng et al., 2016), days to flowering (Liu et al., 2021), and main stem node number (Fahim et al., 2021). Using RTM-GWAS, 26 SPC QTLs were detected based on 279 soybean accessions from China's Yangtze and Huaihe River Valley (Li et al., 2019). These QTLs accounted for 58.3% of the phenotypic variation, with qProt-20-3 having the highest PV (16%). Li et al. (2020) detected 90 SPC QTLs using RTM-GWAS in a soybean nested association mapping population. Twenty QTLs were newly detected, and Glyma20g24830 and Glyma18g03540 were annotated as important candidate genes for SPC.

The germplasm collection of an ecoregion is historically accumulated and may vary from time to time due to additions and losses. The germplasm accessions used for genetic studies should represent the ecoregion population so that the conclusions drawn can explain the real population rather than some unknown population. In the present study, we recollected soybean accessions from all the research institutions in NEC and then chose those from all subregions and historical reserves to form a representative soybean germplasm sample in NEC. In addition, NEC covers a wide range of latitudes. For evaluation of SPC under the same environment, the experiment site should be at a place where all kinds of the maturity group soybeans can mature naturally. Based on the above considerations, this study aimed at (i) exploring the SPC variation in the NECSGP, (ii) exploring the SPC QTL-allele system in the NECSGP, (iii) characterizing the genetic mechanism in the evolutionary process from late to early maturity groups (MGs) in NEC, (iv) exploring the QTL-allele recombination potential for optimal cross design in NEC, and (v) inferring the SPC candidate gene system.

Materials and Methods

Plant Materials and Field Experiments

A total of 361 representative soybean accessions were collected and chosen from the NECSGP. The accessions covered six MGs: MG III, MG II, MG I, MG 0, MG 00, and MG 000 (Fu et al., 2020a). In 2013–2014, these accessions were tested at Tieling, Northeast China. The “Blocks in Replication” design was used, with 4 hills in a row-plot, 1.0 m in length, and 1.0 m row space. According to their MGs, the accessions were grouped into six blocks, and four replications were implemented each year. At the maturity (R8) stage, the plants in each plot were threshed and dried after harvest, and the SPC was then measured using the FOSS Infratec 1241 near-infrared grain analyzer.

Statistical Analysis

The experimental data were analyzed using a joint randomized block design analysis as an approximation for simplicity. The analysis of variance was performed using the PROC GLM procedure of the SAS/STAT software (SAS Institute Inc., Cary, NC, USA). The linear model was

$$ y_{ijk} = \mu + t_{i} + r_{j(i)} + g_{k} + (gt)_{ik} + \varepsilon_{ijk}, $$

where y_ijk is the phenotype value of the k-th accession for the j-th replication in the i-th environment, μ is the population mean, t_i is the effect of the i-th environment, r_j(i) is the effect of the j-th replication in the i-th environment, g_k is the effect of the k-th accession, (gt)_ik is the interaction effect between accession and environment, and ε_ijk is the random error following N(0, σ²). Except that the effect of accession was considered fixed, all other effects were considered random. The trait heritability for the single environment and multiple environments was estimated, respectively, as

$$ h^{2} = \frac{\sigma_{g}^{2}}{\sigma_{g}^{2} + \sigma^{2} / n_{r}} \quad \text{and} \quad h^{2} = \frac{\sigma_{g}^{2}}{\sigma_{g}^{2} + \sigma_{gt}^{2} / n_{t} + \sigma^{2} / (n_{t} \times n_{r})}, $$

where $σ_{g}^{2}$ is the genotype variance, $σ_{g t}^{2}$ is the genotype and year interaction variance, σ² is the error variance, n_t is the number of years, and n_r is the number of replications. The variance components were estimated using the PROC MIXED procedure of the SAS/STAT software (SAS Institute Inc., Cary, NC, USA). The genetic coefficient of variation (GCV) was calculated as GCV = σ_g/μ.
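The heritability and GCV formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the variance components in the example are made-up values, not estimates from this study (the actual components were estimated with PROC MIXED).

```python
import math


def heritability_single_env(var_g: float, var_e: float, n_r: int) -> float:
    """h2 = var_g / (var_g + var_e / n_r) for a single environment."""
    return var_g / (var_g + var_e / n_r)


def heritability_multi_env(var_g: float, var_gt: float, var_e: float,
                           n_t: int, n_r: int) -> float:
    """h2 = var_g / (var_g + var_gt / n_t + var_e / (n_t * n_r))."""
    return var_g / (var_g + var_gt / n_t + var_e / (n_t * n_r))


def genetic_cv(var_g: float, mean: float) -> float:
    """GCV = sigma_g / mu, where sigma_g is the square root of var_g."""
    return math.sqrt(var_g) / mean


# Hypothetical variance components, chosen only to exercise the formulas.
h2_single = heritability_single_env(var_g=2.0, var_e=4.0, n_r=4)
h2_multi = heritability_multi_env(var_g=2.0, var_gt=0.5, var_e=4.0, n_t=2, n_r=4)
gcv = genetic_cv(var_g=1.96, mean=40.0)
```

With two years and four replications, the error variance is divided by eight in the multi-environment formula, which is why the same genotypic variance can yield a higher heritability on an entry-mean basis than in a single year.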

SNP Genotyping, SNPLDB Assembly, and RTM-GWAS Analysis

The genotype data of the 361 accessions were obtained from Fu et al. (2020a), and the accessions were sequenced with restriction site-associated DNA sequencing technology (RAD-seq) (Miller et al., 2007) at BGI Tech, Shenzhen, China. All sequence reads were aligned against the reference genome Wm82.a1.v1.1 (Schmutz et al., 2010) using the SOAP2 software (Li et al., 2009). RealSFS (Yi et al., 2010) was used for SNP calling. The SNPs with a missing rate > 20%, a heterozygosity rate > 20%, or a minor allele frequency (MAF) < 0.01 were filtered out. The missing genotypes were then imputed using the fastPHASE software (Scheet and Stephens, 2006). Finally, 82,966 high-quality SNPs were obtained. The SNPs were then grouped into SNPLDB markers based on genomic block partition using the RTM-GWAS software, with haplotypes as their alleles and an LD threshold of D' > 0.7 (He et al., 2017). A total of 15,501 SNPLDBs were identified in the NECSGP.
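The three SNP filters can be illustrated with a minimal Python sketch. The genotype coding (0, 1, 2 copies of the alternate allele, `None` for missing), the function name, and the choice to compute the heterozygosity rate and MAF over called genotypes only are assumptions made for illustration; the source does not describe how the filtering was implemented.

```python
from typing import Optional, Sequence


def snp_passes_filters(genotypes: Sequence[Optional[int]],
                       max_missing: float = 0.20,
                       max_het: float = 0.20,
                       min_maf: float = 0.01) -> bool:
    """Return True if one SNP survives the three filters.

    genotypes: one entry per accession, coded 0/1/2 (alternate-allele
    count) or None when missing. This coding is an assumption.
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False
    missing_rate = (n_total - len(called)) / n_total
    # Assumption: heterozygosity and MAF use called genotypes only.
    het_rate = sum(1 for g in called if g == 1) / len(called)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return missing_rate <= max_missing and het_rate <= max_het and maf >= min_maf


# Toy SNPs across ten accessions (made-up data).
keep = snp_passes_filters([0] * 8 + [2] * 2)                     # MAF = 0.2
monomorphic = snp_passes_filters([0] * 10)                       # MAF = 0
too_missing = snp_passes_filters([0, 0, None, None, None, 2, 2, 0, 0, 0])
too_het = snp_passes_filters([1, 1, 1, 0, 0, 0, 0, 2, 2, 2])
```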

The RTM-GWAS procedure was used to dissect the genetic constitution underlying the SPC variation in the NECSGP, in which the genetic similarity coefficients (GSC) between accessions were calculated based on genome-wide SNPLDBs. The top 10 eigenvectors of the GSC matrix were used as the covariates to correct the population structure bias. A threshold of 0.05 was used at the first stage of RTM-GWAS for candidate marker preselection, and the significance level was set to 0.01 for stepwise regression at the second stage of RTM-GWAS. The detected QTLs (associated SNPLDBs) with their allele effects for each accession were used to establish an SPC QTL-allele matrix of the NECSGP for further analysis (He et al., 2017). Compared to the QTLs reported in SoyBase (https://soybase.org), a QTL was considered overlapped if its physical position was located in the same region as that in the SoyBase.

Transgressive Potential Prediction and Optimal Cross Design in the NECSGP

Based on the SPC QTL-allele matrix, all possible 64,980 single crosses (361 × 360/2) were generated in silico (He et al., 2017). Both linkage and independent models were used to analyze the recombination potential of SPC in the NECSGP. In the linkage model, the number of crossovers on each chromosome was simulated randomly according to the Poisson distribution with chromosome length as a parameter, while in the independent model, all genetic loci were considered independent of each other. The predicted genotypic SPC value was calculated for each cross based on 2,000 homozygous progenies derived from F₂ individuals through continuous selfing. The 95th percentile value was used as the predicted value for the recombination potential of each cross. The cross program (https://gitee.com/njau-sri/cross) was used for simulation. Based on the recombination potential analysis of individual crosses, the transgressive potential was predicted for crosses within an MG and crosses between MGs. The highest SPC of accessions observed in the MG(s) was used as a check to indicate the transgressive potential of a cross.
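The independent model can be sketched as follows. This is not the `cross` program cited above: it is a simplified illustration that assumes each homozygous progeny line fixes, at every locus independently, the allele of one parent or the other with probability 0.5, and that the predicted value is a base value plus the sum of allele effects. The allele effects are made-up numbers, using the population mean of 40.77% as the base value is an assumption, and the linkage model (Poisson-distributed crossovers) is not covered.

```python
import math
import random


def n_single_crosses(n_parents: int) -> int:
    """Number of distinct single crosses among n parents."""
    return n_parents * (n_parents - 1) // 2


def simulate_inbred_progeny(effects_a, effects_b, base_value,
                            n_progeny=2000, seed=0):
    """Simulate homozygous progeny values under the independent model."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_progeny):
        value = base_value
        for effect_a, effect_b in zip(effects_a, effects_b):
            # Each locus is fixed for one parental allele, chosen at random.
            value += effect_a if rng.random() < 0.5 else effect_b
        values.append(value)
    return values


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical allele effects of two parents at four loci.
parent_a = [0.8, -0.5, 0.3, -0.2]
parent_b = [-0.4, 0.6, 0.3, 0.5]
progeny = simulate_inbred_progeny(parent_a, parent_b, base_value=40.77)
potential_95 = percentile_nearest_rank(progeny, 95)
total_crosses = n_single_crosses(361)
```

Taking a high percentile rather than the progeny maximum makes the predicted potential less sensitive to the single best simulated line.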

Candidate Gene Prediction

The steps of candidate gene prediction were as follows: (1) the genomic interval of a detected QTL (SNPLDB) was extended by 200 kb at both ends according to the LD decay distance in cultivated soybean populations; (2) within the genomic interval, the genes of the reference genome Wm82.a1.v1.1 were retrieved from SoyBase (https://soybase.org); and (3) the independence between a QTL and gene(s) within the QTL interval was tested using Chi-square criterion at a significance level of 0.05. The Gene Ontology (GO) annotations of genes were retrieved from SoyBase.
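Steps (1) and (2) amount to widening an interval and collecting the genes that fall in it, which can be sketched in Python as below. The coordinates, chromosome labels, and gene names are placeholders, and treating any overlap with the extended interval as "within the genomic interval" is an assumption; step (3), the chi-square test, is not covered here.

```python
from typing import Dict, List, NamedTuple

FLANK_BP = 200_000  # 200 kb extension at both ends, as stated in step (1)


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def extend_interval(qtl: Interval, flank: int = FLANK_BP) -> Interval:
    """Extend a QTL interval by `flank` bp on each side (not below position 1)."""
    return Interval(qtl.chrom, max(1, qtl.start - flank), qtl.end + flank)


def genes_in_interval(region: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Names of genes on the same chromosome that overlap the region."""
    return sorted(
        name for name, gene in genes.items()
        if gene.chrom == region.chrom
        and gene.start <= region.end
        and gene.end >= region.start
    )


# Placeholder QTL and gene models (not real annotations).
qtl = Interval("Gm03", 1_000_000, 1_050_000)
gene_models = {
    "gene_A": Interval("Gm03", 790_000, 805_000),      # overlaps left flank
    "gene_B": Interval("Gm03", 1_300_000, 1_310_000),  # beyond right flank
    "gene_C": Interval("Gm04", 1_000_000, 1_010_000),  # other chromosome
    "gene_D": Interval("Gm03", 1_200_000, 1_220_000),  # inside right flank
}
region = extend_interval(qtl)
candidates = genes_in_interval(region, gene_models)
```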

Results

Features of SPC Variation in the NECSGP

The joint analysis of variance (ANOVA) over two environments indicated significant SPC variation among the genotypes (accessions) and the genotype-by-environment interactions (Supplementary Table 1). The SPC of the NECSGP in Tieling ranged from 36.60 to 46.07%, with an average SPC of 40.77%. The heritability of SPC over two environments was estimated as 83.05%, with the GCV of 3.43% and the genotype-by-environment interaction (GEI) heritability of 11.56%, indicating the phenotypic SPC variation in the NECSGP was mainly caused by genotypic variation and affected slightly by GEI (Table 1). The SPC in NECSGP varied greatly but was not as wide as that in the Chinese soybean landrace population, where the SPC variation range was 37.51 to 50.46%, with an average of 43.10% (Zhang et al., 2018).

TABLE 1

Table 1. Frequency distribution and descriptive statistics for SPC in the NECSGP.

The results also showed that the difference in SPC among MGs was significant but relatively small. The average SPC ranged from 40.32 to 41.58% among different MGs (Table 1). There was a slight increase in average SPC from late MGs (III + II + I) to early MGs (000 + 00 + 0), that is, from longer to shorter growth periods. MG II and I exhibited the lowest SPC, while MG 000 exhibited the highest SPC. This trend implied that the SPC of the NECSGP might be retained at a similar level or higher when breeding earlier-maturing soybean varieties further northward. In this case, determining whether there is further SPC improvement potential depends on exploring the genetic recombination potential based on a relatively thorough exploration of the QTL-allele/gene-allele constitution of the NECSGP.

Identification of the SPC QTL-Allele System in the NECSGP

The RTM-GWAS with QTL-by-environment interaction (QEI) model was used to identify the SPC QTL-allele constitution since GEI was significant in ANOVA. A total of 15,501 SNPLDBs were constructed based on 82,966 SNPs. There were 8,780 SNPLDBs containing only a single SNP (S.SNPLDB) and 6,721 SNPLDBs containing multiple SNPs (M.SNPLDB). The number of alleles for M.SNPLDB ranged from 2 to 10 with an average of 3.5, while 1,792 M.SNPLDBs had only two alleles. At the first stage of RTM-GWAS under the single-locus model, 9,078 SNPLDBs were preselected from a total of 15,501 SNPLDBs. At the second stage in stepwise regression under the multiple-locus model, out of the preselected SNPLDBs, a total of 73 with 273 haplotypes/alleles passed the model test and were detected to be associated with SPC (Figures 1A,B). Among the 73 QTLs, 36 QTLs had the main effect only, 12 QTLs had only QEI effect, and 25 QTLs had both the main and QEI effect (Table 2). The 73 QTLs accounted for 71.70% of the phenotypic variation (PV). The 61 main effect QTLs with 240 alleles explained 62.72% PV and the 37 QEI QTLs with 138 alleles explained 8.98% PV. As indicated in Figure 1C, the phenotypic contribution of the main effect of QTLs varied continuously. When 1% PV was used as an artificial threshold for QTL classification, 61 QTLs could be classified as 25 large contribution QTLs (LC, R² ≥ 1%) with 105 alleles and 36 small contribution QTLs (SC, R² < 1%) with 135 alleles (Table 2). In the same way, all the QEI QTLs were classified into 37 SCs with 138 alleles, and there were no LCs.

FIGURE 1

Figure 1. The SPC QTL-allele information of the Northeast China soybean germplasm population obtained from RTM-GWAS. (A) Manhattan plot; (B) Quantile - quantile plot; (C) The phenotypic contribution of the 61 main-effect QTLs, blue bars denote small-contribution QTL (R² < 1%), red bars represent large-contribution QTL (R² ≥ 1%); (D) The frequency distribution of allele number per locus for the 61 main-effect QTLs; (E) SPC allele effects of the 61 main-effect QTLs; (F) QTL-matrix of SPC in the NECSGP; (G) Predicted SPC of progenies in possible crosses; (H) Gene Ontology (GO) biological process annotations of the candidate genes for SPC QTLs in the NECSGP. “other” GO biological processes include snoRNA localization, localization, anatomical structure development, post-embryonic development, multicellular organism development, activation of protein kinase activity, Golgi organization, chloroplast organization, macromolecule methylation, and methylation.

TABLE 2

Table 2. QTLs/SNPLDBs associated with SPC in the NECSGP.

The main-effect QTLs were located on all chromosomes except Chr. 5, with seven on Chr. 3, six each on Chr. 4 and 17, and one each on Chr. 2, 11, and 13. The number of alleles for each main-effect QTL ranged from 2 to 10 (Figure 1D), with allele effects ranging from −1.89 to 1.88 (Figure 1E; Supplementary Table 2). Compared to the previously reported SPC QTLs, 45 out of the 73 detected QTLs overlapped with those reported in SoyBase (http://soybase.org), including the two QTL hotspots on Chr. 9 and 20. The remaining 28 QTLs were newly found in the present study (Supplementary Table 3). The 61 SPC main-effect QTLs and their allele effects for each of the 361 accessions were organized as a QTL-allele matrix (Figure 1F), a compact form of the genetic constitution of the NECSGP. The QTL-allele matrix can be further separated into submatrices corresponding to the six MGs (Figure 1F). The QEI QTL-allele data set can also be organized into a matrix if needed. However, the environmental factor in the present study varied randomly, and no fixed environmental parameter was available to provide useful information in breeding for SPC improvement. Therefore, the QEI information was not used in further analysis.

SPC QTL-Allele Changes in the Evolution From Late to Early MGs in NECSGP

The above results indicated that the SPC in NECSGP slightly increased with the development of earlier soybean MGs due to the further northward dissemination after its introduction into the Liao-River valleys. During this artificial evolutionary process, the QTLs-alleles also changed. Some original alleles were passed down, some new ones emerged, and some old ones were excluded. New recombinants were formed, as indicated in Figure 1F. To analyze the QTL-allele changes from MG III + II + I to earlier MGs, the dynamic QTL-allele data were listed in the upper part of Table 3 and the summary statistics in the lower part. All the detected main effect and QEI effect data of the 73 loci with their 273 alleles were included since all are involved in the evolutionary process.

TABLE 3

Table 3. The SPC QTL-allele changes among maturity groups.

In comparison to the old MGs (MG I~III), only one allele of q-Prot-6-1 (a4) emerged in all the three new MGs (MG 0~000) and only two alleles of q-Prot-4-1 (a6) and q-Prot-18-2 (a2) were excluded in all the three new MGs (Table 3 upper part). There were different patterns of allele changes during the artificial evolutionary process from the old MGs to each of the new MGs. From the old MGs to the new MG 0, five alleles (one negative and four positives) of five QTLs were excluded and six alleles (two negatives and four positives) of six QTLs emerged. The number of emerged alleles was much less than that of excluded alleles from the old MGs to the new MG 00 and 000, that is, 50 alleles (24 negative and 26 positive) of 35 QTLs were excluded and five alleles (two negatives and three positives) of five QTLs emerged from the old MGs to the new MG 00, and 88 alleles (47 negatives and 41 positives) of 46 QTLs were excluded and one allele (positive effect) emerged from the old MGs to MG 000. With the shortening of the growth period, the number of excluded alleles increased and the number of emerged new alleles decreased.

Because of the limited sample sizes, the allele counts of the individual new MGs may fluctuate. Thus, the comparison was made only between the pooled old MGs (III + II + I) and the pooled new MGs (0 + 00 + 000). There were 267 alleles (142 negative and 125 positive) in the old MGs, of which 265 (142 negative and 123 positive) were inherited by the new MGs. In other words, 97.79% (265/271) of the alleles in the new MGs were inherited from the old MGs, six alleles (2.21%; two negative and four positive) emerged, and two positive alleles (2/267 = 0.75% of the alleles in the old MGs) were excluded. Thus, most alleles of the SPC QTLs in the old MGs were retained in the new MGs, with only eight alleles changed. These changes in alleles caused an increase in SPC from 40.32–40.97% in the old MGs to 40.93–41.58% in the new MGs. The four emerged positive alleles were responsible for the SPC increase, as no negative alleles were excluded. Accordingly, the evolutionary driver of the slight increase in SPC of the new MGs compared to the old MGs might be the emergence of new alleles and possible recombination between inherited alleles rather than the exclusion of alleles. Thus, the following text will focus on the recombination or transgressive potential of the NECSGP.
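The bookkeeping behind these percentages is a set comparison, sketched below in Python. The allele identifiers are placeholder integers constructed only so that the set sizes match the counts reported above (267 alleles in the old MGs, 271 in the new MGs); the function name is hypothetical and the real allele labels are in Table 3.

```python
def compare_allele_sets(old_alleles: set, new_alleles: set) -> dict:
    """Count inherited, emerged, and excluded alleles between two groups."""
    inherited = old_alleles & new_alleles   # present in both groups
    emerged = new_alleles - old_alleles     # only in the new MGs
    excluded = old_alleles - new_alleles    # only in the old MGs
    return {
        "inherited": len(inherited),
        "emerged": len(emerged),
        "excluded": len(excluded),
        "pct_new_inherited": round(100 * len(inherited) / len(new_alleles), 2),
        "pct_new_emerged": round(100 * len(emerged) / len(new_alleles), 2),
        "pct_old_excluded": round(100 * len(excluded) / len(old_alleles), 2),
    }


# Placeholder IDs: 0..266 in the old MGs; 2..272 in the new MGs.
old_mgs = set(range(267))
new_mgs = set(range(2, 273))
summary = compare_allele_sets(old_mgs, new_mgs)
```

Note that the inherited and emerged percentages use the 271 alleles of the new MGs as the denominator, whereas the excluded percentage uses the 267 alleles of the old MGs.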

Prediction of Allele Recombination Potential for Optimal Cross Design in the NECSGP

The genotypes of 2,000 homozygous progenies were simulated for each of the 64,980 possible crosses among the 361 soybean accessions in the NECSGP, then the SPC of the progenies was predicted based on the SPC QTL-allele matrix in the population. In this study, as the linkage and independent model results were very similar, only the simulation results of the linkage model were used to explore the allele recombination potential. For each cross, the SPC percentile of the progeny population was used as an indicator of recombination potential between alleles. As shown in Figure 1G, transgressive recombination for SPC existed in the NECSGP. Using the 95th percentile, the predicted SPC of the 64,980 crosses ranged from 37.84 to 50.00%, with an average of 43.29%, and 1,803 crosses showed higher SPC than the maximum SPC (46.07%) in the NECSGP (Table 4). Transgressive recombination for SPC was observed both for crosses within and between MG(s). Using the 95th percentile, 534 crosses within MGs and 1,269 crosses between MGs showed higher SPC than the maximum SPC in the NECSGP. The average SPC of predicted crosses within and between maturity groups were similar, but the maximum SPC between MGs was higher than that within MGs (Table 4).

TABLE 4

Table 4. The predicted SPC of simple crosses within and between maturity groups.

For crosses within MGs, 171, 318, 22, and 23 crosses within MG I + II + III, 0, 00, and 000, respectively, showed higher SPC than the maximum SPC in the NECSGP. The predicted SPC for each group was similar, with the maximum SPC ranging from 48.01 to 48.91%. For crosses between MGs, the predicted SPC varied, with the maximum SPC ranging from 48.51 to 50.00%. The crosses between MG 0 and 000 exhibited the maximum recombination potential, and the SPC of crosses between new MGs was slightly higher than that between old MGs and new MGs (Table 4).

The above results indicated allele recombination potential for SPC improvement in terms of the 95th percentile at the NECSGP level. The average recombination potential for SPC improvement was estimated as 2.52% (=43.29–40.77), with the maximum recombination potential as 9.23% (=50.00–40.77) and the maximum transgressive potential as 3.93% (=50.00–46.07). At the individual MG level, the three comparisons varied similarly. For example, in MG 0, the mean recombination potential was estimated as 2.45% (=43.38–40.93), with the maximum recombination potential as 7.98% (=48.91–40.93) and the maximum transgressive potential as 4.20% (=48.91–44.71). Thus, there were superior recombination and transgressive potentials within and among the MGs in the NECSGP. The potential for SPC improvement exists in the population and remains to be explored according to the SPC QTL-allele constitution of the NECSGP.
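The three potentials are simple differences in SPC, which the following Python sketch reproduces from the values quoted above. The function and key names are hypothetical; the inputs are the population mean, the best accession, and the mean and maximum of the predicted 95th-percentile SPC over crosses.

```python
def breeding_potentials(pop_mean: float, pop_max: float,
                        cross_mean: float, cross_max: float) -> dict:
    """Differences between predicted cross values and the parental population."""
    return {
        # Mean predicted cross value minus the population mean.
        "mean_recombination": round(cross_mean - pop_mean, 2),
        # Best predicted cross value minus the population mean.
        "max_recombination": round(cross_max - pop_mean, 2),
        # Best predicted cross value minus the best existing accession.
        "max_transgressive": round(cross_max - pop_max, 2),
    }


# Values quoted in the text for the whole NECSGP and for MG 0.
necsgp = breeding_potentials(pop_mean=40.77, pop_max=46.07,
                             cross_mean=43.29, cross_max=50.00)
mg0 = breeding_potentials(pop_mean=40.93, pop_max=44.71,
                          cross_mean=43.38, cross_max=48.91)
```

Because SPC is itself expressed in percent, these differences are absolute differences in SPC (percentage points), not relative gains.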

The five best crosses were selected for each MG and the entire NECSGP (Table 5). The cross between L54 (MG 000) and L5 (MG 0) exhibited the highest 95th percentile of the predicted SPC (50.00%), a relative increase of 8.53% (=3.93/46.07) over the maximum SPC in the NECSGP. Although the recombination potential was relatively limited within MGs, it may also reach 50% under intensive selection, as indicated by the 99th percentile. For example, the 99th percentile of the predicted SPC of the cross L329 × L5 was 50.04%, and that of L54 × L5 was as high as 51.54%. Thus, according to the QTL-allele matrix, the optimal crosses can be designed readily.

TABLE 5

Table 5. Optimal crosses for high SPC in different maturity groups (%).

Annotation of Candidate Gene System of SPC in the NECSGP

Using the chi-square test, a total of 190 genes were significantly associated with 44 SPC QTLs in this study, and then 120 candidate genes on 34 SPC QTLs were annotated and functionally classified into 13 GO biological process categories, including transporter activity, translation, regulation of the biological process, metabolic process, transcription, phosphorylation, catabolic process, cellular process, response to stimulus, signaling, biosynthetic process, reproductive process, and others (Figure 1H). These candidate genes involved 34 SPC QTLs, explaining 41.35% of the PV (Supplementary Table 4). Among the candidate genes, four are involved directly in protein or amino acid synthesis and metabolism, according to the annotation information. The Glyma03g33360 gene on q-Prot-3-5 is involved in the histidine biosynthetic process. In the NECSGP, six SNPs related to this gene were found, of which three were located within the gene and three were located within 5 kb upstream or downstream of the gene. Significant differences in SPC were observed among the five haplotypes at this gene locus. The haplotype “AACTTC” had the highest frequency but the lowest mean SPC in the NECSGP (Supplementary Figure 1A). The Glyma15g10780 gene on q-Prot-15-2 is involved in the S-adenosylmethioninamine biosynthetic process, and the Glyma16g29760 gene on q-Prot-16-3 is involved in the peptidyl-pyroglutamic acid biosynthetic process. Each of these two loci contained only one SNP in the NECSGP, and no significant associations between the SNP and SPC were observed (Supplementary Figures 1B,C). The Glyma17g35490 gene on q-Prot-17-6 is involved in proteolysis, and its homologous gene in Arabidopsis thaliana, AT5G67360, belongs to the subtilase protein family and encodes a subtilisin-like serine protease essential for mucilage release from seed coats. The Glyma17g35490 gene locus had seven haplotypes in the NECSGP, and there were significant differences in SPC among haplotypes.
The haplotype “GACTA” had the highest mean SPC while “GCACA” had the highest frequency in NECSGP (Supplementary Figure 1D). The above candidate gene information was cited and inferred from the SoyBase (http://soybase.org), and the biological functions of the candidate genes are to be studied and confirmed further. This information implied that SPC is a complex trait conferred by a gene network involving a series of functional genes.
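The haplotype comparison above rests on two simple summaries per haplotype: its frequency in the population and the mean SPC of the accessions carrying it. The sketch below shows one way such a summary could be computed. The function name `summarize_haplotypes`, the labels `H1` and `H2`, and the SPC values are hypothetical and serve only as illustration; they are not data from this study, and no significance test is included.

```python
from collections import defaultdict


def summarize_haplotypes(records):
    """Return frequency and mean SPC per haplotype.

    records: iterable of (haplotype, spc) pairs, one pair per accession.
    """
    groups = defaultdict(list)
    for haplotype, spc in records:
        groups[haplotype].append(spc)
    total = sum(len(values) for values in groups.values())
    return {
        haplotype: {
            "freq": len(values) / total,          # share of accessions
            "mean_spc": sum(values) / len(values)  # mean SPC (%) of carriers
        }
        for haplotype, values in groups.items()
    }


# Hypothetical toy records (not study data): H1 is common but low in SPC.
toy_records = [("H1", 40.0), ("H1", 41.0), ("H2", 43.0), ("H1", 39.0)]
summary = summarize_haplotypes(toy_records)
most_frequent = max(summary, key=lambda h: summary[h]["freq"])
highest_mean = max(summary, key=lambda h: summary[h]["mean_spc"])
print(most_frequent, highest_mean)
```

In the toy input, the most frequent haplotype is not the one with the highest mean SPC, which mirrors the pattern reported for the two multi-haplotype loci.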

Discussion

Genetic Potential and Optimal Cross Design of SPC in the NECSGP

The SPC in the NECSGP varied greatly, but its range was not as wide as that in the Chinese soybean landrace population. A slight but significant increase was observed from the old MGs to the new MGs. Using RTM-GWAS, 61 main-effect SPC QTLs with 240 alleles were detected, explaining 62.72% of the phenotypic variation. Based on the SPC QTL-allele matrix, the predicted 95th percentile of SPC in the progenies of all possible crosses gave a mean recombination potential of 43.29%, which is 2.52 percentage points above the population mean of 40.77%. The maximum recombination potential was 50.00%, which is 9.23 percentage points above the population mean, and the maximum transgressive potential was 3.93 percentage points, the margin by which this 50.00% exceeds the best accession in the population. Thus, there was large genetic potential for improving SPC even though the phenotypic variation in the population was not large, and this potential was mainly due to allele recombination within the population. Because the linkage model and the independent model gave similar estimates of recombination potential, there was no need to break linkage drag to improve SPC in the NECSGP. This result might also apply to soybeans in the Americas, because the germplasm in the Americas was mainly introduced from the NECSGP. Of course, beyond utilizing the recombination potential within the NECSGP itself, there should be more potential for a breakthrough in SPC improvement if elite SPC germplasm is introduced into the NECSGP from external genetic resources.
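The potentials quoted above are differences between predicted 95th-percentile values and reference values. As a worked check, using the population mean of 40.77% and the best accession of 46.07% (the upper end of the observed SPC range), the arithmetic is as follows. The symbol names are introduced here for illustration only.

```latex
\begin{align*}
\Delta_{\text{mean recombination}} &= 43.29\% - 40.77\% = 2.52\% \\
\Delta_{\text{max recombination}}  &= 50.00\% - 40.77\% = 9.23\% \\
\Delta_{\text{max transgressive}}  &= 50.00\% - 46.07\% = 3.93\%
\end{align*}
```

All three differences are in percentage points of SPC, not relative percentages.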

Based on the above estimation of genetic recombination potential in the NECSGP, the optimal crosses were selected for breeding purposes. In other words, the present study has provided an optimal cross design procedure for SPC improvement that consists of three steps: establishing a QTL-allele matrix based on RTM-GWAS, simulating all possible crosses in silico to obtain the breeding value of their homozygous progenies at a chosen percentile (the 95th, for example), and choosing the best crosses according to the predicted breeding values. In this way, the best crosses, or the best parental combinations, are designed. Compared with traditional breeding, this optimal cross design procedure covers all possible crosses in the population on the basis of a whole-genome QTL-allele matrix and is effective and efficient in predicting the best crosses and progenies, thereby realizing the transformation from phenotypic selection to genomic selection and shortening the breeding cycle. In addition, among the possible crosses, 1,803 transgressive combinations were detected, of which the predicted best cross was L54 × L59, with an SPC of 50.00% in its 95th percentile progeny. Of the 73 SPC QTLs in these two parents, 42 carried the same alleles and 31 carried different alleles. The two parents had complementary alleles of large positive and negative effect. L54 had one favorable allele (1.84%) on q-Prot-13-1 and one allele with a large negative effect (-1.30%) on q-Prot-4-6, while L59 had five favorable alleles (0.79 to 1.88%) on q-Prot-1-4, q-Prot-4-6, q-Prot-9-2, q-Prot-17-5, and q-Prot-18-3, and three alleles with a large negative effect (-0.97 to -1.84%) on q-Prot-6-4, q-Prot-13-1, and q-Prot-18-1, respectively. This example explains why L54 × L59 was the best predicted cross and why the NECSGP has potential for SPC improvement through genetic recombination.
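The three steps of the procedure can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the software used in this study: it assumes the independent model only (each homozygous progeny receives the allele of either parent at each QTL with probability 0.5, ignoring linkage), takes the genotypic value as a base value plus the sum of allele effects, and uses a hypothetical toy QTL-allele effect matrix. The names `predict_cross` and `rank_crosses` are invented for this sketch.

```python
import itertools

import numpy as np


def predict_cross(effects_p1, effects_p2, base, n_progeny=2000, pct=95, rng=None):
    """Predicted percentile value of homozygous progenies of one cross.

    effects_p1, effects_p2: allele effects of the two parents at each QTL.
    base: value to which the allele effects are added (assumed additive).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Independent model (assumption): at every QTL a progeny carries the
    # allele of parent 1 or parent 2 with equal probability.
    from_p1 = rng.random((n_progeny, effects_p1.size)) < 0.5
    values = base + np.where(from_p1, effects_p1, effects_p2).sum(axis=1)
    return np.percentile(values, pct)


def rank_crosses(effect_matrix, base, names, **kwargs):
    """Simulate every pairwise cross and sort by predicted value, best first."""
    results = []
    for i, j in itertools.combinations(range(len(names)), 2):
        value = predict_cross(effect_matrix[i], effect_matrix[j], base, **kwargs)
        results.append((value, names[i], names[j]))
    return sorted(results, reverse=True)


# Hypothetical toy matrix: 3 accessions x 4 QTLs (allele effects in % SPC).
toy_effects = np.array([
    [1.0, -1.0, 0.5, 0.0],   # accession A
    [-1.0, 1.0, 0.0, 0.5],   # accession B, complementary to A
    [0.0, 0.0, 0.0, 0.0],    # accession C
])
ranked = rank_crosses(toy_effects, 40.0, ["A", "B", "C"],
                      rng=np.random.default_rng(0))
print(ranked[0][1], "x", ranked[0][2])
```

In the toy matrix, accessions A and B carry complementary positive and negative alleles, so their cross ranks first, which is the same logic given above for L54 × L59. A linkage model would require marker positions and recombination rates, which this sketch deliberately leaves out.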

The above optimal cross prediction procedure is, in fact, a genome-wide sequencing marker-assisted prediction. Our previous marker-assisted selection for transgressive SPC in recombinant inbred line (RIL) populations was very effective (Zhang et al., 2015a). Two transgressive segregants with SPC of 49.33% and 46.32% were selected from two RIL populations whose parental SPC values were 44.83, 44.83, 35.35, and 44.34%, respectively, and were then crossed for further improvement of SPC. The two transgressive segregants and their derived offspring were genotyped at three major SPC QTLs, and the recombinants carrying all three positive-effect alleles showed the highest SPC in F₂-derived families, especially in the F₂:₅:₆ generation, where a progeny with the highest SPC of 54.15% was obtained. This example demonstrated the effectiveness of the marker-assisted selection procedure in breeding for SPC. The same should therefore hold for the predicted optimal cross L54 × L59, all the more so because that prediction was based on whole-genome sequencing markers, whereas the example of Zhang et al. (2015a) was based on only some SSR markers.

In the present study, SPC was the primary focus, but modern breeders have been pursuing soybean cultivars with high yield, high SPC, and high oil content (Patil et al., 2017). Previous studies have found that soybean protein content is negatively correlated with oil content and yield (Chaudhary et al., 2015), so high protein content often leads to a decrease in oil content and yield. In breeding soybean cultivars with high protein content, high oil content, and high yield, balancing the relationship among the three traits has always been an urgent problem. We suggest establishing QTL-allele matrices for all three traits, from which the optimal crosses that combine the elite QTL-alleles of the three traits might be predicted. Optimal cross prediction for multiple traits should therefore be explored further.

The SPC QTL-Allele Structure and Evolutionary Mechanism in the NECSGP

In the NECSGP, 73 SPC QTLs/SNPLDBs with 273 alleles were detected, accounting for 71.70% of the PV, of which 61 main-effect QTLs with 240 alleles accounted for 62.72% of the PV. Compared with the QTLs reported in the literature and in SoyBase (https://soybase.org), 45 QTLs overlapped with reported QTLs and 28 QTLs were newly found, the latter explaining 23.85% of the PV. The SNPLDB markers also satisfied the requirement of accommodating multiple alleles in natural populations. The QTL with the largest contribution was q-Prot-4-6, which explained 2.32% of the PV. Compared with previous studies (Bandillo et al., 2015; Sonah et al., 2015; Zhang et al., 2017), QTLs with relatively small effects could also be detected for SPC in this study using the RTM-GWAS method; in other words, the SPC QTLs and their alleles could be explored relatively thoroughly. By taking the trait heritability as the upper limit of the PV explained, both the false positive and false negative problems can be controlled in the RTM-GWAS method. The detection power was further boosted by the two-stage analysis strategy and the multi-locus model. Thus, the relatively thorough detection of the SPC QTL-allele system in the NECSGP can facilitate the study of the genetic dynamics of SPC variation.

The SPC QTL-allele structure changed from the old MGs to the new MGs, with both emerged and excluded alleles, but the allele changes in SPC were not as many as those in days to flowering (Fu et al., 2020b; Liu et al., 2021), main stem node number (Fu et al., 2020a; Fahim et al., 2021), and other traits (Meng et al., 2016). SPC therefore appears to be a trait that undergoes relatively few allele changes, which may be one reason why it cannot be improved readily. Among the four evolutionary drivers (allele inheritance, emergence, exclusion, and recombination), the allele contributions of the first three were 97.79%, 2.21%, and 0.75%, respectively. Allele emergence and allele exclusion were thus relatively weak in SPC. The fourth factor, allele recombination, was relatively strong, as indicated by the prediction of recombination potential. Thus, for a breakthrough in improving SPC in the NECSGP, introducing superior alleles from other germplasm populations may be a potential strategy for SPC breeding in Northeast China.
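The first three drivers can be expressed as set operations on the QTL-alleles present in the old and new MGs. The sketch below is illustrative only. It assumes that inherited and emerged alleles are expressed as percentages of the alleles present in the new MGs, and excluded alleles as a percentage of the alleles present in the old MGs; these denominators, the function name `allele_changes`, and the toy allele sets are assumptions made for this example and not the study's data.

```python
def allele_changes(old_alleles, new_alleles):
    """Classify alleles between two subpopulations.

    old_alleles, new_alleles: sets of (qtl, allele) pairs observed in the
    old and new maturity groups, respectively.
    """
    inherited = new_alleles & old_alleles   # present in both
    emerged = new_alleles - old_alleles     # only in the new MGs
    excluded = old_alleles - new_alleles    # lost from the old MGs
    return {
        # Assumed denominators: new-MG alleles for inherited and emerged,
        # old-MG alleles for excluded.
        "inherited_pct": 100 * len(inherited) / len(new_alleles),
        "emerged_pct": 100 * len(emerged) / len(new_alleles),
        "excluded_pct": 100 * len(excluded) / len(old_alleles),
    }


# Hypothetical toy allele sets, each allele written as (QTL, allele label).
old_mg = {("q1", "a1"), ("q1", "a2"), ("q2", "a1"), ("q2", "a2")}
new_mg = {("q1", "a1"), ("q1", "a2"), ("q2", "a1"), ("q2", "a3")}
changes = allele_changes(old_mg, new_mg)
print(changes)
```

Under these assumed denominators the inherited and emerged percentages always sum to 100, which is consistent with the reported 97.79% and 2.21%.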

Furthermore, 34 of the 73 QTLs (SNPLDBs) had only two alleles, of which 31 were SNPLDBs containing only a single SNP (S.SNPLDB). Previous studies showed that, as the number of SNPs or the sequencing depth increases, the S.SNPLDBs are likely to be merged into LD blocks with multiple SNPs (He et al., 2017). Since the number of SNPs detected in the soybean genome in this study was relatively small, the exploration of the SPC QTL-allele system in the NECSGP may be further improved with increased sequencing depth.

Conclusion

The SPC in the NECSGP varied greatly but was not as high as in the Chinese soybean landrace population. There was a slight SPC increase from the old MGs (III + II + I) to the new MGs (0 + 00 + 000). A total of 71.70% of the SPC variation in the NECSGP was explained by 73 SPC QTLs with 273 alleles, including 28 newly identified QTLs. The evolutionary changes of the QTL-allele structure from the old MGs to the new MGs showed that most alleles in the new MGs were inherited from the old MGs, and only a small number of alleles emerged or were excluded. The small amount of new positive allele emergence and possible recombination between alleles might explain the slight SPC increase in the new MGs. The prediction of the 95th percentile progenies of all possible crosses showed both recombination and transgressive potential, indicating that SPC breeding potential exists in the NECSGP. Candidate gene analysis indicated that SPC is a complex trait conferred by a gene network involving a series of functional genes.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://github.com/njau-sri/NECSGP-SPC, NECSGP-SPC.

Author Contributions

JG designed the experiments. LF, MF, YW, HR, and WD performed the field experiments. WF, ZS, and JH analyzed the data and interpreted the results. XH and JZ participated in data analysis. LS, LW, WW, and GX participated in field experiments. WF, JH, ZS, and JG drafted the manuscript. All authors approved the manuscript.

Funding

This work was supported by the National Key Research Development Program of China (2021YFF1001204), the Program of Jiangsu Province (JBGS-2021-014), the MOE 111 Project (B08025), the MOE Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT_17R55), the Fundamental Research Funds for the Central Universities (KYZZ201901), the MARA CARS-04 Program, the Primary Research and Development Plan of Jiangsu Province (BE2021358), the Jiangsu JCIC-MCP, the Guidance Foundation of Sanya Institute of Nanjing Agricultural University (NAUSY-ZZ02 and NAUSY-MS05), and the Bioinformatics Center of Nanjing Agricultural University.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2022.896549/full#supplementary-material

References

Bandillo, N., Jarquin, D., Song, Q., Nelson, R., Cregan, P., Specht, J., et al. (2015). A population structure and genome-wide association analysis on the USDA soybean germplasm collection. Plant Genome. 8, 3. doi: 10.3835/plantgenome2015.04.0024

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29, 1165–1188. doi: 10.1214/aos/1013699998

Chaudhary, J., Patil, G. B., Sonah, H., Deshmukh, R. K., Vuong, T. D., Valliyodan, B., et al. (2015). Expanding omics resources for improvement of soybean seed composition traits. Front. Plant Sci. 6, 1021. doi: 10.3389/fpls.2015.01021

Fahim, A. M., Liu, F., He, J., Wang, W., Xing, G., and Gai, J. (2021). Evolutionary QTL-allele changes in main stem node number among geographic and seasonal subpopulations of Chinese cultivated soybeans. Mol. Genet. Genom. 296, 313–330. doi: 10.1007/s00438-020-01748-9

Fliege, C. E., Ward, R. A., Vogel, P., Nguyen, H., Quach, T., Guo, M., et al. (2022). Fine mapping and cloning of the major seed protein quantitative trait loci on soybean chromosome 20. Plant J. 110, 114–128. doi: 10.1111/tpj.15658

Fu, M., Wang, Y., Ren, H., Du, W., Wang, D., Bao, R., et al. (2020b). Genetic dynamics of earlier maturity group emergence in south-to-north extension of Northeast China soybeans. Theor. Appl. Genet. 133, 1839–1857. doi: 10.1007/s00122-020-03558-4

Fu, M., Wang, Y., Ren, H., Du, W., Yang, X., Wang, D., et al. (2020a). Exploring the QTL-allele constitution of main stem node number and its differentiation among maturity groups in a Northeast China soybean population. Crop Sci. 60, 1223–1238. doi: 10.1002/csc2.20024

He, J., and Gai, J. (2020). QTL-allele matrix detected from RTM-GWAS is a powerful tool for studies in genetics, evolution, and breeding by design of crops. J. Integr. Agric. 19, 1407–1410. doi: 10.1016/S2095-3119(20)63199-9

He, J., Meng, S., Zhao, T., Xing, G., Yang, S., Li, Y., et al. (2017). An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor. Appl. Genet. 130, 2327–2343. doi: 10.1007/s00122-017-2962-9

Hwang, E. Y., Song, Q., Jia, G., Specht, J. E., Hyten, D. L., Costa, J., et al. (2014). A genome-wide association study of seed protein and oil content in soybean. BMC Genom. 15, 1. doi: 10.1186/1471-2164-15-1

Karikari, B., Li, S., Bhat, J. A., Cao, Y., Kong, J., Yang, J., et al. (2019). Genome-wide detection of major and epistatic effect QTLs for seed protein and oil content in soybean under multiple environments using high-density bin map. Int. J. Mol. Sci. 20, 979. doi: 10.3390/ijms20040979

Li, R., Yu, C., Li, Y., Lam, T. W., Yiu, S. M., Kristiansen, K., et al. (2009). SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 25, 1966–1967. doi: 10.1093/bioinformatics/btp336

Li, S., Cao, Y., He, J., Wang, W., Xing, G., Yang, J., et al. (2020). Genetic dissection of protein content in a nested association mapping population of soybean. Sci. Agric. Sin. 53, 1743–1755. (in Chinese). doi: 10.3864/j.issn.0578-1752.2020.09.005

Li, S., Xu, H., Yang, J., and Zhao, T. (2019). Dissecting the genetic architecture of seed protein and oil content in soybean from the Yangtze and Huaihe River Valleys using multi-locus genome-wide association studies. Int. J. Mol. Sci. 20, 3041. doi: 10.3390/ijms20123041

Liu, X., He, J., Wang, Y., Xing, G., Li, Y., Yang, S., et al. (2020). Geographic differentiation and phylogeographic relationships among world soybean populations. Crop J. 8, 260–272. doi: 10.1016/j.cj.2019.09.010

Liu, X., Li, C., Cao, J., Zhang, X., Wang, C., He, J., et al. (2021). Growth period QTL-allele constitution of global soybeans and its differential evolution changes in geographic adaptation versus maturity group extension. Plant J. 108, 1624–1643. doi: 10.1111/tpj.15531

Marsh, J. I., Hu, H., Petereit, J., Bayer, P. E., Valliyodan, B., Batley, J., et al. (2022). Haplotype mapping uncovers unexplored variation in wild and domesticated soybean at the major protein locus cqProt-003. Theor. Appl. Genet. 135, 1443–1455. doi: 10.1007/s00122-022-04045-8

Meng, S., He, J., Zhao, T., Xing, G., Li, Y., Yang, S., et al. (2016). Detecting the QTL-allele system of seed isoflavone content in Chinese soybean landrace population for optimal cross design and gene system exploration. Theor. Appl. Genet. 129, 1557–1576. doi: 10.1007/s00122-016-2724-0

Miller, M. R., Dunham, J. P., Amores, A., Cresko, W. A., and Johnson, E. A. (2007). Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 17, 240–248. doi: 10.1101/gr.5681207

Nachman, M. W. (2001). Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17, 481–485. doi: 10.1016/S0168-9525(01)02409-X

Patil, G., Mian, R., Vuong, T., Pantalone, V., Song, Q., Chen, P., et al. (2017). Molecular mapping and genomics of soybean seed protein: a review and perspective for the future. Theor. Appl. Genet. 130, 1975–1991. doi: 10.1007/s00122-017-2955-8

Scheet, P., and Stephens, M. (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644. doi: 10.1086/502802

Schmutz, J., Cannon, S. B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature. 463, 178–183. doi: 10.1038/nature08670

Sonah, H., O'Donoughue, L., Cober, E., Rajcan, I., and Belzile, F. (2015). Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol. J. 13, 211–221. doi: 10.1111/pbi.12249

Sul, J. H., Martin, L. S., and Eskin, E. (2018). Population structure in genetic studies: confounding factors and mixed models. PLoS Genet. 14, e1007309. doi: 10.1371/journal.pgen.1007309

Tam, V., Patel, N., Turcotte, M., Bosse, Y., Pare, G., and Meyre, D. (2019). Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484. doi: 10.1038/s41576-019-0127-1

Teng, W., Li, W., Zhang, Q., Wu, D., Zhao, X., Li, H., et al. (2017). Identification of quantitative trait loci underlying seed protein content of soybean including main, epistatic, and QTL × environment effects in different regions of Northeast China. Genome. 60, 649–655. doi: 10.1139/gen-2016-0189

Warrington, C. V., Abdel-Haleem, H., Hyten, D. L., Cregan, P. B., Orf, J. H., Killam, A. S., et al. (2015). QTL for seed protein and amino acids in the Benning × Danbaekkong soybean population. Theor. Appl. Genet. 128, 839–850. doi: 10.1007/s00122-015-2474-4

Yang, J., Ferreira, T., Morris, A. P., Medland, S. E., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication Meta-analysis (DIAGRAM) Consortium, et al. (2012). Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375. doi: 10.1038/ng.2213

Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z. X., Pool, J. E., et al. (2010). Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 329, 75–78. doi: 10.1126/science.1190371

Zhang, D., Lü, H., Chu, S., Zhang, H., Zhang, H., Yang, Y., et al. (2017). The genetic architecture of water-soluble protein content and its genetic relationship to total protein content in soybean. Sci. Rep. 7, 5053. doi: 10.1038/s41598-017-04685-7

Zhang, Y., He, J., Shan, M., Liu, M., Xing, G., Li, Y., et al. (2018). Identifying QTL–allele system of seed protein content in Chinese soybean landraces for population differentiation studies and optimal cross predictions. Euphytica. 214, 157. doi: 10.1007/s10681-018-2235-y

Zhang, Y., He, J., Wang, Y., Xing, G., Zhao, J., Li, Y., et al. (2015b). Establishment of a 100-seed weight quantitative trait locus-allele matrix of the germplasm population for optimal recombination design in soybean breeding programmes. J. Exp. Bot. 66, 6311–6325. doi: 10.1093/jxb/erv342

Zhang, Y. H., Liu, M. F., He, J. B., Wang, Y. F., Xing, G. N., Li, Y., et al. (2015a). Marker-assisted breeding for transgressive seed protein content in soybean [Glycine max (L.) Merr]. Theor. Appl. Genet. 128, 1061–1072. doi: 10.1007/s00122-015-2490-4

Keywords: Northeast China soybean germplasm population (NECSGP), seed protein content (SPC), restricted two stage multi-locus model GWAS (RTM-GWAS), QTL-allele matrix, optimal cross prediction, transgressive potential

Citation: Feng W, Fu L, Fu M, Sang Z, Wang Y, Wang L, Ren H, Du W, Hao X, Sun L, Zhang J, Wang W, Xing G, He J and Gai J (2022) Transgressive Potential Prediction and Optimal Cross Design of Seed Protein Content in the Northeast China Soybean Population Based on Full Exploration of the QTL-Allele System. Front. Plant Sci. 13:896549. doi: 10.3389/fpls.2022.896549

Received: 15 March 2022; Accepted: 09 June 2022;
Published: 12 July 2022.

Edited by:

Guo-Liang Jiang, Virginia State University, United States

Reviewed by:

Qijian Song, Agricultural Research Service, USDA, United States
Dawei Xin, Northeast Agricultural University, China

Copyright © 2022 Feng, Fu, Fu, Sang, Wang, Wang, Ren, Du, Hao, Sun, Zhang, Wang, Xing, He and Gai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jianbo He, hjbxyz@gmail.com; Junyi Gai, sri@njau.edu.cn

^†These authors have contributed equally to this work
