ORIGINAL RESEARCH article

Front. Plant Sci., 28 June 2023
Sec. Functional and Applied Plant Genomics
This article is part of the Research Topic Genetics, Genomics and Breeding of Plant Architecture, Biomass, Grain Quality and Grain Yield Traits in Rice and Wheat

Association mapping reveals novel genes and genomic regions controlling grain size architecture in mini core accessions of Indian National Genebank wheat germplasm collection

  • 1ICAR-National Bureau of Plant Genetic Resources, New Delhi, India
  • 2Jaypee University of Information Technology, Solan, India
  • 3ICAR-National Bureau of Plant Genetic Resources, Regional Station, Jodhpur, Jodhpur, India
  • 4Zonal Agricultural Research Station, Powarkheda, India

Wheat (Triticum aestivum L.) is a staple food crop for the global human population, and thus wheat breeders are consistently working to enhance its yield worldwide. In this study, we utilized a subset of the Indian wheat mini core germplasm to dissect the genetic architecture of seed shape-associated traits. The wheat mini core subset (125 accessions) was genotyped using a 35K SNP array and evaluated for grain shape traits, namely grain length (GL), grain width (GW), grain length-width ratio (GLWR), and thousand grain weight (TGW), across seven environments (E1, E2, E3, E4, E5, E6, and E7). Marker-trait associations were determined using the multi-locus random-SNP-effect Mixed Linear Model (mrMLM) program. A total of 160 non-redundant quantitative trait nucleotides (QTNs) were identified for the four grain shape traits using two or more GWAS models. Among these 160 QTNs, 27, 36, 38, and 35 were associated with GL, GW, GLWR, and TGW respectively, while 24 QTNs were associated with more than one trait. Of these 160 QTNs, 73 were detected in two or more environments and were considered reliable QTLs for the respective traits. A total of 135 associated QTNs were annotated and located within genes, including those encoding ABC transporter, cytochrome P450, thioredoxin M-type, and hypothetical proteins. Furthermore, the expression pattern of annotated QTNs demonstrated that only 122 were differentially expressed, suggesting these could potentially be related to seed development. The genomic regions and candidate genes for grain size traits identified in the present study represent valuable genomic resources that can potentially be utilized in marker-assisted breeding programs to develop high-yielding varieties.

Introduction

Bread wheat (Triticum aestivum L.) is an important staple food crop, serving as the main source of energy, protein, and fiber for much of the world’s human population (Ling et al., 2013; Ji et al., 2022). However, with the current rate of yearly increment in wheat yield, feeding the ever-increasing world population, which is expected to reach 9.3 billion by 2050, will be a daunting task. Further, depletion of natural resources such as land and water and a rise in the mean earth surface temperature exacerbate the problem and pose a challenge to producing sufficient wheat to feed the human population in the future (Nehe et al., 2019). To increase wheat yield, it is important to understand the genetic basis of traits that contribute to grain yield (GY). Grain yield and its contributing traits are complex in nature, highly influenced by environmental conditions, and regulated by multiple genes (Kato et al., 2000; Li et al., 2019; Ji et al., 2022). Quantitative trait loci (QTL) associated with GY have been extensively studied and reported on all 21 wheat chromosomes (Bennett et al., 2012; Sun et al., 2020). Several studies have identified numerous QTL for GY and productivity (Bennett et al., 2012; Zegeye et al., 2014; Malik et al., 2021; Ji et al., 2022). However, there is very limited information on the marker-assisted improvement of GY traits in wheat. This is primarily due to the lack of robust markers tightly linked to GY-associated traits. The conventional QTL mapping approach has been extensively used for gene mapping and has enabled genetic dissection of seed traits in wheat (Breseghello and Sorrells, 2007; Duan et al., 2020). However, this approach does not allow detection of all the possible allelic variants of the target gene that might exist in the natural populations of wheat. Another downside of the QTL mapping approach is its poor resolution. The availability of a gold-standard wheat reference genome sequence and high-density SNP arrays is expected to accelerate high-resolution mapping of complex traits using both conventional and association mapping approaches (IWGSC et al., 2018; Chaurasia et al., 2021).

In the past few years, the genome-wide association study (GWAS) has become a popular approach to identify the QTL associated with complex traits in crops. In contrast to QTL mapping, this approach enables the exploration of a large number of alleles for any locus from a natural population of diverse individuals. This approach facilitates high-resolution mapping of traits because the individuals used for the association analysis might have undergone several rounds of historical recombination (Yu and Buckler, 2006; Li et al., 2019). Several GWAS studies have been performed on major crops such as Oryza sativa (Spindel et al., 2016), Zea mays (Xu et al., 2017; Xu et al., 2018), Hordeum vulgare (Visioni et al., 2013), Avena sativa (Newell et al., 2011), Brassica napus (Zhou et al., 2017), Glycine max (Zhang et al., 2015), Sorghum bicolor (Morris et al., 2013), and in wheat for the genetic dissection of various desirable traits (Peng et al., 2018; Chaurasia et al., 2021; Malik et al., 2021). Over the past few years, efforts have been made to develop GWAS models that are better suited to investigating the genetics of simple as well as complex traits in plants. These GWAS models are broadly grouped into single-locus GWAS (SL-GWAS) and multi-locus GWAS (ML-GWAS) methods. SL-GWAS methods have been widely used to detect genetic variants for various traits, but one main limitation of these methods is that the p-values of the markers identified as associated with the target trait must be corrected for multiple testing to avoid false-positive associations. To overcome this limitation, Zhang et al. (2020) developed the mrMLM package, which contains the following six ML-GWAS methodologies: mrMLM (multilocus random-SNP-effect MLM) (Wang et al., 2016), FASTmrMLM (fast mrMLM) (Tamba et al., 2017), ISIS EM-BLASSO (iterative modified-sure independence screening expectation-maximization Bayesian least absolute shrinkage and selection operator) (Wen et al., 2017), pKWmEB (integration of Kruskal-Wallis test with empirical Bayes) (Ren et al., 2018), FASTmrEMMA (fast multi-locus random-SNP-effect efficient mixed model analysis) (Wen et al., 2017), and pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes) (Zhang et al., 2017).

Among the GY-associated traits, grain size contributes the most, making it a key selection target in wheat breeding programs for developing high-yielding varieties. Thousand grain weight (TGW) is the main component of GY and is determined by grain size traits such as grain length (GL), grain width (GW), and grain length-width ratio (GLWR) (Sun et al., 2009; Li et al., 2022). Thus, it is important to understand the genetic and molecular basis of the mechanisms governing grain size in wheat genotypes and to identify superior novel alleles governing this trait from germplasm collections for exploitation in breeding programs. Therefore, the main aim of this study was to dissect the genetic control of the grain size traits GL, GW, GLWR, and TGW in wheat germplasm using a 35K SNP array and multi-locus GWAS approaches.

Materials and methods

Experimental materials and phenotyping

The experimental material for the GWAS comprised 125 diverse wheat accessions, a subset of a mini core developed from the composite wheat core set (Phogat et al., 2020) of the National Genebank of India. These accessions comprised 85 indigenous and 40 exotic collections, which included released varieties, landraces, genetic stocks, and elite genotypes (Supplementary Table 1). These accessions were evaluated following an augmented block design in five blocks using four checks (HD2967 and C-306) randomized in each block over five years (Rabi 2015-16 to Rabi 2019-20). The GWAS panel was evaluated at the ICAR-National Bureau of Plant Genetic Resources (NBPGR), Issapur Farm, Delhi (28.3748° N, 77.0902° E, 228.6 m AMSL) for five consecutive years; during the fifth year, the trial was also conducted at the ICAR-NBPGR Regional Station, Jodhpur (26.2389° N, 73.0243° E, 263 m AMSL) and the Zonal Agricultural Research Station, Powarkheda, Madhya Pradesh (22.4154° N, 77.4442° E, 229 m AMSL). In total, these made up seven environments: Delhi (2015-16)-E1, Delhi (2016-17)-E2, Delhi (2017-18)-E3, Delhi (2018-19)-E4, Delhi (2019-20)-E5, Powarkheda (2019-20)-E6, and Jodhpur (2019-20)-E7.

Each genotype was grown in a three-row plot with rows 2 m long and a row-to-row distance of 0.25 m. Pests and diseases were controlled chemically, whereas weeds were controlled manually. The wheat GWAS panel was evaluated for GL (mm), GW (mm), GLWR, and TGW (g) from the harvested grain samples. These grain parameters were measured on the main spikes of five random individual plants from the middle of the row for each accession. Grains per spike were estimated by hand-threshing the mature spike. TGW of each genotype was recorded by weighing all the seeds from a sample, dividing the weight by the total number of seeds counted, and multiplying the result by 1000. For GL and GW, ten seeds from each of the five spikes were measured using a digital vernier caliper, and the plot average for each accession was used for analysis. Grain length/width ratio (GLWR) was calculated by dividing the mean grain length by the mean grain width for each genotype.
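
The TGW and GLWR calculations described above reduce to simple arithmetic. The following is a minimal sketch of that arithmetic; the function names and sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def thousand_grain_weight(sample_weight_g, seed_count):
    """TGW (g): sample weight divided by number of seeds, scaled to 1000 seeds."""
    if seed_count <= 0:
        raise ValueError("seed_count must be positive")
    return sample_weight_g / seed_count * 1000


def length_width_ratio(lengths_mm, widths_mm):
    """GLWR: mean grain length divided by mean grain width."""
    return mean(lengths_mm) / mean(widths_mm)


# Hypothetical measurements for one accession (not data from the study).
tgw = thousand_grain_weight(sample_weight_g=8.4, seed_count=200)
glwr = length_width_ratio([6.2, 6.4, 6.3], [3.1, 3.2, 3.0])
```

Because GLWR is computed from the two means rather than averaged seed by seed, it stays consistent with the plot-level GL and GW values used elsewhere in the analysis.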

Phenotypic data analyses

The phenotypic data were analyzed using ACBD-R (Augmented Complete Block Design with R) version 4.0 software (Rodriguez et al., 2018). The mean, coefficient of variation (CV), least significant difference (LSD), genotypic variance, and heritability were estimated. In ACBD-R v4.0, the best linear unbiased predictors (BLUPs) of each genotype were calculated for each environment and across environments along with the four check varieties. The calculated BLUPs were then used in the GWAS analysis. The frequency distribution graphs, correlation coefficients of the recorded traits, and principal component analysis were obtained using SAS JMP Version 14 software (https://www.jmp.com/en_in/software/data-analysis-software.html).

Genomic DNA isolation and SNP genotyping

Genomic DNA was isolated from one-week-old wheat seedlings using the CTAB method (Murray and Thompson, 1980) with a few modifications and then treated with RNase to remove any RNA contamination. The integrity of the DNA samples was checked on a 0.8% agarose gel and the concentration was determined using a NanoDrop 1000 (Thermo Scientific). The isolated DNA samples were genotyped using the Breeder’s 35K Axiom® array (Allen et al., 2017). SNPs with a genotyping call rate <97% or a minor allele frequency (MAF) <5% were removed before genomic data analysis.
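
The call-rate and MAF filters described above can be sketched as follows. This is an illustrative example only: the genotype coding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), the function names, and the toy SNPs are assumptions, not the pipeline used in the study.

```python
def call_rate(genotypes):
    """Fraction of accessions with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 alternate-allele copies."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(genotypes, min_call_rate=0.97, min_maf=0.05):
    """Keep a SNP only if it meets both thresholds stated in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNPs scored on 100 accessions (not data from the study).
snps = {
    "snp_a": [0] * 60 + [2] * 40,               # fully called, MAF 0.40
    "snp_b": [0] * 98 + [2] * 2,                # MAF 0.02, fails MAF filter
    "snp_c": [0] * 50 + [2] * 45 + [None] * 5,  # call rate 0.95, fails
}
kept = [name for name, calls in snps.items() if passes_filters(calls)]
```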

Clustering, population structure, and linkage disequilibrium analysis

A total of 23,874 SNPs were used to perform principal component analysis (PCA) and generate a kinship matrix using the TASSEL 5.2 program (https://www.maizegenetics.net/tassel). STRUCTURE software was used to estimate the level of genetic differentiation in the population using the Bayesian model-based approach; the burn-in period and the number of Markov chain Monte Carlo (MCMC) replications were set to 10,000 and 20,000 respectively, with ten independent runs to estimate the number of subpopulations (k) in a putative range of k = 1 to 5. The optimal number of subpopulations was estimated using the ad hoc statistic delta k (Evanno et al., 2005). The squared allele frequency correlation (r2) between SNP markers was used to estimate linkage disequilibrium (LD) using TASSEL v5.2 (https://www.maizegenetics.net/tassel).
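
As an illustration of the delta k statistic mentioned above, the sketch below divides the mean absolute second-order change in log likelihood across runs by the standard deviation of the log likelihoods at that k. It is one reading of the Evanno et al. (2005) formula; tools that implement the statistic may average the runs differently. The function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Compute delta k for each interior k.

    log_likelihoods maps consecutive k values to a list of Ln P(D)
    values, one per independent run (same number of runs for every k).
    """
    ks = sorted(log_likelihoods)
    result = {}
    for k in ks[1:-1]:  # delta k is undefined at the smallest and largest k
        prev_runs = log_likelihoods[k - 1]
        runs = log_likelihoods[k]
        next_runs = log_likelihoods[k + 1]
        second_diffs = [abs(nxt - 2 * cur + prv)
                        for prv, cur, nxt in zip(prev_runs, runs, next_runs)]
        result[k] = mean(second_diffs) / stdev(runs)
    return result


# Hypothetical log likelihoods for three runs at k = 1 to 5.
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4200.0, -4201.0, -4199.0],
    3: [-4100.0, -4110.0, -4090.0],
    4: [-4050.0, -4070.0, -4030.0],
    5: [-4020.0, -4060.0, -3990.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

With a tested range of k = 1 to 5, the statistic is defined only for k = 2 to 4, so the peak is chosen among those values.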

Genome-wide association analysis

All 125 accessions were genotyped using the 35K SNP array. We used five ML-GWAS methods included in the R package mrMLM v4.0.2 (https://cran.r-project.org/web/packages/mrMLM/index.html): mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO. All parameters were set at default values. The critical threshold of significant association for all five methods was set at a logarithm of the odds (LOD) score ≥3.00. Significant SNPs detected by at least two methods were considered reliable SNPs.
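
The rule for calling a SNP reliable (LOD score ≥3.00 in at least two methods) can be expressed as a small post-processing step. The sketch below is illustrative only and does not use the mrMLM package API; the record layout, function name, and example hits are hypothetical.

```python
from collections import defaultdict

LOD_THRESHOLD = 3.0
MIN_METHODS = 2


def reliable_snps(hits, lod_threshold=LOD_THRESHOLD, min_methods=MIN_METHODS):
    """Return SNP ids significant in at least min_methods distinct methods.

    hits is an iterable of (snp_id, method, lod) records.
    """
    methods_by_snp = defaultdict(set)
    for snp_id, method, lod in hits:
        if lod >= lod_threshold:
            methods_by_snp[snp_id].add(method)
    return sorted(snp for snp, methods in methods_by_snp.items()
                  if len(methods) >= min_methods)


# Hypothetical association results (not data from the study).
hits = [
    ("snp_1", "mrMLM", 3.8),
    ("snp_1", "FASTmrMLM", 3.9),
    ("snp_2", "pLARmEB", 4.5),          # significant in one method only
    ("snp_3", "mrMLM", 2.7),            # below the LOD threshold
    ("snp_3", "ISIS EM-BLASSO", 5.1),
]
reliable = reliable_snps(hits)
```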

Differential gene expression analysis

We utilized RNA-seq data of two wheat genotypes with contrasting seed size, i.e., IC111905 (large-seeded) and EC575981 (small-seeded), at 15 and 30 days post anthesis (DPA) with three biological replicates to check the expression profile of putative candidate genes located in the identified genomic regions. Illumina sequencing was performed, which generated approximately 177 Gb of raw data. Approximately 98.5% of reads passed quality control, and the clean reads were mapped to the reference genome (IWGSC v2.0) (https://plants.ensembl.org/Triticum_aestivum/Info/Index) using bwa-mem software (https://sourceforge.net/projects/bio-bwa/files/). The differential gene expression analysis was performed using the edgeR package, and genes with p-value <0.05 were considered significantly differentially expressed. Heat maps of differentially expressed genes were generated using MeV software (https://sourceforge.net/projects/mev-tm4/).

Results

Phenotypic evaluation and variability

All the genotypes of the wheat association panel were phenotyped for grain size parameters (GL, GW, GLWR, and TGW) in seven different environments (E1-E7). The descriptive statistics of the investigated traits in the seven environments are presented in Supplementary Table 2 and revealed wide variability for all the traits. GL ranged from 5.24 to 8.20 mm in E1, 5.09 to 8.17 mm in E2, 4.83 to 8.15 mm in E3, 5.04 to 8.09 mm in E4, 4.43 to 7.79 mm in E5, 4.43 to 8.41 mm in E6, and 4.47 to 7.87 mm in E7. GW ranged from 2.29 to 3.95 mm in E1, 2.26 to 3.71 mm in E2, 1.89 to 3.73 mm in E3, 1.87 to 3.69 mm in E4, 1.41 to 3.67 mm in E5, 1.41 to 3.58 mm in E6, and 1.48 to 3.81 mm in E7. GLWR ranged from 1.49 to 2.65 in E1, 1.29 to 2.70 in E2, 1.64 to 3.12 in E3, 1.64 to 3.52 in E4, 1.62 to 3.12 in E5, 1.69 to 3.12 in E6, and 1.61 to 3.02 in E7. TGW ranged from 18.00 to 64.54 g in E1, 13.51 to 67.56 g in E2, 7.91 to 51.93 g in E3, 9.87 to 61.39 g in E4, 12.17 to 56.66 g in E5, 13.75 to 67.21 g in E6, and 14.26 to 55.51 g in E7. The coefficients of variation for GL, GW, GLWR, and TGW ranged from 8.32% to 24.00%, indicating considerable variability for these traits. The CV was highest for TGW in E6 (24.00%), followed by E4 (21.62%) and E2 (20.39%). The frequency distributions of the four traits (GL, GW, GLWR, and TGW) (Supplementary Figure 1) were near normal in all environments, except for GW under environments E3 and E4, indicating the quantitative nature of these traits.

Based on BLUP analysis (Supplementary Table 3), genotypes recorded an overall grand mean of 6.33 ± 0.40 mm, 6.34 ± 0.39 mm, 6.28 ± 0.36 mm, 6.23 ± 0.44 mm, 6.27 ± 0.34 mm, 6.52 ± 0.28 mm, and 6.64 ± 0.63 mm respectively under different environments for grain length, whereas for grain width, overall means of 3.19 ± 0.06 mm, 3.19 ± 0.07 mm, 3.10 ± 0.18 mm, 3.02 ± 0.39 mm, 3.22 ± 0.22 mm, 3.08 ± 0.41 mm, and 3.20 ± 0.41 mm respectively were recorded. GLWR recorded overall means of 2.02 ± 0.40, 2.02 ± 0.40, 2.06 ± 0.15, 2.08 ± 0.16, 1.95 ± 0.15, 2.12 ± 0.14, and 2.13 ± 0.18 in different environments, whereas TGW recorded grand means of 42.60 ± 5.24 g, 44.30 ± 6.73 g, 37.66 ± 3.00 g, 39.85 ± 6.84 g, 43.07 ± 5.08 g, 38.35 ± 5.30 g, and 42.00 ± 3.12 g respectively. Promising accessions were identified based on adjusted BLUP mean. The top ten accessions for GL were EC578134 (7.457 mm), IC539313 (7.148 mm), EC339611 (7.106 mm), EC464070 (7.085 mm), C697725 (7.036 mm), IC252928 (7.033 mm), IC535217 (7.016 mm), EC542279 (7.014 mm), EC313710 (7.005 mm), and EC578152 (6.969 mm). For GW, the top ten accessions were IC252429 (3.753 mm), IC335683 (3.441 mm), IC252954 (3.432 mm), IC335715 (3.418 mm), IC252772 (3.38 mm), IC574476 (3.378 mm), IC252422 (3.369 mm), IC75240 (3.355 mm), IC122726 (3.349 mm), IC116274 (3.34 mm), and IC539314 (3.328 mm). Similarly, promising accessions with a thousand grain weight of 50 g or more were identified as IC539313 (55.03 g), EC578134 (53.56 g), IC542076 (50.95 g), IC335715 (50.36 g), and EC578152 (50.0 g).

The environment-wise heritability and variance components based on BLUP value are presented in Supplementary Table 3. Heritability for GL ranged from 22.5% (E7) to 82.4% (E4), whereas heritability for GW ranged from 22.3% (E7) to 60.2% (E4). Similarly, heritability for GLWR ranged from 22.0% (E1) to 62.3% (E5), whereas TGW was found to be heritable in the range of 21.8% (E6) to 74.1% (E3).

Multivariate analysis

Correlation between traits in different environments

Pearson’s correlation coefficients were estimated among grain traits for the diverse wheat accessions under each environment separately (Supplementary Table 4). GL had a consistently significant positive correlation with GW (0.368, 0.406, 0.444), GLWR (0.585, 0.562, 0.335), and TGW (0.471, 0.394, 0.538) under E1, E2, and E3 respectively. In contrast, GW showed a negative correlation with GLWR (-0.363, -0.438, and -0.684) and a significant positive correlation with TGW (0.371, 0.380, and 0.604) under E1, E2, and E3 respectively. Under environment E4, GL showed significant positive correlations with GW, GLWR, and TGW that ranged from 0.225 (GW) to 0.457 (GLWR), while GW showed a highly significant negative correlation with GLWR (-0.748) and a significant positive correlation with TGW (0.488). Similarly, under E5, GL showed a significant positive correlation with GW (0.406), GLWR (0.361), and TGW (0.643), while GW showed a significant negative correlation with GLWR (-0.686) and a significant positive correlation with TGW (0.723). A similar correlation pattern among seed traits was also observed under E6 and E7 (Supplementary Table 4).

Phenotypic correlation between different environments for traits

The magnitudes of correlation between environments were assessed to understand how consistently genotypes expressed each trait. A total of twenty-one pairwise correlations between environments were examined for each trait (Supplementary Table 5). Significant positive associations were found for all environment pairs except five pairs each for GL and TGW and three pairs each for GW and GLWR. For GL, the correlation values ranged from 0.084 between E3 and E7 to 0.985 between E1 and both E2 and E4. For GW, the correlation values ranged from 0.061 between E4 and E7 to 0.953 between E1 and E2. Similarly, for GLWR, the lowest association was 0.089 between E4 and E6 and the highest was 0.961 between E1 and E2. For TGW, the lowest association was 0.021 between E4 and E6 and the highest was 0.954 between E2 and E4.
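
The count of twenty-one comparisons follows from pairing seven environments (7 × 6 / 2 = 21). The sketch below enumerates those pairs and computes a Pearson correlation for one of them; the trait values and the helper name are hypothetical and serve only to illustrate the calculation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


environments = [f"E{i}" for i in range(1, 8)]
pairs = list(combinations(environments, 2))  # all unordered environment pairs

# Hypothetical grain length values (mm) for four accessions in two environments.
values = {
    "E1": [6.1, 6.5, 7.0, 5.8],
    "E2": [6.0, 6.6, 7.1, 5.9],
}
r_e1_e2 = pearson(values["E1"], values["E2"])
```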

Principal component analysis and correlation study based on pooled analysis

PCA was performed on pooled data from the seven environments for the grain parameters. The genotype-by-trait biplot depicted the two-dimensional spatial diversity of accessions as well as trait variability (Figure 1A). High variability was observed for GLWR, GL, GW, and TGW, as evidenced by the length of the trait vectors. The narrow angles between vectors indicated high positive correlations between traits such as GL and TGW, and GLWR and GL, whereas a negative correlation was indicated between GLWR and GW. The first principal component, PC1, explained 57.4% of the total variation. The major contributing traits for PC1 were GW (0.90), TGW (0.90), and GL (0.79), all in the positive direction. The second principal component, PC2, explained 34.5% of the total variation. The major contributing traits for PC2 were GLWR (0.97) and GL (0.56). Further, we analyzed correlations between traits using the BLUP values: GL showed a highly significant positive correlation with GW (0.562), GLWR (0.368), and TGW (0.667), whereas GW showed a highly significant negative correlation with GLWR (-0.508) and a significant positive correlation with TGW (0.680). GLWR was not significantly related to TGW (Figure 1B).

Figure 1 (A) Principal component biplot based on BLUP value of grain parameters over the environments. (B) Scatter plot showing correlation matrix between grain parameters based on BLUP value.

Genotyping

A total of 125 wheat accessions representing a subset of the Indian National Genebank mini core germplasm were genotyped using the 35K wheat SNP array, which contains 35,143 genome-wide single nucleotide polymorphism (SNP) markers. The SNP probe sequences of the array were aligned against the wheat genome using BLASTn (https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html) to determine their physical locations, which yielded 31,926 SNPs with known positions. Furthermore, SNPs were filtered on the basis of minor allele frequency (≥0.05), missing data (<10%), and call rate (≥97%). Finally, a total of 23,874 SNPs were retained for genetic diversity, population structure, and GWAS analysis. The 23,874 filtered SNP markers provided genome-wide coverage across the 21 chromosomes of wheat (Figure 2). Further, distribution analysis of SNPs on wheat chromosomes revealed that the maximum number of SNPs was positioned on 1B (1594), followed by 2D (1575) and 1D (1516), while the lowest number was positioned on 4D (508), followed by 4B (800) and 4A (808). We also compared the distribution of SNPs across the three wheat sub-genomes: 7,291 SNPs belonged to the A sub-genome, 8,784 SNPs to the B sub-genome, and 7,799 SNPs to the D sub-genome (Table 1).

Figure 2 SNP density plot of 21 wheat chromosomes displaying distribution of SNPs within 5 Mb window size. The horizontal axis shows chromosome length (Mb); different colors depict SNP density.

Table 1 Chromosome-wise distribution of 23,874 SNPs and the intra-chromosomal estimated LD among 125 wheat genotypes.

Population structure, kinship, and linkage disequilibrium decay analyses

We used 23,874 SNP markers to ascertain the population structure in the wheat mini core set using STRUCTURE and PCA. The most probable number of sub-populations was estimated using the delta K method implemented in the STRUCTURE HARVESTER program. The value of ΔK peaked at K=2, revealing two sub-populations in the wheat mini core germplasm. Sub-population 1 represented 82% of the individuals, of which 62% were pure and 38% were admixtures, whereas sub-population 2 comprised 18% of the individuals of the association mapping panel, of which 75% were pure and 25% were admixtures. PCA also detected the two sub-populations, indicated by two significant components explaining the maximum variation of the population. Further, a kinship matrix was created to explore the relationships among the individuals using the genome association and prediction integrated tool (GAPIT), which demonstrated the presence of two sub-groups within the association panel (Figure 3).

Figure 3 Population structure analysis of wheat association mapping panel. (A) Magnitude of ΔK values (rate of change) for K from 2 to 5 in the association mapping panel. (B) Population structure of the association panel of 125 accessions based on SNP markers at K = 2. Different colored columns represent different sub-populations. (C) Principal component analysis showing two sub-populations. (D) Heat map of kinship matrix. The heat map shows the level of relatedness among the population. The darker areas show the level of relatedness between genotypes and the dendrogram depicts clustering of the sub-populations.

The LD decay in the wheat mini core set was estimated by calculating the squared correlation coefficient (r2) for all the SNPs. The LD decay for the whole genome was 1.9 Mb. Further, it was found that the decay was most rapid in the A sub-genome (1.63 Mb), followed by the D sub-genome (1.93 Mb) and B sub-genome (2.28 Mb) (Figure 4).
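
For reference, the pairwise r2 statistic underlying the LD decay estimates can be computed as in the sketch below. It assumes homozygous lines with alleles coded 0 or 1 at each SNP, which is a simplification for illustration; the function name and genotype vectors are hypothetical and this is not the TASSEL implementation.

```python
def ld_r2(snp_a, snp_b):
    """Squared allele frequency correlation between two biallelic SNPs.

    Each argument lists one allele code (0 or 1) per line, assuming
    homozygous lines so that each line contributes one haplotype.
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical alleles at two SNPs across eight lines.
snp_x = [0, 0, 1, 1, 0, 1, 1, 0]
snp_y = [0, 0, 1, 1, 0, 1, 0, 1]
r2_xy = ld_r2(snp_x, snp_y)
r2_xx = ld_r2(snp_x, snp_x)
```

Plotting such pairwise values against the physical distance between SNPs, and noting where the fitted curve falls below a chosen cutoff, gives a decay distance of the kind reported above.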

Figure 4 The rate of linkage disequilibrium decay (r2) between pairs of polymorphic markers for the whole wheat genome and its sub-genomes A, B, and D, plotted against physical distance (Mb).

GWAS for grain size traits

GWAS was performed using the 23,874 filtered SNPs to identify genomic loci associated with four grain size traits (GL, GW, GLWR, and TGW), independently for each of the seven environments and also using the BLUP values derived from the grain size data of all seven environments. We used five multi-locus models (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO) to conduct GWAS. A total of 752 significant SNPs were predicted for the four grain size traits using the ML-GWAS models with LOD score ≥3. Manhattan plots for GL drawn using the various ML-GWAS models, depicting marker-trait associations, are presented in Figure 5. Of these 752 SNPs, 72 were identified using BLUP values derived from the data of all the environments, and the remaining SNPs were identified by analyzing the data of each environment separately. Classifying the 752 SNPs by trait showed that 156, 179, 250, and 167 SNPs were associated with GL, GW, GLWR, and TGW, respectively (Figure 6). In addition, a comparison by GWAS model showed that the mrMLM model predicted 28, 32, 36, and 31 SNPs for GL, GW, GLWR, and TGW respectively, while the FASTmrMLM model detected 34, 42, 79, and 39 SNPs. The FASTmrEMMA model identified 13, 20, 17, and 12 SNPs and the pLARmEB model identified 36, 50, 69, and 45 SNPs for GL, GW, GLWR, and TGW respectively. The ISIS EM-BLASSO model detected 45, 38, 49, and 40 SNPs for GL, GW, GLWR, and TGW respectively (Figure 6).

Figure 5 Manhattan plots of associated QTNs for grain length (GL) in wheat using multi-locus GWAS models. The x-axis shows the chromosome label and the y-axis shows -log10(p-value) together with the significance threshold (LOD score = 3). The significant QTNs with LOD score ≥3 are represented by purple dots.

Figure 6 Distribution of significant SNPs identified for each trait by the different multi-locus GWAS models.

After removing SNPs that were redundant across models and environments, we found that a total of 160 SNPs were detected by two or more multi-locus models. These SNPs were designated as reliable QTNs for the respective traits. Furthermore, the distribution of these 160 significant SNPs was analyzed across the environments. Out of these 160 QTNs, 87 were confined to only one environment; these included 13, 19, 27, and 25 QTNs for GL, GW, GLWR, and TGW respectively, and 3 QTNs that were associated with more than one trait (Supplementary Table 6). On the other hand, a total of 73 QTNs were identified in two or more environments as well as by two or more models (Table 2). Among these, 14, 17, 11, and 10 QTNs were identified for GL, GW, GLWR, and TGW respectively, while 21 QTNs were associated with more than one trait. The physical distribution of the 160 SNPs showed that they were present on all the chromosomes. The highest number of SNPs was found on chr3D (7 SNPs), followed by chr2D (6 SNPs) and chr7D (6 SNPs), while chr1D, chr6B, and chr6D had only one SNP each.

Table 2 List of significant QTNs detected simultaneously using two or more multi-locus GWAS methods for four yield-related grain shape traits in wheat across the environments.

Allelic effects of identified genomic regions on grain shape

To evaluate the allelic effects of QTNs on the respective phenotypes, we analyzed only those QTNs that were detected in more than two environments and had an R2 value ≥10% with at least one GWAS model. The genotypes of the association panel were divided into two groups according to allele type in order to test whether the mean phenotypes of the two groups were significantly different (Figure 7). The results showed that six QTNs had a significant effect (P ≤ 0.01) on their respective traits. Among these six QTNs, four (Q.GL-GW-3D (AX-95008504), Q.GL-5D (AX-94482861), Q.GL-5D (AX-95020206), and Q.GL-TGW-6A (AX-95238912)) had significant effects on GL (mm), whereas one QTN, Q.GW-4B (AX-94878781), had a significant effect on GW and another, Q.GLWR-2A (AX-94736090), had a significant effect on GLWR. The QTNs with significant phenotypic effects on seed traits might contribute to the genetic variation of these traits.
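
A comparison of this kind can be sketched as follows: accessions are grouped by allele at a QTN and the group means are compared with a two-sample t statistic. The sketch uses Welch's unequal-variance form, which is an assumption, since the text does not state which variant of the test was applied. The allele calls and grain lengths are hypothetical, and converting the statistic to a P value would additionally require a t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(alleles, phenotypes):
    """Group phenotype values by the allele carried at one QTN."""
    groups = {}
    for allele, value in zip(alleles, phenotypes):
        groups.setdefault(allele, []).append(value)
    return groups


def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = variance(group_a) / n_a  # squared standard error of each mean
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical allele calls and grain lengths (mm) for eight accessions.
alleles = ["A", "A", "A", "A", "G", "G", "G", "G"]
grain_length = [6.9, 7.1, 7.0, 7.2, 6.1, 6.3, 6.2, 6.0]

groups = split_by_allele(alleles, grain_length)
t_stat, df = welch_t(groups["A"], groups["G"])
```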

Figure 7 Boxplot for 6 reliable QTNs (A–F). Genotypes were divided into two groups at each locus based on the allele type. A significant difference between the phenotype of these two groups was analyzed using t-test (P ≤ 0.001). Two alleles for each QTN (Locus) are given on X-axis. Y-axis shows phenotypic values of the traits.

Annotation of identified QTNs

All 160 QTNs significantly associated with grain size traits and detected by two or more models were searched for annotation in the wheat reference genome assembly of cv. Chinese Spring (IWGSC RefSeq version 2.0, https://wheat.pw.usda.gov/GG3/iwgsc-2.0), available at Ensembl Plants. Of these 160 QTNs, annotation was detected for only 136. Detailed analysis of the annotated SNPs showed that Q.TGW-5D (SNP-AX-95234313) was located within a gene encoding cytochrome P450 and was identified in the E4 environment using the mrMLM and FASTmrMLM models, with LOD scores ranging from 3.8 to 3.92. Another QTN, Q.GW-3D (SNP-AX-94540502), identified for grain width in the E6 environment, was annotated as an ABC transporter and was detected by the mrMLM and ISIS EM-BLASSO models. We also checked the annotation of QTNs that were identified in multiple environments and found that Q.GL-1B (SNP-AX-94699549), Q.GL-GLWR-4B (SNP-AX-94879134), Q.GL-3D (SNP-AX-95074739), and Q.GW-5A (SNP-AX-94657794) were located within genes encoding an ABC transporter, a WRKY transcription factor, a glucan endo-1,3-beta-glucosidase, and a zinc finger protein respectively. Among these four QTNs, Q.GL-1B (ABC transporter) was located at 585,331,222 bp on chromosome 1B. It was identified in two different environments, E1 and E4, using the ISIS EM-BLASSO model, with LOD scores of 4.17 and 5.65 and R2 values of 3.05% and 6.42%, respectively. Q.GL-GLWR-4B (WRKY transcription factor) was located on chr4B at position 63,231,309 bp and was identified in four different environments: E1 with the ISIS EM-BLASSO model, E2 with the ISIS EM-BLASSO and FASTmrEMMA models, E3 with the pLARmEB model, and E4 with the ISIS EM-BLASSO model. Its LOD score ranged from 3.52 to 8.12. This QTN was associated with GL in all environments except E2. Further, in E2, Q.GL-GLWR-4B was also detected by three different models, namely FASTmrMLM, pLARmEB, and ISIS EM-BLASSO, but was associated with GLWR.

Expression analysis

The transcriptome sequencing of wheat genotypes with contrasting seed size, i.e., IC111905 (large-seeded) and EC575981 (small-seeded), was performed at two time points during seed development (15 and 30 DPA) to quantify expression of all annotated genes within associated genomic regions.

Expression analysis showed that only 123 genes were expressed at both stages (15 and 30 DPA) in the small- and large-seeded genotypes, of which 23, 33, 27, and 28 were uniquely associated with GL, GW, GLWR, and TGW, respectively. Among these genes, those with fold change ≥1 and p-value <0.05 were considered significantly differentially expressed. A total of 18 and 12 genes were significantly differentially regulated in large-seeded cultivars at 15 and 30 DPA, respectively. At 15 DPA, 11 genes were upregulated and 7 were downregulated, while at 30 DPA 6 genes were significantly upregulated and 6 were significantly downregulated in large-seeded cultivars (Figure 8). Several genes, including TraesCS7B02G462900 (SNP-AX-94472687; Q.GW-7B), TraesCS2D02G132600 (SNP-AX-94499721; Q.GLWR-2D), TraesCS1A02G187000 (SNP-AX-95120969; Q.TGW-GW-1A), TraesCS2B02G260200 (SNP-AX-95129853; Q.GLWR-GL-2B), and TraesCS6A02G379200 (SNP-AX-95151036; Q.GW-6A), were downregulated at both time points, while TraesCS1A02G427400 (SNP-AX-94605845; Q.TGW-1A), TraesCS3D02G002700 (SNP-AX-94642652; Q.GW-3D), TraesCS7A02G111200 (SNP-AX-94820170; Q.GLWR-GL-7A), and TraesCS3A02G180200 (SNP-AX-94960788; Q.GL-3A) were upregulated at both 15 and 30 DPA. The results also showed that Q.GLWR-5B (SNP-AX-94915493), associated with GLWR, was located within the gene TraesCS5B02G552400 (hypothetical protein), which was expressed only in small-grain cultivars, with 10-fold upregulation, suggesting that it may have a specific role in small-seeded cultivars. Another gene, TraesCS7D02G463100, associated with the QTN Q.GW-7D and identified in the E1 and E5 environments, was upregulated in large-grain cultivars. TraesCS7D02G463100 encodes a nuclear transcription factor associated with GLWR.
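
The significance filter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the fold-change cutoff refers to the absolute log2 fold change of the large-seeded relative to the small-seeded genotype; the `GeneResult` record, the `classify` function, and all example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float  # assumed direction: large-seeded vs small-seeded
    p_value: float


def classify(result, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    passes_p = result.p_value < p_cutoff
    passes_lfc = abs(result.log2_fold_change) >= lfc_cutoff
    if not (passes_p and passes_lfc):
        return "ns"
    return "up" if result.log2_fold_change > 0 else "down"


# Hypothetical values for illustration only.
results = [
    GeneResult("gene_A", 2.3, 0.001),   # large change, significant
    GeneResult("gene_B", -1.4, 0.02),   # large negative change, significant
    GeneResult("gene_C", 0.6, 0.01),    # significant p-value, change too small
    GeneResult("gene_D", 3.0, 0.20),    # large change, p-value too high
]
calls = {r.gene_id: classify(r) for r in results}
print(calls)
```

Requiring both cutoffs at once keeps genes with a large but noisy change, or a consistent but negligible change, out of the differentially expressed set.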

Figure 8 Heat maps of candidate genes identified for four grain shape traits in small- and large-seeded wheat genotypes at two developmental stages (15 and 30 DPA). The panels show heat maps for (A) GL, (B) GW, (C) GLWR, and (D) TGW. Genotype names are suffixed with 15 or 30, indicating the number of days post-anthesis. Red indicates higher gene expression and green indicates lower gene expression; expression levels are log2 transformed.

Discussion

Grain yield is a highly complex agronomic trait governed by several genes and influenced by environmental conditions (Li et al., 2022). It is essentially determined by two main components, i.e., the number of grains per m2 and thousand grain weight (TGW) (Sun et al., 2009; Kumari et al., 2018; Li et al., 2019). Historically, grain yield has mainly been improved by increasing the grain number per m2, which is determined by the grain number per spike (Kumari et al., 2018). In the present study, we focused on the grain shape traits, i.e., GL, GW, and GLWR, which determine TGW, a phenotypically stable yield-contributing trait used by breeders for selecting high-yielding varieties (Avni et al., 2018; Duan et al., 2020). TGW and other grain size traits contribute more to grain yield than grain number per spike (Ji et al., 2022). Thus, it is very important to study grain shape traits when the aim is to improve grain yield. Here, we applied GWAS to identify genomic regions regulating variation for grain yield in a subset of the Indian National Genebank wheat mini core set germplasm (125 accessions). These mini core set accessions were identified from a core set (2226 accessions) constituted from the entire collection of wheat accessions (22416) conserved in the National Genebank of India (Phogat et al., 2020). Therefore, the mini core set accessions are a valuable genetic resource for mapping various desirable traits, including grain shape traits.

Phenotyping of the wheat mini core set accessions across the seven environments revealed significant variability among accessions for grain parameters. High coefficients of variation for TGW in all environments indicated broad phenotypic variation and a large potential for improvement. Heritability is the proportion of the total observable (phenotypic) variance in a population that is attributable to genotypic variance. Across environments, heritability was high for GL and moderate for GW, GLWR, and TGW. The trend in heritability was more specific to environment than to trait, as we observed low to moderate heritability in E5 and E6. These environments fall in stress-prone areas affected by low rainfall and high temperatures, which might have caused the low heritability of the traits. Promising accessions for grain parameters were identified. Among them, EC578134, IC539313, IC535217, EC464070, and EC578152 were promising for GL as well as TGW. EC339611, EC578134, and IC535217 were promising for GL as well as GLWR, whereas IC335715 was promising for GW and TGW. These accessions can be used in breeding programs for trait introgression and in genetic and genomic studies. The significant positive correlation of grain length and width with thousand grain weight indicated that selecting grains with increased width and length can contribute substantially to grain weight and indirectly to grain yield. Earlier studies have reported moderate to strong correlations between TGW and grain size (Rasheed et al., 2018). Simmonds et al. (2016) reported that GL and GW in tetraploid and hexaploid wheat can greatly influence TGW, as longer and broader grains accumulate more starch and hence have a higher weight. Previous studies have also reported positive associations among TGW, GL, and GW (Breseghello and Sorrells, 2007; Ramya et al., 2010).
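
As a rough sketch of the heritability definition given above, the function below computes broad-sense heritability on an entry-mean basis from variance components. The formula for a replicated multi-environment trial is an assumption made for illustration (the variance model actually fitted in the study is not restated in this section), and the function name, parameter names, and input values are all hypothetical.

```python
def broad_sense_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Entry-mean broad-sense heritability (assumed textbook form).

    H2 = var_g / (var_g + var_ge / n_env + var_error / (n_env * n_rep))
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic = var_g + var_ge / n_env + var_error / (n_env * n_rep)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components for a trait scored in seven environments.
h2_example = broad_sense_heritability(
    var_g=4.0, var_ge=2.0, var_error=6.0, n_env=7, n_rep=2
)
print(round(h2_example, 3))
```

Under this form, a larger genotype-by-environment or error variance lowers the estimate, which is in line with the lower heritability reported for the stress-prone environments.
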
Principal component analysis also identified GW, TGW, and GL as the major traits contributing positively to variation in grain shape among wheat genotypes (Figure 1). In our study, the association between GW and GLWR was consistently and significantly negative in all environments. The negative correlation between GW and GLWR could be attributed to the allocation of photosynthates to GW rather than to GL. The differing correlations could be explained by the influence of the environment on plant growth and grain development. This study shows that GL, GW, and GLWR are all expected to increase with TGW, one of the major components of grain yield, which can be targeted to enhance wheat yield potential. Genetic diversity and population structure in the wheat mini core subset were analyzed using the 35K wheat SNP array. Both STRUCTURE and PCA analyses revealed two subpopulations in the wheat mini core set germplasm used in our study. The whole-genome LD decay distance in the wheat mini core set was 1.93 Mb. Further, LD decay was most rapid in the A sub-genome, followed by the D and B sub-genomes. Many earlier studies in wheat have reported much longer LD decay distances, ranging from 4 Mb to 15 Mb or even more (Pang et al., 2020; Hanif et al., 2021; Li et al., 2021). This suggests the presence of large LD blocks, which have so far limited high-resolution trait mapping in wheat. One way to overcome this problem is to genotype the association panels used for GWAS with a very high-density genic SNP array carrying hundreds of thousands of SNPs derived from coding regions. Such high-density genotyping would facilitate the construction of haplotype maps of the associated regions, which may help in pinpointing the exact causal SNPs/genes for the target traits.
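
One common way to summarize an LD decay distance such as the one reported above is to bin pairwise r2 values by physical distance and report where the binned mean first drops below a cutoff. The sketch below illustrates that idea only; the r2 cutoff of 0.2, the 0.5 Mb bin width, the function name, and the example pairs are assumptions, and the procedure used in the study may differ.

```python
from collections import defaultdict


def ld_decay_distance(pairs, bin_size_mb=0.5, threshold=0.2):
    """Return the midpoint (Mb) of the first distance bin whose mean r2
    falls below `threshold`, or None if no bin does.

    `pairs` is an iterable of (distance_mb, r2) for SNP pairs on the
    same chromosome.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin index -> [sum of r2, count]
    for distance_mb, r2 in pairs:
        index = int(distance_mb // bin_size_mb)
        totals[index][0] += r2
        totals[index][1] += 1
    for index in sorted(totals):
        r2_sum, count = totals[index]
        if r2_sum / count < threshold:
            return (index + 0.5) * bin_size_mb
    return None


# Hypothetical SNP pairs: (distance in Mb, r2).
example_pairs = [
    (0.1, 0.80), (0.3, 0.60),
    (0.7, 0.40), (0.9, 0.30),
    (1.2, 0.25), (1.4, 0.22),
    (1.7, 0.15), (1.9, 0.10),
    (2.3, 0.05),
]
decay_mb = ld_decay_distance(example_pairs)
print(decay_mb)
```

Running the same function separately on pairs from the A, B, and D sub-genomes would give per-sub-genome distances comparable to the ranking described in the text.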

Genome-wide association studies (GWAS) have proved to be a powerful tool for investigating the genetic basis of complex traits in many plant species, such as rice, maize, soybean, and wheat (Zegeye et al., 2014; Zhang et al., 2015; Spindel et al., 2016; Zhang et al., 2018; Chaurasia et al., 2020; Chaurasia et al., 2021). Many statistical methods based on different algorithms can be used to predict true associations between SNP markers and the corresponding phenotypic variation in GWAS (Spindel et al., 2016). In our study, we used the multi-locus GWAS (ML-GWAS) approach to detect marker-trait associations for grain shape traits. Multi-locus methods are effective because of their higher statistical power, which provides greater efficiency and accuracy for QTN detection. Numerous studies have found ML-GWAS to perform better than other methods (Bennett et al., 2012; Visioni et al., 2013; Spindel et al., 2016; Ma et al., 2018; Xu et al., 2018; Khan et al., 2019). Peng et al. (2018) used six ML-GWAS models to dissect the genetic basis of 20 free amino acid (AA) levels in T. aestivum and reported that ML-GWAS methods are more reliable and powerful. In the current study, we used five multi-locus methods, mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO, to perform GWAS analysis of four agronomic traits in our association panel. Among these five models, pLARmEB identified the highest number of QTNs (211 SNPs), followed by FASTmrMLM (202), ISIS EM-BLASSO (177), mrMLM (132), and FASTmrEMMA (62).
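
The rule of retaining only QTNs detected by two or more models can be expressed as a simple grouping step. The sketch below is an illustration of that rule, not the pipeline used in the study: the function name and the input layout are assumptions, the first two SNPs and their models are those named for Q.TGW-5D and Q.GW-3D in the Results, and the third SNP is hypothetical.

```python
from collections import defaultdict


def qtns_in_multiple_models(hits, min_models=2):
    """Keep SNPs reported by at least `min_models` distinct models.

    `hits` is an iterable of (model_name, snp_id) pairs; the result maps
    each retained SNP to the sorted list of models that detected it.
    """
    models_per_snp = defaultdict(set)
    for model, snp in hits:
        models_per_snp[snp].add(model)
    return {
        snp: sorted(models)
        for snp, models in models_per_snp.items()
        if len(models) >= min_models
    }


hits = [
    ("mrMLM", "AX-95234313"),
    ("FASTmrMLM", "AX-95234313"),
    ("mrMLM", "AX-94540502"),
    ("ISIS EM-BLASSO", "AX-94540502"),
    ("pLARmEB", "SNP_hypothetical_1"),  # single-model hit, made up for the example
]
consensus = qtns_in_multiple_models(hits)
print(consensus)
```

Using a set per SNP means a model that reports the same SNP in several environments is still counted once, so the filter reflects agreement between models rather than repeated detection.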

QTNs for thousand grain weight, grain length, grain width and grain length width ratio

QTL for grain yield component traits have been extensively studied and reported on all the 21 chromosomes of wheat (Brinton et al., 2017; Cao et al., 2019; Ma et al., 2019; Ji et al., 2022). In our analysis, a total of 160 reliable QTNs were detected for four grain shape-related traits across the seven locations (Table 2; Supplementary Table 6).

For grain length, 27 QTNs were detected, distributed on 17 wheat chromosomes (chr1B, chr1D, chr2B, chr2D, chr3A, chr3B, chr3D, chr4A, chr4B, chr5A, chr5B, chr5D, chr6B, chr6D, chr7A, chr7B, and chr7D). Among these 27 QTNs, 10 were major (R2 ≥ 10% in at least one GWAS method), of which Q.GL-4A (SNP-AX-94839917), Q.GL-7D (SNP-AX-94872194), Q.GL-7A (SNP-AX-94760450), and Q.GL-6D (SNP-AX-94647721) were the strongest, with R2 values ≥20%. Q.GL-4A on chr4A, with the highest R2 (31.94%), may explain a significant proportion of the variation for GL in the wheat mini core germplasm. Moreover, this QTN was identified in two environments, i.e., E1 and E2, and with four different models. Q.GL-7D is located within a gene encoding a thioredoxin M-type protein, with R2 ranging from 4.82% to 21.88%. This QTN was detected in three different environments, i.e., E1, E3, and E4, using three different models, suggesting that it could be a reliable QTN contributing to GL variation in wheat. In an earlier study, thioredoxin was shown to play an important role in preventing sprouting of developing grains in cereals (wheat and barley) by reducing the intramolecular disulfide bonds of storage proteins and other proteins in the starchy endosperm, thereby affecting grain yield (Guo et al., 2013).

For grain width, 36 QTNs were identified, distributed on 17 wheat chromosomes. Among these QTNs, Q.GW-3D (SNP-AX-94642652), Q.GW-5B (SNP-AX-94547840), Q.GW-3A (SNP-AX-94741529), Q.GW-4D (SNP-AX-95213549), and Q.GW-2B (SNP-AX-94519462) were considered major QTNs, as the phenotypic variance explained by each was ≥10% in at least one of the ML-GWAS models. Q.GW-3D and Q.GW-5B were annotated as an unnamed protein product and a hypothetical protein, respectively. Additionally, Q.GW-2B and Q.GW-4D had R2 values ranging from 7.7 to 12.23 and from 0.72 to 17.72, respectively. Q.GW-2B was identified in two environments, E3 and E4, while Q.GW-4D was identified in E2. Interestingly, the genes harboring both intragenic SNPs showed higher expression in large-grain wheat cultivars than in small-seeded cultivars. This suggests that these QTNs might have important roles in determining variation for GW in wheat.

Thirty-seven and thirty-five QTNs were detected for the GLWR and TGW traits, respectively. The GLWR-associated QTNs were distributed on 17 chromosomes (chr1A, chr1D, chr2A, chr2B, chr2D, chr3A, chr3B, chr3D, chr4A, chr4B, chr5A, chr5B, chr6A, chr6B, chr6D, chr7B, and chr7D), while QTNs for TGW were spread over 16 chromosomes (chr1A, chr1B, chr1D, chr2A, chr2B, chr2D, chr3A, chr3B, chr3D, chr4B, chr4D, chr5A, chr5D, chr6A, chr7B, and chr7D). For TGW, a total of fourteen SNPs had R2 ≥ 10% and were considered major genomic regions for this trait. Q.TGW-1A (SNP-AX-94605845) was annotated as TTL1 protein (TETRATRICOPEPTIDE-REPEAT THIOREDOXIN-LIKE 1), with R2 = 11.78%, and was highly expressed in large grains. Studies have reported that TTL1 positively regulates the stress response regulated by ABA (Guo et al., 2013). Loss of TTL1 function causes plants to be sensitive to salt and osmotic stress during seed germination and later development (Rosado et al., 2006). It is therefore possible that the genomic region identified in our study positively regulates the expression of the TTL1 gene and thereby regulates seed maturation. Q.TGW-5D (SNP-AX-95234313) was located within a gene encoding cytochrome P450 and was identified only in the E4 environment, with R2 = 21.45. In a previous study on cytochrome P450 family proteins, CYP78A3 on chr7 was shown to play an important role in wheat seed development by promoting integument cell proliferation (Ma et al., 2015). Thus, Q.TGW-5D (cytochrome P450) identified in our study might also have some role in seed development. A total of 13 QTNs associated with GLWR were considered strong QTNs, each explaining ≥10% of the phenotypic variance of the trait. Most of these QTNs were annotated as either hypothetical proteins or intergenic SNPs.
Three QTNs for GLWR, namely Q.GLWR-2D (SNP-AX-94922377), Q.GLWR-1A (SNP-AX-95213485), and Q.GLWR-6A (SNP-AX-94722285), were each identified in three different environments and were located on chr2D, chr1A, and chr6A, respectively.

Comparison of the QTLs identified in the present and previous studies

In wheat, several candidate genes underlying grain size and weight have been identified, including TaGS (Bernard et al., 2008), TaGW2 (Su et al., 2011), TaGS-D1 (Zhang et al., 2014), TaCWI (Jiang et al., 2015), and Tackx4 (Chang et al., 2015). Additionally, McCartney et al. (2005) identified two major QTLs for TKW that were located near Rht-B1b and Rht-D1b, the genes responsible for reduced plant height (McCartney et al., 2005; Gao et al., 2015). Another QTL, Qtgw-cb.5A, was identified as a key determinant of final grain weight, increasing grain length by driving pericarp cell expansion (Brinton et al., 2017). We compared the QTNs for grain shape identified in the present study with previously reported QTLs on the basis of their physical locations on chromosomes. Some of the previously reported grain size-associated QTLs were also detected in our analysis. For example, Qgl.cib-CK1-4A, associated with GL on chr4A, coincided with Q.GL-4A (SNP-AX-94839917), which was associated with grain length in the same region of chr4A and identified in two environments (E1 and E2). Further, the LOD (4.92~6.13) and R2 (15.33~31.94) values of this QTN demonstrated its importance in regulating GL. Goel et al. (2019) identified qTKW.6A.1, associated with TGW, on chr6A in the interval 166.64-596.18 Mb. The QTN Q.TGW-6A (SNP-AX-95240001) identified in our study appears to correspond to qTKW.6A.1. Interestingly, Q.TGW-6A was identified in two environments, E2 and E4, indicating that it is a stable genomic region for TGW. Further, we found that Q.GL-TGW-6A (SNP-AX-95238912) and Q.GLWR-6A (SNP-AX-94722285), which are located on chr6A at 362.7 Mb and 307 Mb, overlapped with the grain shape QTLs identified by Cao et al. (2019) and Ji et al. (2022), respectively. Interestingly, Q.GL-TGW-6A was identified in three environments (E3, E4, and E5). The LOD score and R2 value of Q.GL-TGW-6A ranged from 3.46 to 8.23 and from 2.6 to 23.8, respectively.
On the other hand, Q.GLWR-6A, present at 307 Mb on chr6A, had LOD scores of 3.47~7.59 and R2 values of 5.03~18.87. Since these two QTNs on chr6A were also identified in previous studies, they appear to represent major genomic regions for grain shape traits.
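
The position-based comparison described above amounts to checking whether a QTN lies on the same chromosome as, and inside the physical interval of, a previously reported QTL. The sketch below illustrates that check using the qTKW.6A.1 interval and the QTN positions quoted in the text; the `Interval` type and `falls_within` function are hypothetical names, and the example only demonstrates the arithmetic, since the text assigns the two chr6A QTNs to QTLs from other studies whose intervals are not given here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start_mb: float
    end_mb: float


def falls_within(chrom: str, pos_mb: float, qtl: Interval) -> bool:
    """True if a QTN is on the same chromosome and inside the QTL interval."""
    return chrom == qtl.chrom and qtl.start_mb <= pos_mb <= qtl.end_mb


# Interval quoted in the text for qTKW.6A.1 (Goel et al., 2019).
qtkw_6a_1 = Interval("6A", 166.64, 596.18)

# QTN positions quoted in the text, converted to Mb.
qtns = {
    "Q.GL-TGW-6A": ("6A", 362.7),
    "Q.GLWR-6A": ("6A", 307.0),
    "Q.GL-1B": ("1B", 585.331222),
}
overlaps = {
    name: falls_within(chrom, pos, qtkw_6a_1)
    for name, (chrom, pos) in qtns.items()
}
print(overlaps)
```

Because the quoted interval spans more than 400 Mb, positional overlap alone is weak evidence that two studies have detected the same locus, which is why the environments, LOD scores, and R2 values are reported alongside it.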

A few other genes influencing grain size and weight have been reported by Cabral et al. (2018). TaGS-D1, which controls GL and grain weight, is an ortholog of OsGS3 and is located on the short arm of chr7D. The expression pattern of the TaGS-D1 (TraesCS7A03G0037700) gene in our data showed relatively higher expression in large-seeded genotypes as compared to small-seeded genotypes. We therefore examined QTNs near the gene and found two, Q.GLWR-7D and Q.TGW-7D, located in the vicinity of TaGS-D1 at 54.9 Mb on chr7DS and 100.1 Mb on chr7D, respectively. The presence of these two QTNs indirectly suggests a major locus on the short arm of chr7D that corresponds either to TaGS-D1 or to an additional novel gene for grain shape. A second grain weight locus, the cytokinin oxidase/dehydrogenase gene TaCKX6-D1, is physically located on chromosome 3D; it plays a key role in controlling cytokinin levels and affects grain weight in wheat (Zhang et al., 2012). The TaCKX6-D1 gene is located at 106.73 Mb on chr3D, so its expression could be influenced by nearby SNPs. On the basis of the physical location of the gene, we found two significant QTNs in our analysis, Q.GL-GW-3D (SNP-AX-95008504) and Q.TGW-3D (SNP-AX-94406908), at 151.4 Mb and 239.3 Mb, respectively. Q.GL-GW-3D, associated with GL, was identified in four environments (E1, E2, E4, and E5) with LOD values from 3.05 to 6.33 and R2 values from 4.09 to 14.97, indicating the significance of this SNP. The second, Q.TGW-3D, was associated with TGW, with LOD (3.16~8.2) and R2 (7.21~11.48), and was identified in the E2, E3, and E4 environments. Both QTNs were annotated as hypothetical proteins and were expressed in our transcriptome data. Q.TGW-3D was highly expressed in large-seeded cultivars, while Q.GL-GW-3D was expressed in both cultivars.

In conclusion, we comprehensively phenotyped wheat mini core germplasm accessions for grain shape traits and identified promising accessions for large grain size and length that can be incorporated into breeding programs. Further, integration of the phenotyping and genotyping data enabled us to identify genomic regions/candidate genes, some of which are novel. The comparative study also showed that many QTNs identified in our study represent novel genomic regions that can be further validated for their role in determining grain size and potentially exploited in breeding programs to develop high-yielding varieties.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, PRJNA906296.

Author contributions

JK planned and handled the field experiments, conducted formal analysis, and assisted in writing the manuscript. DL contributed to the data curation, investigation, formal analysis, and original draft writing. MY and ST planned and executed DNA extraction and SNP data generation. PJ, SS, ST, SM, and HA performed data analysis, writing, reviewing, and editing. NS, KS, and KM recorded phenotyping data and reviewed and edited the manuscript. RS, MY, and GS contributed resources and participated in the writing, reviewing, and editing. AS contributed to the conceptualization, supervision, writing, reviewing, and editing of the manuscript. All authors contributed to the article and approved the submitted version.

Funding

Financial support for this study was received from the Science and Engineering Research Board (project: EMR/2017/005133) and the ICAR-National Innovations in Climate Resilient Agriculture (NICRA) Project (project code 1006607).

Acknowledgments

The authors are thankful to the Director of the Indian Council of Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources (NBPGR) for providing laboratory and field facilities needed to undertake this work.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2023.1148658/full#supplementary-material

References

Allen, A. M., Winfield, M. O., Burridge, A. J., Downie, R. C., Benbow, H. R., Barker, G. L., et al. (2017). Characterization of a wheat breeders' array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum). Plant Biotechnol. J. 15, 390–401. doi: 10.1111/pbi.12635

Avni, R., Oren, L., Shabtay, G., Assili, S., Pozniak, C., Hale, I., et al. (2018). Genome based meta-QTL analysis of grain weight in tetraploid wheat identifies rare alleles of GRF4 associated with larger grains. Genes 9, 636. doi: 10.3390/genes9120636

Bennett, D., Izanloo, A., Reynolds, M., Kuchel, H., Langridge, P., Schnurbusch, T. (2012). Genetic dissection of grain yield and physical grain quality in bread wheat (Triticum aestivum L.) under water-limited environments. Theor. Appl. Genet. 125, 255–271. doi: 10.1007/s00122-012-1831-9

Bernard, S. M., Moller, A. L., Dionisio, G., Kichey, T., Jahn, T. P., Dubois, F., et al. (2008). Gene expression, cellular localisation and function of glutamine synthetase isozymes in wheat (Triticum aestivum L.). Plant Mol. Biol. 67, 89–105. doi: 10.1007/s11103-008-9303-y

Breseghello, F., Sorrells, M. E. (2007). QTL analysis of kernel size and shape in two hexaploid wheat mapping populations. Field Crops Res. 101, 172–179. doi: 10.1016/j.fcr.2006.11.008

Brinton, J., Simmonds, J., Minter, F., Leverington-Waite, M., Snape, J., Uauy, C. (2017). Increased pericarp cell length underlies a major quantitative trait locus for grain weight in hexaploid wheat. New Phytol. 215, 1026–1038. doi: 10.1111/nph.14624

Cabral, A. L., Jordan, M. C., Larson, G., Somers, D. J., Humphreys, D. G., McCartney, C. A. (2018). Relationship between QTL for grain shape, grain weight, test weight, milling yield, and plant height in the spring wheat cross RL4452/'AC domain'. PloS One 13, e0190681. doi: 10.1371/journal.pone.0190681

Cao, P., Liang, X., Zhao, H., Feng, B., Xu, E., Wang, L., et al. (2019). Identification of the quantitative trait loci controlling spike-related traits in hexaploid wheat (Triticum aestivum L.). Planta 250, 1967–1981. doi: 10.1007/s00425-019-03278-0

Chang, C., Lu, J., Zhang, H. P., Ma, C. X., Sun, G. (2015). Copy number variation of cytokinin oxidase gene Tackx4 associated with grain weight and chlorophyll content of flag leaf in common wheat. PloS One 10, e0145970. doi: 10.1371/journal.pone.0145970

Chaurasia, S., Singh, A. K., Kumar, A., Songachan, L. S., Yadav, M. C., Kumar, S., et al. (2021). Genome-wide association mapping reveals key genomic regions for physiological and yield-related traits under salinity stress in wheat (Triticum aestivum L.). Genomics 113, 3198–3215. doi: 10.1016/j.ygeno.2021.07.014

Chaurasia, S., Singh, A. K., Songachan, L. S., Sharma, A. D., Bhardwaj, R., Singh, K. (2020). Multi-locus genome-wide association studies reveal novel genomic regions associated with vegetative stage salt tolerance in bread wheat (Triticum aestivum L.). Genomics 112, 4608–4621. doi: 10.1016/j.ygeno.2020.08.006

Duan, X., Yu, H., Ma, W., Sun, J., Zhao, Y., Yang, R., et al. (2020). A major and stable QTL controlling wheat thousand grain weight: identification, characterization, and CAPS marker development. Mol. Breed. 40, 68. doi: 10.1007/s11032-020-01147-3

Evanno, G., Regnaut, S., Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14 (8), 2611–2620. doi: 10.1111/j.1365-294X.2005.02553.x

Gao, F., Wen, W., Liu, J., Rasheed, A., Yin, G., Xia, X., et al. (2015). Genome-wide linkage mapping of QTL for yield components, plant height and yield-related physiological traits in the Chinese wheat cross zhou 8425B/Chinese spring. Front. Plant Sci. 6. doi: 10.3389/fpls.2015.01099

Goel, S., Singh, K., Singh, B., Grewal, S., Dwivedi, N., Alqarawi, A. A., et al. (2019). Analysis of genetic control and QTL mapping of essential wheat grain quality traits in a recombinant inbred population. PloS One 14, e0200669. doi: 10.1371/journal.pone.0200669

Guo, H., Wang, S., Xu, F., Li, Y., Ren, J., Wang, X., et al. (2013). The role of thioredoxin h in protein metabolism during wheat (Triticum aestivum L.) seed germination. Plant Physiol. Biochem. 67, 137–143. doi: 10.1016/j.plaphy.2013.03.006

Hanif, U., Alipour, H., Gul, A., Jing, L., Darvishzadeh, R., Amir, R., et al. (2021). Characterization of the genetic basis of local adaptation of wheat landraces from Iran and Pakistan using genome-wide association study. Plant Genome 14 (3), e20096. doi: 10.1002/tpg2.20096

IWGSC, Appels, R., Eversole, K., Stein, N., Feuillet, C., Keller, B., et al. (2018). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361 (6403), eaar7191. doi: 10.1126/science.aar7191

Ji, G., Xu, Z., Fan, X., Zhou, Q., Chen, L., Yu, Q., et al. (2022). Identification and validation of major QTL for grain size and weight in bread wheat (Triticum aestivum L.). Crop J. 11, 564–572. doi: 10.1016/j.cj.2022.06.014

Jiang, Y., Jiang, Q., Hao, C., Hou, J., Wang, L., Zhang, H., et al. (2015). A yield-associated gene TaCWI, in wheat: its function, selection and evolution in global breeding revealed by haplotype analysis. Theor. Appl. Genet. 128, 131–143. doi: 10.1007/s00122-014-2417-5

Kato, K., Miura, H., Sawada, S. (2000). Mapping QTLs controlling grain yield and its components on chromosome 5A of wheat. Theor. Appl. Genet. 101, 1114–1121. doi: 10.1007/s001220051587

Khan, S. U., Yangmiao, J., Liu, S., Zhang, K., Khan, M. H. U., Zhai, Y., et al. (2019). Genome-wide association studies in the genetic dissection of ovule number, seed number, and seed weight in Brassica napus L. Ind. Crops Prod. 142, 111877. doi: 10.1016/j.indcrop.2019.111877

Kumari, S., Jaiswal, V., Mishra, V. K., Paliwal, R., Balyan, H. S., Gupta, P. K. (2018). QTL mapping for some grain traits in bread wheat (Triticum aestivum L.). Physiol. Mol. Biol. Plants 24, 909–920. doi: 10.1007/s12298-018-0552-1

Li, T., Deng, G., Su, Y., Yang, Z., Tang, Y., Wang, J., et al. (2022). Genetic dissection of quantitative trait loci for grain size and weight by high-resolution genetic mapping in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 135, 257–271. doi: 10.1007/s00122-021-03964-2

Li, Y., Tang, J., Liu, W., Yan, W., Sun, Y., Che, J., et al. (2021). The genetic architecture of grain yield in spring wheat based on genome-wide association study. Front. Genet. 12. doi: 10.3389/fgene.2021.728472

Li, N., Xu, R., Li, Y. (2019). Molecular networks of seed size control in plants. Annu. Rev. Plant Biol. 70, 435–463. doi: 10.1146/annurev-arplant-050718-095851

Ling, H.-Q., Zhao, S., Liu, D., Wang, J., Sun, H., Zhang, C., et al. (2013). Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90. doi: 10.1038/nature11997

Ma, L., Liu, M., Yan, Y., Qing, C., Zhang, X., Zhang, Y., et al. (2018). Genetic dissection of maize embryonic callus regenerative capacity using multi-locus genome-wide association studies. Front. Plant Sci. 9. doi: 10.3389/fpls.2018.00561

Ma, M., Wang, Q., Li, Z., Cheng, H., Li, Z., Liu, X., et al. (2015). Expression of TaCYP78A3, a gene encoding cytochrome P450 CYP78A3 protein in wheat (Triticum aestivum L.), affects seed size. Plant J. 83, 312–325. doi: 10.1111/tpj.12896

Ma, J., Zhang, H., Li, S., Zou, Y., Li, T., Liu, J., et al. (2019). Identification of quantitative trait loci for kernel traits in a wheat cultivar Chuannong16. BMC Genet. 20, 77. doi: 10.1186/s12863-019-0782-4

Malik, P., Kumar, J., Sharma, S., Sharma, R., Sharma, S. (2021). Multi-locus genome-wide association mapping for spike-related traits in bread wheat (Triticum aestivum L.). BMC Genomics 22, 597. doi: 10.1186/s12864-021-07834-5

McCartney, C. A., Somers, D. J., Humphreys, D. G., Lukow, O., Ames, N., Noll, J., et al. (2005). Mapping quantitative trait loci controlling agronomic traits in the spring wheat cross RL4452x'AC domain'. Genome 48, 870–883. doi: 10.1139/g05-055

Morris, G. P., Ramu, P., Deshpande, S. P., Hash, C. T., Shah, T., Upadhyaya, H. D., et al. (2013). Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl. Acad. Sci. U.S.A. 110 (2), 453–458. doi: 10.1073/pnas.1215985110

Murray, M. G., Thompson, W. F. (1980). Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8 (19), 4321–4325. doi: 10.1093/nar/8.19.4321

Nehe, A., Akin, B., Sanal, T., Evlice, A. K., Unsal, R., Dincer, N., et al. (2019). Genotype x environment interaction and genetic gain for grain yield and grain quality traits in Turkish spring wheat released between 1964 and 2010. PloS One 14, e0219432. doi: 10.1371/journal.pone.0219432

Newell, M. A., Cook, D., Tinker, N. A., Jannink, J. L. (2011). Population structure and linkage disequilibrium in oat (Avena sativa L.): implications for genome-wide association studies. Theor. Appl. Genet. 122, 623–632. doi: 10.1007/s00122-010-1474-7

Pang, Y., Liu, C., Wang, D., St Amand, P., Bernardo, A., Li, W., et al. (2020). High-resolution genome-wide association study identifies genomic regions and candidate genes for important agronomic traits in wheat. Mol. Plant 13 (9), 1311–1327. doi: 10.1016/j.molp.2020.07.008

Peng, Y., Liu, H., Chen, J., Shi, T., Zhang, C., Sun, D., et al. (2018). Genome-wide association studies of free amino acid levels by six multi-locus models in bread wheat. Front. Plant Sci. 9. doi: 10.3389/fpls.2018.01196

Phogat, S., Kumar, S., Kumari, J., Kumar, N., Pandey, A., Singh, T., et al. (2020). Characterization of wheat germplasm conserved in the Indian national genebank and establishment of a composite core collection. Crop Sci. 61, 604–620. doi: 10.1002/csc2.20285

Ramya, P., Chaubal, A., Kulkarni, K., Gupta, L., Kadoo, N., Dhaliwal, H. S., et al. (2010). QTL mapping of 1000-kernel weight, kernel length, and kernel width in bread wheat (Triticum aestivum L.). J. Appl. Genet. 51, 421–429. doi: 10.1007/BF03208872

Rasheed, A., Xia, X., Ogbonnaya, F., Mahmood, T., Zhang, Z., Kazi, A. M., et al. (2018). Genome-wide association for grain morphology in synthetic hexaploid wheats using digital imaging analysis. BMC Plant Biol. 14, 128. doi: 10.1186/1471-2229-14-128

Ren, W.-L., Wen, Y.-J., Dunwell, J. M., Zhang, Y.-M. (2018). pKWmEB: integration of Kruskal–Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity 120, 208–218. doi: 10.1038/s41437-017-0007-4

Rodríguez, F., Alvarado, G., Pacheco, Á., Burgueño, J. (2018). ACBD-R. Augmented Complete Block Design with R for Windows. Version 4.0. CIMMYT Research Data & Software Repository Network; Texcoco de Mora, Mexico.

Rosado, A., Schapire, A. L., Bressan, R. A., Harfouche, A. L., Hasegawa, P. M., Valpuesta, V., et al. (2006). The Arabidopsis tetratricopeptide repeat-containing protein TTL1 is required for osmotic stress responses and abscisic acid sensitivity. Plant Physiol. 142, 1113–1126. doi: 10.1104/pp.106.085191

Simmonds, J., Scott, P., Brinton, J., Mestre, T. C., Bush, M., Del Blanco, A., et al. (2016). A splice acceptor site mutation in TaGW2-A1 increases thousand grain weight in tetraploid and hexaploid wheat through wider and longer grains. Theor. Appl. Genet. 129, 1099–1112. doi: 10.1007/s00122-016-2686-2

Spindel, J. E., Begum, H., Akdemir, D., Collard, B., Redona, E., Jannink, J. L., et al. (2016). Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity 116, 395–408. doi: 10.1038/hdy.2015.113

Su, Z., Hao, C., Wang, L., Dong, Y., Zhang, X. (2011). Identification and development of a functional marker of TaGW2 associated with grain weight in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 122, 211–223. doi: 10.1007/s00122-010-1437-z

Sun, C., Dong, Z., Zhao, L., Ren, Y., Zhang, N., Chen, F. (2020). The wheat 660K SNP array demonstrates great potential for marker-assisted selection in polyploid wheat. Plant Biotechnol. J. 18, 1354–1360. doi: 10.1111/pbi.13361

Sun, X.-Y., Wu, K., Zhao, Y., Kong, F.-M., Han, G.-Z., Jiang, H.-M., et al. (2009). QTL analysis of kernel shape and weight using recombinant inbred lines in wheat. Euphytica 165, 615–624. doi: 10.1007/s10681-008-9794-2

Tamba, C. L., Ni, Y. L., Zhang, Y. M. (2017). Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput. Biol. 13, e1005357. doi: 10.1371/journal.pcbi.1005357

Visioni, A., Tondelli, A., Francia, E., Pswarayi, A., Malosetti, M., Russell, J., et al. (2013). Genome-wide association mapping of frost tolerance in barley (Hordeum vulgare L.). BMC Genomics 14, 424. doi: 10.1186/1471-2164-14-424

Wang, S. B., Feng, J. Y., Ren, W. L., Huang, B., Zhou, L., Wen, Y. J., et al. (2016). Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci. Rep. 6, 19444. doi: 10.1038/srep19444

Wen, Y. J., Zhang, H., Ni, Y. L., Huang, B., Zhang, J., Feng, J. Y., et al. (2017). Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief. Bioinform. 18, 906. doi: 10.1093/bib/bbx028

Xu, Y., Xu, C., Xu, S. (2017). Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity 119, 174–184. doi: 10.1038/hdy.2017.27

Xu, Y., Yang, T., Zhou, Y., Yin, S., Li, P., Liu, J., et al. (2018). Genome-wide association mapping of starch pasting properties in maize using single-locus and multi-locus models. Front. Plant Sci. 9. doi: 10.3389/fpls.2018.01311

Yu, J., Buckler, E. S. (2006). Genetic association mapping and genome organization of maize. Curr. Opin. Biotechnol. 17, 155–160. doi: 10.1016/j.copbio.2006.02.003

Zegeye, H., Rasheed, A., Makdis, F., Badebo, A., Ogbonnaya, F. C. (2014). Genome-wide association mapping for seedling and adult plant resistance to stripe rust in synthetic hexaploid wheat. PLoS One 9, e105593. doi: 10.1371/journal.pone.0105593

Zhang, Y. W., Tamba, C. L., Wen, Y. J., Li, P., Ren, W. L., Ni, Y. L., et al. (2020). mrMLM v4.0.2: an R platform for multi-locus genome-wide association studies. Genomics Proteomics Bioinf. 18 (4), 481–487. doi: 10.1016/j.gpb.2020.06.006

Zhang, J., Feng, J. Y., Ni, Y. L., Wen, Y. J., Niu, Y., Tamba, C. L., et al. (2017). pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity 118, 517–524. doi: 10.1038/hdy.2017.8

Zhang, Y., Liu, J., Xia, X., He, Z. (2014). TaGS-D1, an ortholog of rice OsGS3, is associated with grain weight and grain length in common wheat. Mol. Breed. 34, 1097–1107. doi: 10.1007/s11032-014-0102-7

Zhang, Y., Liu, P., Zhang, X., Zheng, Q., Chen, M., Ge, F., et al. (2018). Multi-locus genome-wide association study reveals the genetic architecture of stalk lodging resistance-related traits in maize. Front. Plant Sci. 9. doi: 10.3389/fpls.2018.00611

Zhang, J., Song, Q., Cregan, P. B., Nelson, R. L., Wang, X., Wu, J., et al. (2015). Genome-wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine max) germplasm. BMC Genomics 16, 217. doi: 10.1186/s12864-015-1441-4

Zhang, L., Zhao, Y.-L., Gao, L.-F., Zhao, G.-Y., Zhou, R.-H., Zhang, B.-S., et al. (2012). TaCKX6-D1, the ortholog of rice OsCKX2, is associated with grain weight in hexaploid wheat. New Phytol. 195, 574–584. doi: 10.1111/j.1469-8137.2012.04194.x

Zhou, Q., Zhou, C., Zheng, W., Mason, A. S., Fan, S., Wu, C., et al. (2017). Genome-wide SNP markers based on SLAF-seq uncover breeding traces in rapeseed (Brassica napus L.). Front. Plant Sci. 8. doi: 10.3389/fpls.2017.00648

Keywords: wheat, QTN, genome wide association studies, thousand grain weight, mrMLM

Citation: Kumari J, Lakhwani D, Jakhar P, Sharma S, Tiwari S, Mittal S, Avashthi H, Shekhawat N, Singh K, Mishra KK, Singh R, Yadav MC, Singh GP and Singh AK (2023) Association mapping reveals novel genes and genomic regions controlling grain size architecture in mini core accessions of Indian National Genebank wheat germplasm collection. Front. Plant Sci. 14:1148658. doi: 10.3389/fpls.2023.1148658

Received: 20 January 2023; Accepted: 11 April 2023;
Published: 28 June 2023.

Edited by:

Shouvik Das, Regional Centre for Biotechnology (RCB), India

Reviewed by:

Revathi Ponnuswamy, Indian Institute of Rice Research (ICAR), India
Kumar Paritosh, University of Delhi, India

Copyright © 2023 Kumari, Lakhwani, Jakhar, Sharma, Tiwari, Mittal, Avashthi, Shekhawat, Singh, Mishra, Singh, Yadav, Singh and Singh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Amit Kumar Singh, amit.singh5@icar.gov.in; Mahesh C. Yadav, mahesh.yadav1@icar.gov.in

These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.