Genome-Wide Association Study Reveals Candidate Genes for Flowering Time Variation in Common Bean (Phaseolus vulgaris L.)

Raggi, Lorenzo; Caproni, Leonardo; Carboni, Andrea; Negri, Valeria

doi:10.3389/fpls.2019.00962

ORIGINAL RESEARCH article

Front. Plant Sci., 24 July 2019

Sec. Plant Breeding

Volume 10 - 2019 | https://doi.org/10.3389/fpls.2019.00962

This article is part of the Research TopicLegumes for Global Food SecurityView all 37 articles

Genome-Wide Association Study Reveals Candidate Genes for Flowering Time Variation in Common Bean (Phaseolus vulgaris L.)

Lorenzo Raggi^1*†

Leonardo Caproni^1†

Andrea Carboni²

Valeria Negri¹

¹Dipartimento di Scienze Agrarie, Alimentari e Ambientali (DSA3), Università degli Studi di Perugia, Perugia, Italy
²CREA Research Centre for Cereal and Industrial Crops, Bologna, Italy

The common bean is one of the most important staples in many areas of the world. Extensive phenotypic and genetic characterization of unexplored bean germplasm are still needed to unlock the breeding potential of this crop. Dissecting genetic control of flowering time is of pivotal importance to foster common bean breeding and to develop new varieties able to adapt to changing climatic conditions. Indeed, flowering time strongly affects yield and plant adaptation ability. The aim of this study was to investigate the genetic control of days to flowering using a whole genome association approach on a panel of 192 highly homozygous common bean genotypes purposely developed from landraces using Single Seed Descent. The phenotypic characterization was carried out at two experimental sites throughout two growing seasons, using a randomized partially replicated experimental design. The same plant material was genotyped using double digest Restriction-site Associated DNA sequencing producing, after a strict quality control, a dataset of about 50 k Single Nucleotide Polymorphisms (SNPs). The Genome-Wide Association Study revealed significant and meaningful associations between days to flowering and several SNP markers; seven genes are proposed as the best candidates to explain the detected associations.

Introduction

Achieving food security is one of the most important challenges to face in the next three decades. FAO’s 2017 prospects’ revision on the world’s population growth reports an expected growth of the population of more than 2 billion people by 2050 (United Nations, 2017). Accordingly, the demand of food will increase, especially in the areas of the world where most of the developing countries are located, mainly in the African continent (Jensen et al., 2012).

In this context, grain legumes are generally regarded as key commodities for improving food security as they are a relatively inexpensive source of amino acids and other important nutrients such as minerals, when compared to livestock and dairy products (Jensen et al., 2012). In addition, due to their ability to fix atmospheric nitrogen, legumes can generally help reducing the use of fertilizers, thus the environmental impact of agriculture (Reay et al., 2012; Andrews and Andrews, 2017). For all these reasons the use of legumes as a key ingredient for a sustainable agricultural production system is at the core of agricultural policy debates in different countries (Zander et al., 2016).

Among grain legumes, common bean (Phaseolus vulgaris L., 2n = 2x = 22) is one of the most important staples in the world, produced over an area of 18 million hectares with a total production of 12 million tons per year (Akibode and Maredia, 2011; Faostat, 2019). Its production mainly occurs in the sub-Saharan Africa and in many Latin American countries (Petry et al., 2015), where it is critical to nutritional security and farmers income generation (Broughton et al., 2003). The cultivated common bean originated in two centers of diversity, giving rise to two genepools: the Mesoamerican, from Central America and the Andean, from the Andes mountains in South America. Many evidences demonstrated that the two genepools are the result of two independent domestication events that led to many morphological and genetic differences (Singh et al., 1991a, b; Kwak and Gepts, 2009).

Upon the introduction of the common bean in Europe from the Americas, hybridization of the two genepools generated further genetic diversity (Gepts et al., 1988; Zeven, 1997; Angioi et al., 2009; Gioia et al., 2013; Maras et al., 2013), for this reason Europe is considered a secondary center of diversification for this species (Angioi et al., 2010). This process led to the constitution of many European common bean landraces that represent a very important resource for plant breeding. In fact, they have been and still are a useful, sometimes unique, source of favorable alleles for abiotic stress, pest and disease resistances (Esquinas-Alcázar, 1993; Angioi et al., 2010).

Landraces are distinct and variable populations that are characterized by useful agronomical traits and adaptation to the specific environments where they were cultivated for a long time. It is important to stress that landraces differ from historical varieties; in fact, they lack “formal” crop improvement and are closely related to knowledge, habits and uses of the people that have been grown them until present times (Raggi et al., 2013). Even if landraces are excellent raw material for breeding new varieties, the within-population genetic diversity of such materials makes their exploitation in plant breeding challenging. This applies to common bean too where intra-landraces genetic diversity can be rather high (Tiranti and Negri, 2007) while intra-individual heterozygosity rather low (Caproni et al., 2018). Indeed, difficulties may arise in the attempt of associating phenotypic traits of interest with the corresponding genetic determinants when using landraces; the identification of such associations is a fundamental prerequisite for allele mining (Visioni et al., 2013). Therefore, the development of a panel of a manageable number of diverse homozygous common bean genotypes is needed to cope with the above-mentioned limitations (Pignone et al., 2015).

The Single Seed Descent (SSD), initially proposed as a modification of the classical bulk breeding scheme to overcome the problem of natural selection (Goulden, 1939), represent a cost-effective approach to achieve that purpose. Given a certain cross, the application of this method to segregating generations allows to maximize the level of retained genetic variation in relation to cost and labor. SSD consists of passing from a generation to the next one by sowing a single seed from each plant (Brim, 1966). In a self-pollinating species like common bean, SSD can be effectively exploited to generate highly homozygous genotypes starting from single individuals of different landraces (Snape and Riggs, 1975).

Although bi-parental mapping has been successful in identifying many significant Quantitative Trait Loci (QTL) mapped to wide intervals in the common bean genome, our knowledge of genes controlling certain traits is still limited (Johnson and Gepts, 2002; Kelly et al., 2003; Blair et al., 2006, 2011; Miklas et al., 2006; Kwak et al., 2008; Perez-Vega et al., 2010). In fact, the resolution of QTL analysis is generally limited by the number of the recombination events; it means that a QTL can span a few centiMorgans (cM), which can indeed be translated into relatively long physical distances, sometimes containing hundreds of candidate genes (Moghaddam et al., 2016). By contrast, Genome Wide Association Mapping (GWAM) considers much more recombination events by using an association panel of individuals, each of those potentially characterized by a unique recombination history (Visscher et al., 2017). In addition, Genome Wide Association Studies (GWAS), based on very high number of markers, allow to test association of the trait of interest with a large part of the genome of the target species. Due to low cost by data point, high robustness, reproducibility and number in the genome, molecular markers based on Single Nucleotide Polymorphism (SNP) detection are those of election for conducting GWAS.

Currently, different approaches can be used to generate large SNP datasets. For example, high-density SNP arrays are already available for several crops (Hao et al., 2017) including common bean. However, such arrays are often designed starting from a limited number of elite genotypes and can produce biased data when used for characterization of non-elite materials. Next Generation Sequencing (NGS) techniques, that equally produce high number of datapoints, are an interesting alternative as they allow cheap not-biased SNP discovery and genotyping. This approaches have already been used and proven efficient in several crops including wheat, barley and pea (Poland et al., 2012; Liu et al., 2014; Annicchiarico et al., 2017). Moreover, the current availability of reference genomes of several crops (that allows to perform in silico simulations to optimize the technique and to map the markers) and of collections of genetically diverse pure lines (that allow to reduce sequence coverage due to the absence of heterozygous loci) makes Next Generation Genotyping (NGG) extremely attractive. Among the possible different NGG strategies (Davey et al., 2011; Barilli et al., 2018) double digest Restriction-site Associated DNA sequencing (ddRAD-seq) was the one of choice for this work. ddRAD-seq is a technique based on a digestion of genomic DNA carried out using two restriction enzymes (instead of a single restriction enzyme as in RAD-seq); the resulting DNA fragments are then ligated to sample-specific barcode adapters for subsequent bulk genotyping on an Illumina platform (Peterson et al., 2012).

Schmutz et al. (2014) published the first reference genome for P. vulgaris. This achievement opened novel possibilities for common bean NGG making the use of techniques, such as ddRAD-seq, potentially very effective.

Recently, association studies were carried out on the common bean using different plant materials and genotyping approaches. These studies focused on the search of meaningful association of agronomic traits (Moghaddam et al., 2016), nitrogen fixation (Kamfwa et al., 2015a), resistance to diseases (Perseguini et al., 2016), seed weight (Yan et al., 2017), and some technological traits as cooking time in dry beans (Cichy et al., 2015) with possible genetic determinants involved in their control. In some cases, these studies allowed the identification of candidate genes that can be used to develop new genetic stocks for bean breeding programs.

As flowering time is a key trait determining the production of dry matter and seed yield in many crops such as common bean, its manipulation is a relevant plant breeding target to produce novel varieties that are better adapted to changing climatic conditions (Jung and Müller, 2009). For example, early flowering can be exploited to avoid harsh environmental conditions (e.g., drought and heat) and/or escape pathogen attacks that can both negatively affect seed production (as they occur during/after the seed set stage). On the other hand, late flowering can increase seed yield by extending the vegetative phase and increasing the photosynthate accumulation. A flowering time well-synchronized with target environmental conditions would contribute to the achievement of optimal crop performances.

Extensive studies on floral transition revealed a network of regulatory interactions among genes able to promote or inhibit the phenological transition to the reproductive phase (i.e., flowering). In Arabidopsis many of the regulatory genes have been identified and functionally characterized (Putterill et al., 2004; Bäurle and Dean, 2006). Moreover, different species, such as medicago (Medicago truncatula) (Pierre et al., 2008, 2011; Laurie et al., 2011), pea (Pisum sativum) (Lejeune-Hénaut et al., 2008) and narrow-leafed lupin (Lupinus angustifolius) (Ksiazkiewicz et al., 2016; Nelson et al., 2017) have been used to investigate the genetic control of flowering in legumes.

In P. vulgaris few studies on flowering time variation and control have been carried out to date. QTL mapping studies detected some genomic regions associated with the trait (Koinange et al., 1996; Blair et al., 2006; Perez-Vega et al., 2010). Raggi et al. (2014) found significant associations between some candidate genes and flowering time variation in a common bean collection. Recently, Kamfwa et al. (2015b) and Moghaddam et al. (2016) identified SNPs significantly associated with days to flowering.

In our study GWAS was used to detect key genomic regions involved in flowering time control. To the purpose, a panel of highly homozygous and diverse common bean genotypes was developed using SSD. Genotypes within the panel were subjected to an extensive genotyping, using ddRAD-seq, and phenotypic characterization carried out in different years and locations.

Materials and Methods

Plant Material

The plant material of this work was initially selected with the idea of creating a balanced collection of Andean and Mesoamerican landraces potentially representing an important portion of the European diversity of this species. A similar number of accessions from the two common bean genepools was initially considered; according to the available data of the phaseolin alleles, 97 Andean (57 T + 40 C type) and 84 Mesoamerican (all S type) accessions were included. Europe is the most represented geographical area in the panel (153 accessions) followed by South and Central America (22 and 17, respectively). Italy accounts the highest number of accessions followed by Turkey, Spain, Netherlands, and Portugal. A heatmap representing the origin of the materials is reported in Figure 1.

FIGURE 1

Figure 1. Heatmap of number of accessions per country included in the common bean panel.

Starting from the above described collection, 181 common bean highly homozygous genotypes (i.e., pure lines) were obtained applying SSD for at least 5 consecutive generations under isolated conditions. The 181 lines together with 11 cultivars, included as controls, constitute the diversity panel used in this study accounting a total of 192 lines (NCBI BioSample accessions from SAMN12035168 to SAMN12035359). Further details about lines within the panel, including the genebank from which each accession has been originally obtained, are reported in Supplementary Table 1.

Phenotyping

The phenological characterization of the 192 genotypes was carried out for two consecutive seasons (2016 and 2017) at: (i) DSA3-UNIPG experimental field located in Sant’Andrea d’Agliano, Perugia, Italy (43°3′15.12″N; 12°23′41.64″E, 175 m a.s.l.) (hereafter PG) and (ii) CREA-CI experimental field located in Anzola dell’Emilia, Bologna, Italy (44°34′30.51″N, 11°9′55.64″E, 38 m a.s.l.) (hereafter BO). In 2016 plant material was only evaluated in PG while in 2017 in both PG and BO. In both years sowing has been carried out in May: the 4th (PG_2016), the 11th (PG_2017), and the 12th (BO_2017).

The three experiments were all arranged using partially replicated randomized designs in which five entries were replicated five times and two were replicated six times, producing a total of 222 single plant samples out of 192 entries [total samples = 192 – 7 + (5 × 5) + (2 × 6)]. In PG, the 222 common bean samples were grown in 6 adjacent blocks (fixed size of 1 column × 37 rows) covered by anti-insect net; in BO the same samples were arranged in 3 adjacent blocks (fixed size of 1 column × 74 rows). Plants were grown in a net covered nursery supplied with an automatic drip-irrigation system throughout the entire duration of the trials (May to mid-October). For each sample days to flowering (dtf) was recorded as days between sowing and the opening of the first flower (Raggi et al., 2014); a value of 162 days was assigned to the genotypes that did not flower by the end of the experiments (Zhao et al., 2007).

Phenotypic Data Analyses

The row-column layout of the grown plants and their partial replication allowed for a bi-dimensional spatial analysis of dtf (Singh et al., 1997, 2003; Rollins et al., 2013; Raggi et al., 2017). To the purpose, “plot,” “row,” and “column” number was assigned to each sample according to its position. For each entry dtf Best Linear Unbiased Predictors (BLUPs) of the genotype effect were calculated in GenStat^® (Payne et al., 2011) using the most suitable spatial model determined for the row and column field layout as described by Singh et al. (2003). The procedure consists in gauging the spatial variability by nine applicable models accounting for the existence of different trends, fitting each model (according to the sample position, using the Restricted Maximum Likelihood method, REML), and choosing the best possible one using the Akaike Information Criterion (AIC) (Akaike, 1974). The variance components were used to estimate dtf broad-sense heritability $(H e_{B}^{2})$ , along with its standard error, on a plot basis as:

H e_{B}^{2} = \frac{σ_{g}^{2}}{σ_{p}^{2}} \times 100

where $σ_{p}^{2} = σ_{e}^{2} + σ_{g}^{2}$ (phenotypic variance), $σ_{g}^{2}$ = genotypic variance, and $σ_{e}^{2}$ = error variance.

Descriptive statistics and Pearson’s correlation coefficients among dtf BLUPs of the three trials were calculated using the R package “agricolae” (de Mendiburu, 2017); results were then visualized using “ggplot2” package (Wickham, 2009). BLUPs datasets were then used to perform GWAS.

DNA Extraction

Genomic DNA was isolated from young leaf tissues, collected from 15 days-old single seedlings, using the TissueLyser II (Qiagen) and the DNeasy 96 plant kit (Qiagen) according to the procedure provided by the manufacturer. DNA concentration and quality were estimated using UV-Vis spectrophotometry (NanoDrop 2000^TM, Thermo Fisher Scientific). DNA integrity was evaluated after 1% agarose gels (Euro Clone) stained with ethidium bromide electrophoresis. DNA samples were then diluted to 30 ng/μl for following genotyping.

Genotyping

A double digest Restriction-site Associated DNA sequencing (ddRAD-seq) approach was used for genotyping. The library preparation and the sequencing were carried out by IGAtech (Udine, Italy). Before starting the procedure, a further check of the DNA concertation was produced using a fluorimetric assay to further normalize and uniform the samples. The libraries were produced using a custom protocol (IGAtech), with minor modifications in respect to the one implemented by Peterson and colleagues (Peterson et al., 2012). In silico analysis was initially performed to select the best combination of restriction enzymes using the common bean reference genome v1.0 (Schmutz et al., 2014). Since the analysis indicated SphI and MboI as the best restriction enzymes combination to maximize the number of sequenced loci, they were used for DNA digestion. Digested DNA was purified with AMPureXP beads (Agencourt) and ligated to barcode adapters. Samples were than pooled on multiplexing batches and bead purified. For each pool, target fragment distribution was collected on BluePippin instrument (Sage Instruments Inc., Freedom, CA, United States). Gel eluted fraction was amplified with oligo primers that introduce TruSeq indexes and subsequently bead purified. The resulting libraries were than checked with both Qubit 2.0 Fluorimeter (Invitrogen, Carlsbad, CA, United States) and Bioanalyzer DNA assay (Agilent Technologies, Santa Clara, CA, United States). Libraries were processed with Illumina cBot for cluster generation on the flow cell, following the manufacturer’s instruction and sequenced with V4 chemistry pair end 125 bp mode on HiSeq2500 instrument (Illumina, San Diego, CA, United States).

Demultiplexing of raw Illumina sequences was performed using Stacks v 2.0 (Catchen et al., 2013) and subsequent alignment to the common bean reference genome v 1.0 (Schmutz et al., 2014) using BWA-MEM (Li and Durbin, 2009) with default parameters. Stacks v2.0 was also used to detect all the covered SNP loci from the aligned reads and to filter the detected loci using the population program (included in Stacks v2.0). In this last step, only loci that are represented in at least 75% of the population were retained.

SNP Quality Control

Several quality control steps were performed on the SNP dataset using PLINK v1.09 (Purcell et al., 2007) and TASSEL v 5.2 software (Bradbury et al., 2007). In particular: (i) SNP loci characterized by values of missingness higher than 10%, (ii) individuals with more than 10% missing loci, and (iii) markers with a Minor Allele Frequency (MAF) lower than 5% were filtered. Loci characterized by heterozygosity ≥2% were also discarded.

Detection of Population Structure and Cryptic Relatedness

The analyses of structure and cryptic relatedness of genotypes in the panel were carried using a reduced dataset where loci in strong Linkage Disequilibrium (LD) (r² ≥ 0.3) were removed. In order to detect the population stratification of the developed panel, a Bayesian clustering approach was used. The number of clusters was initially tested in STRUCTURE v.2.3.4 (Pritchard et al., 2000) assuming an admixture model for different number of clusters (K), ranging from 1 to 11. For each tested cluster 10 iterations were carried out resulting from a 30,000 burn-in period and a Markov Chain Monte Carlo (MCMC) of 30,000 iterations after burn-in. The effective number of clusters was than inferred using the Evanno test (Evanno et al., 2005) implemented in the on-line tool STRUCTURE HARVESTER (Earl and vonHoldt, 2012). According to the result, a new single run was performed at the designed K using 100,000 burn-in period and 200,000 MCMC. The resulting population Q-matrix was used to (i) generate the corresponding Q-plot using the software DiStruct (Rosenberg, 2003) and (ii) to correct the association analyses for the putative population structure. Moreover, a kinship matrix was generated using PLINK v. 1.19 and visualized as heatmap and dendrogram using the R package “ggplot2” (Wickham, 2009).

Genome-Wide Association Analysis

Marker-trait association analyses were performed using a Mixed Linear Model (MLM) implemented in TASSEL v 5.2 that includes corrections for both population structure (Q) and kinship (K). In fact, the use of such model was necessary as P. vulgaris is characterized by a strong genetic structure (Kwak and Gepts, 2009; Raggi et al., 2013). The three BLUP datasets were used as phenotype input matrix in a single association analysis.

The resulting p-values were then plotted, as –log₁₀(p) to produce a Manhattan plot using the R package “CMplots” (Yin, 2016). The correction for multiple-testing was carried out using the Bonferroni adjustment based on the estimated number of independent recombination blocks calculated using PLINK according to Gabriel et al. (2002). For the SNP markers that remained significant after the application of Bonferroni correction, possible candidate genes were identified based on proximity (maximum ± 100 kb) (Patishtan et al., 2018) and by browsing the P. vulgaris genome using the online tool Jbrowse on Phytozome v. 12.1 (Goodstein et al., 2012). In order to take advantage of the latest version of the common bean reference genome, sequences containing the significant SNP were positioned against the P. vulgaris reference version 2.1. Nucleotide sequences of putative candidate genes were translated into the corresponding proteins and used as queries against the Arabidopsis thaliana protein database (Araport11 protein sequences) using the online tool BLASTP (AA query, AA db) available at: https://www.arabidopsis.org/Blast/.

Linkage Disequilibrium

A raw estimation of LD decay was obtained dividing the size of common bean genome (bp) by the number of independent recombination blocks within the panel, calculated according to Gabriel et al. (2002). In order to ascertain whether significant SNPs, and their relative candidate genes, were located on the same recombination blocks, further LD analyses were carried out. In particular, LD patterns were studied within a window of ±1.5 Mb (centered on the significant marker). Such analysis was only performed for those SNPs located outside the identified candidate genes. Markers within the windows surrounding the associated SNPs were generated using PLINK v. 1.09 by sub-setting the whole SNP dataset obtained after QC and then paired with their corresponding p-values. Pairwise LD between markers within the windows (r²) were calculated using HaploView 4.2 (Barrett et al., 2005); the same software was also used to produce graphical representations of the results.

Results

Phenotyping

A total of 648 (97.3%) common bean samples were successfully characterized for dtf during the three experiments. In BO-2017, the spatial analysis was more efficient than the completely randomized design with a superior efficiency of the spatial model CrdL (Completely randomized design with linear trends along rows) of 22.4% over the Completely randomized design (Crd); the Crd was the best model for BLUPs calculation from data collected in PG-2016 and 2017. Summary statistics of BLUPs are reported in Table 1. As expected, dtf showed high broad sense heritability (He²_B) in all trials (Table 1). Distribution analysis showed consistent data dispersion and the existence of a number of late flowering genotypes (Figure 2A); on the other hand, no differences were observed when data were analyzed separately according to the genepool (Supplementary Figure 1). Simple linear regressions of dtf in pairwise comparisons between years (Figure 2B) and experimental sites (Figure 2C) revealed significant and high correlation in both cases with an R² values equal to 0.90 (P < 0.001) and 0.93 (P < 0.001), respectively. The full BLUPs dataset is available in Supplementary Table 2.

TABLE 1

Table 1. Summary statistics, broad sense heritability, and spatial models used for the estimation of days to flowering BLUPs of 192 common bean genotypes.

FIGURE 2

Figure 2. Box plot representation of days to flowering (dtf) BLUPs of 192 common bean genotypes (A). Correlation between dtf recorded in the same location and different years (i.e., PG-2016 vs. PG-2017) (B) and in the same year and different location (i.e., PG-2017 vs. BO-2017) (C).

Genotyping

The ddRAD-seq genotyping generated a dataset of 106,072 polymorphic loci, of those 99.3% (105,319) were mapped on the reference genome v.1.0 (Pvulgaris_218_v1.0.fasta) (Schmutz et al., 2014). The full genotyping dataset is available at: https://www.ebi.ac.uk/ena/data/view/PRJEB33063.

After quality control, no genotype was excluded and a dataset of 49,518 SNPs markers evenly distributed over the 11 common bean chromosomes was retained for association analyses. A graphical representation of SNPs’ distribution over the eleven bean chromosomes is reported in Figure 3.

FIGURE 3

Figure 3. SNP density within 1 Mb window size, different colors represent different density levels. In the Figure “Chr” refers to common bean chromosomes, where centromeric (black line) and pericentromeric regions (thinner bar sections) are reported according to Schmutz et al. (2014).

Genetic Structure and Cryptic Relatedness

After removing SNP markers in strong LD (r² ≥ 0.3) a dataset of 2,518 SNP was generated and used to perform STRUCTURE and cryptic relatedness analyses (Supplementary Figure 2). Results of the Evanno test clearly indicated K = 2 as the most suitable level of population subdivision to explain the genetic structure of the studied panel. STRUCTURE group attributions were strongly consistent with the two common bean genepools (i.e., Mesoamerican and Andean). A graphic representation of the genetic structure of the panel is reported in Figure 4. Considering a threshold of q ≥ 0.8 (Bitocchi et al., 2012; Klaedtke et al., 2017), 11 out of 192 genotypes resulted product of admixture between the two genetic groups. All the eleven admixed genotypes derived from European accessions (153) indicating a level of hybridization, between Andean and Mesoamerican genepools, equal to 7.2% (11 out of 153). The admixed entries derived from 9 landrace accessions (Pv_072, Pv_073, Pv_077, Pv_086, Pv_092, Pv_128, Pv_131, Pv_134, and Pv_190) and 2 cultivars (Pv_059 and Pv_064).

FIGURE 4

Figure 4. Genetic structure and relatedness among 192 genotypes of common bean. Genetic structure (right and bottom side). Each genotype is represented by a column divided into two colored segments (yellow and magenta) whose length indicates the proportions of the genome attributed to each of the two main clusters. Cryptic relatedness (center). Heatmap of pairwise similarities between all the genotypes: red, white, and blue for low, medium, and high similarities, respectively. Hierarchical clustering of the panel. The two main groups are indicated with “A” and “M” letters standing for Andean and Mesoamerican, respectively (top and left side).

Results of cryptic relatedness are also graphically presented in Figure 4. According to genotype origins, inferred using the available information about phaseolins, the blueish square, bottom-left part of the heatmap, includes most of the genotypes of Mesoamerican origin (72). The plot also indicates a further possible sub-structure of the Mesoamerican genotypes into 3 subgroups, the largest of which includes about 50% of all the Mesoamerican samples (Figure 4). On the other hand, the bluish square, top-right part of the heatmap, groups the Andean genotypes (94) with very few exceptions. In this case too, a further subdivision of the group is evident but only one sub-group is clearly distinct (Figure 4). The 11 admixed genotypes are grouped right in the middle of the heatmap; they are characterized by average relatedness values in regard to all other genotypes (light blue, light red, or white color).

Genome-Wide Association Analysis

Across all the trials, high and consistent He $_{B}^{2}$ values were observed for dtf confirming the suitability of the trait to perform GWAS. The Bonferroni correction for multiple testing, calculated considering the number of independent recombination blocks (2,443), resulted in a threshold equal to 5.4 (–log₁₀(p)). GWAS results showed that multiple regions are associated with dtf in the common bean genome (Table 2). The lowest p-value (i.e., the strongest association) was obtained for SNP 123164_60 on chromosome Pv08. Significant associations were also found for SNPs 66929_307, 17455_7, 95297_22, 59746_63, 59746_36, 116028_71, and 17777_7. In total, 8 significant SNPs for dtf were identified in 4 different common bean chromosomes: Pv01, Pv04, Pv06, and Pv08 (Table 2). The Manhattan plot of GWAS results, based on the 49,518 SNP markers, is reported in Figure 5.

TABLE 2

Table 2. List of the significant SNPs identified in the study including physical position, association level, phenotypic variation explained by the SNP and MAF.

FIGURE 5

Figure 5. Manhattan and quantile–quantile (Q–Q) plots of dtf; In the Manhattan plot, SNPs are ordered by physical position and grouped by chromosome; unmapped SNPs are grouped in “u.” Under each chromosome information on SNP density within 1 Mb window size is also given. The blue dashed line indicates the genome-wide significance threshold. SNPs associated after Bonferroni correction are highlighted in red.

Candidate Gene Identification

The search of possible candidate genes for the most meaningful identified SNP, carried out using Phytozome (v. 12.2) and TAIR, resulted on the identification of 7 possible candidates.

When aligned to the P. vulgaris reference genome, the sequenced fragment containing SNP 123164_60 produced multiple hits making the discovery of an associated candidate gene rather complex. However, our analysis detected a relevant gene, Phvul.008G149900, located 100 kb upstream of a highly significant hit. Even if no functional annotation was found on the common bean reference genome for this gene, it is noteworthy that its encoded protein is highly similar to Arabidopsis At3G12810. This protein, similar to ATP dependent chromatin-remodeling proteins of the ISWI family, is encoded by Photoperiod-Independent Early flowering 1 (PIE1) that is involved in multiple flowering pathways.

Located only 50 kb downstream of the SNP 66929_307, on Pv04, Phvul.004G112100 is our best candidate to explain the phenotypic variation associated with this marker. The gene encodes for a NAD(P)-binding Rossmann-fold superfamily protein, carrying out oxidoreductase activity in the chloroplast.

The search of the best candidate gene for marker SNP 17455_7 resulted in the identification of Phvul.001G227200. In this case, the SNP is located in the first intron of the gene. Phvul.001G227200 is homologous of the A. thaliana At1G56260, also known as Meristem Disorganization 1 (MD1) that is required for the maintenance of stem cells through a reduction in DNA damage (TAIR, 2019b). Phvul.001G236000 is the best candidate to explain the phenotypic variation associated with the second peak observed in the same region on Pv01 (i.e., SNP 17777_7) as displayed in Figure 6A. The gene, located only 10 kb upstream of the SNP, encodes for a protein phosphatase 2C 3-Receptor, involved in abscisic acid signal transduction.

FIGURE 6

Figure 6. Manhattan plot of dtf and LD heatmap over a chromosome region of ±1.5 Mb surrounding SNPs: (A) 17777_7; (B) 59746_36, 59746_63; (C) 666929_307; (D) 116028_71. SNPs associated after Bonferroni correction are highlighted in red; candidate genes are placed according to their physical position. In the LD heatmap colors are coded according to the r² color key.

Phvul.006G215800 resulted as the best candidate to explain the effect of the SNP 95297_22 detected on Pv06. The gene encodes for Potassium Channel AKT2/3.

The two significant SNPs detected on Pv04 (SNP 59746_36 and 59746_63) co-localize on the same chromosome region being separated by 27 bp only. Located 40 kb upstream of the signal, Phvul.004G085100 is the best candidate gene explaining the effect of the markers. The homolog gene in Arabidopsis encodes for a sucrose transporter protein: AtSUC2.

Finally, we identified Phvul.008G055400 as the most meaningful candidate gene associated to the SNP 116028_71. The gene encodes for a Leucine-Rich Repeat Receptor-Like Protein, also known as Clavata2 (CLV2).

Linkage Disequilibrium

In the studied panel, LD decays in an average distance of circa 240 kb. According to the results of LD analysis, SNP 17777_7 and the candidate Phvul.001G236000 are in the same recombination block showing the association between the marker and the identified gene (Figure 6A). In the same figure section SNP 17455_7 is also displayed due to its position near to SNP 17777_7. In this case LD analysis was not necessary as the marker is physically located in the first intron of the corresponding candidate (Phvul.001G227200); it is noteworthy that several recombination events occurred between the two markers. A clear association was also observed for SNPs 59746_36 and 59746_63 with Phvul.004G085100 (Figure 6B) and for SNP 66929_307 with Phvul.004G112100 (Figure 6C). A fairly high average r² value was observed between Phvul.008G055400 recombination block and the SNP 116028_71 (Figure 6D).

Discussion

The SSD strategy used in this study allowed to produce a panel of highly homozygous common bean genotypes starting from 179 different landraces each of which putatively characterized by relatively high levels of diversity (Tiranti and Negri, 2007; Negri and Tiranti, 2010). Indeed, molecular data demonstrated that the genotypes in our panel are genetically uniform with a very low level of heterozygosity. At the same time, the panel retained a high level of among-genotypes diversity due to the different origin of the initially selected landraces (Figure 4). This approach allowed to build a panel of common bean pure lines that can be indefinitely used for association analyses on a plethora of traits of interest for both basic biology studies as well as for plant breeding. Sample seeds of each developed pure lines are currently conserved, using long-term storage conditions, in the genebank held by DSA3 (FAO code: ITA-363).

Results of the phenotypic characterization for dtf showed a rather high level of diversity within the panel (Rodiño et al., 2003; Raggi et al., 2014; Rana et al., 2015). Results of the partially replicated experimental design indicated high levels of dtf He $_{B}^{2}$ that is a crucial parameter to find meaningful and promising associations. Indeed, such design has been already used on barley for association analysis on yield performance (Al-Abdallat et al., 2017). In our study, the use of such experimental design also allowed to test, and to possibly correct, the existence of any bias related to the sample position within the experimental plots such as soil fertility and light exposure. As expected, Crd was the best model for BLUP calculation in two out of three cases since biases were not detected. It is also noteworthy that this particular experimental design allowed to maximize the number of phenotypic datapoints and, at the same time, reducing costs and space needed to characterize such a collection of germplasm.

Among different methods that can be used to generate SNP datasets, the selection of ddRAD-seq approach resulted in a very high number of SNPs evenly distributed over the eleven common bean chromosomes (Figure 3). Regarding the ddRAD-seq used protocol, the in silico digestion of the common bean reference genome allowed to select the best enzyme combination maximizing the number of sequenced loci. It is noteworthy that in our study, ddRAD-seq overcame available common bean SNP chips in terms of number of markers successfully genotyped (Cichy et al., 2015; Kamfwa et al., 2015a, b; Moghaddam et al., 2016). In addition, we believe that the technique used for genotyping can help in reducing the ascertainment bias deriving from the use of chip arrays for genotyping of non-elite material.

GWAS is a powerful tool to dissect the genetic control of quantitative traits, potentially providing a higher resolution than QTL mapping. Therefore, in recent years, the interest on this approach arose in both academic and commercial sectors (Davey et al., 2011). In our study, association analysis allowed to detect eight significant SNPs associated with dtf on four common bean chromosomes: Pv01, Pv04, Pv06, and Pv08. The analysis of the genomic regions surrounding the detected SNPs allowed the identification of seven meaningful candidate genes that could have an important role in controlling the studied trait. In previous studies, QTL for dtf on common bean chromosome Pv01 has been widely reported (Blair et al., 2006; Perez-Vega et al., 2010; Mukeshimana et al., 2014). Moreover, recent research based on GWAS further confirmed the presence of genomic regions involved in the control of this trait on the same chromosome (Kamfwa et al., 2015b; Moghaddam et al., 2016). Similarly, QTL on Pv08 was already reported (Koinange et al., 1996; Perez-Vega et al., 2010) and also confirmed in the above-mentioned studies based on GWAS. According to the mentioned bibliographic records and data produced in our study, the associations on Pv01 and Pv08 are likely to be stable across different environments and genetic background (Kamfwa et al., 2015b). In addition, we observed significant associations on chromosomes Pv04 that were reported by a QTL mapping (Mukeshimana et al., 2014) and GWAS (Moghaddam et al., 2016) study. Finally, Blair et al. (2006) indicated the presence of a QTL for dtf on common bean chromosome Pv06 that was also detected in our study.

The first candidate gene identified in this study, Phvul.008G149900, encodes for a protein that is highly similar to PIE1. It is noteworthy that mutations of PIE1 in A. thaliana resulted in the suppression of Flowering Locus C-mediated delay of flowering and causes early flowering even during non-inductive photoperiods (Noh and Amasino, 2003).

Phvul.004G112100 resulted the best candidate to explain the phenotypic variation associated with SNP 66929_307. In A. thaliana mutants for the homologous gene (At4G23430) showed an early-flowering phenotype (TAIR, 2019d) suggesting a role for Phvul.004G112100 in common bean flowering time control. Mapping homologous Arabidopsis sequences for photoperiod sensitivity in common bean, Kwak et al. (2008) located one of the homolog of Terminal Flower1 (PvTFLx) on chromosome Pv04. In particular, PvTFLx is located 2 Mb downstream our best candidate suggesting that this region harbors different genes involved in flowering time control.

Phvul.001G227200, homolog of MD1 in Arabidopsis, was the resulting candidate to explain the significance of the SNP 17455_7. Interestingly, in A. thaliana mutants of this gene showed several development defects such as abnormal phyllotaxy and plastochron, stem fasciation and reduced root growth (Hashimura and Ueguchi, 2011). In the same study the authors reported that in mutants “leaves and floral buds did not develop in a spatially and temporally regulated manner” opening for a possible role of the gene in flowering control. It is also noteworthy that, in maize, shoot apical meristem development has been associated with flowering time (Leiboff et al., 2015). Further analyses, within the same chromosomic region of the previous candidate, revealed that Phvul.001G236000 is the best candidate to explain the phenotypic variation of the marker 17777_7. In A. thaliana mutants of the homologous At4G26080 showed a late flowering phenotype. It is noteworthy that this narrow chromosomic region (circa 1.3 Mb) contains the two candidate genes detected in this study on Pv01 together with Phvul.001G221100 that encodes for Phytocrome A (Kamfwa et al., 2015b). Finally, this chromosomic region overlaps with a QTL for days to flowering identified by Blair et al. (2006) using a by-parental mapping approach: one of the two flanking markers of the QTL falls in the same above-mentioned chromosomic region. All these experimental evidences suggest the presence of a gene cluster involved in flowering time control in chromosome Pv01.

Phvul.006G215800, the best proposed candidate for the marker 9529_7 on chromosome Pv06, encodes for Potassium Channel AKT2/3, a photosynthate and light-dependent inward rectifying potassium channel with unique gating proprieties that are regulated by phosphorylation (TAIR, 2019e). It has been demonstrated that loss of function of AKT2/3 affects sugar loading into the phloem of A. thaliana and mutants show delayed flower induction and rosette development (Deeken et al., 2002). Such phenotype strongly corroborates our hypothesis of the involvement of Phvul.006G215800 in flowering.

Phvul.004G085100 on the chromosome Pv04 encodes for a sucrose transport protein. Interestingly, in A. thaliana, a mutation of the homolog (At1G22710) causes dwarfism, delayed development and it has been reported that such plants can occasionally flower, but never produce viable seeds (TAIR, 2019a). At1G22710, also known as AtSUC2, was one of the first genes associated with sucrose transporters (Sauer and Stolz, 1994); this gene is required for phloem loading of sucrose and its activity has been described in detail by Chandran et al. (2003). It is well known that sucrose is a relevant element within the flowering induction process (Corbesier et al., 1998); in plants, sucrose is the main form of fixed carbon that is transported in phloem and also serves as specific signaling molecule (Teng et al., 2005; Solfanelli et al., 2006). An increase in carbohydrate export from leaves has been generally associated with floral induction in Arabidopsis (Corbesier et al., 1998). Consistently, in Nicotiana tabacum L., a decreased phloem loading of sucrose, induced by antisense repression of the NtSUT1 causes delayed flowering (Burkle et al., 1998). Moreover, in A. thaliana sucrose availability on the aerial part of the plant promotes flowering even in dark conditions (Roldán et al., 1999). All these evidences strongly suggest a role of Phvul.004G085100 in controlling flowering time in P. vulgaris. In addition, it is also relevant to mention that Mukeshimana et al. (2014) found two QTLs for days to flowering on the mid-terminal part of Pv04 that might roughly correspond to the chromosome region in which Phvul.004G085100 is located. However, since in above-mentioned study SNP marker positions are expressed as cM, it is difficult to ascertain whether our candidate falls within these regions.

In conclusion, Phvul.008G055400, the candidate gene identified in relation the marker 116028_71 on the chromosome Pv08, encodes for CLV2. In A. thaliana a mutation of the homolog of this gene (At1G65380) causes altered flower development, late flowering or interrupted flowering caused by a temporary termination of the main inflorescence flower meristem (TAIR, 2019c). In a recent paper, Basu et al. (2019) identified four Clavata genes, including Clavata2, that are highly associated with days to flowering in Cicer arietinum L.

As from the discussed evidences and related bibliographic records, homolog of most of the genes that we propose as candidates for explaining observed flowering time variation in the studied common bean panel are involved in different pathways regulating flowering in Arabidopsis, tobacco, maize, and chickpea. Indeed, results of this research are an important step forward in understanding flowering time control in one of the most important pulses world-wide. Although this diversity panel is representative of a large portion of the European common bean diversity, performing similar analyses on a wider and/or more diverse panel would help in confirming the detected associations. In addition, the application of gene knockout to the proposed candidates would further confirm their involvement in the genetic control of flowering time and allow to measure their contribution to its expression under different experimental conditions (e.g., short vs. long day treatments). The exploitation of the genes identified in this research will hopefully allow the development of new common bean varieties able to better adapt to changing climatic conditions.

Data Availability

NCBI BioSample accessions of the 192 common bean lines from: SAMN12035168 to SAMN12035359. The full genotyping dataset of the same samples is available at: https://www.ebi.ac.uk/ena/data/view/PRJEB33063.

Author Contributions

VN and LR conceived and designed the experiments. LC and AC performed plant phenotyping. LR, LC, and AC performed phenotypic data analyses. LR and LC performed GWAS and candidate gene data analyses. AC and VN contributed to phenotyping costs. VN funded reagents, materials, and analysis tools. All authors wrote the manuscript.

Funding

This research was partially funded by “Novel characterization of crop wild relative and landrace resources as a basis for improved crop breeding” (PGR Secure) project, European Community’s Seventh Framework Program [GA n. 266394], “Strategies for Organic and Low input Integrated Breeding and Management” (SOLIBAM) project, European Community’s Seventh Framework Program [GA n. 245058], “Networking, partnerships and tools to enhance in situ conservation of European plant genetic resources” (Farmer’s Pride) project, European Community H2020 Framework Program [GA n.774271], and “Progetto per l’attuazione delle attività contenute nel programma triennale 2017–2019 per la conservazione, caratterizzazione, uso e valorizzazione delle risorse genetiche vegetali per l’alimentazione e l’agricoltura” (RGV-FAO project) [DM N.0011746].

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This work is an integral part of LC Ph.D. thesis, carried out under the supervision of VN and LR. The authors wish to acknowledge Professor S. Ceccarelli for his contribution in carrying out spatial analyses, Dr. Thâmara Figueiredo Menezes Cavalcanti for her contribution to the phenotyping activities and all the field technicians of the DSA3-UNIPG (Sant’Andrea d’Agliano) and CREA-CI (Anzola) for managing the trials.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2019.00962/full#supplementary-material

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19, 716–723. doi: 10.1109/TAC.1974.1100705