AUTHOR=Malmberg M. Michelle , Barbulescu Denise M. , Drayton Michelle C. , Shinozuka Maiko , Thakur Preeti , Ogaji Yvonne O. , Spangenberg German C. , Daetwyler Hans D. , Cogan Noel O. I. TITLE=Evaluation and Recommendations for Routine Genotyping Using Skim Whole Genome Re-sequencing in Canola JOURNAL=Frontiers in Plant Science VOLUME=9 YEAR=2018 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2018.01809 DOI=10.3389/fpls.2018.01809 ISSN=1664-462X ABSTRACT=

Whole genome sequencing offers genome-wide, unbiased markers and inexpensive library preparation. With the cost of sequencing decreasing rapidly, many plant genomes of modest size are amenable to skim whole genome resequencing (skim WGR). The use of skim WGR in diverse sample sets without imputation was evaluated in silico in 149 canola samples representative of global diversity. FASTQ files with an average of 10x coverage of the reference genome were used to generate skim samples representing 0.25x, 0.5x, 1x, 2x, 3x, 4x, and 5x sequencing coverage. Genotyping against a pre-defined list of SNPs was compared with de novo SNP discovery. As skim WGR is expected to result in some degree of insufficient allele sampling, all skim coverage levels were filtered at a range of minimum read depths, from a relaxed minimum of 2 to a stringent minimum of 5, resulting in 28 list-based SNP sets. As a broad recommendation, genotyping pre-defined SNPs at between 1x and 2x coverage with relatively stringent depth filtering is appropriate for a diverse sample set of canola, because it balances marker number, sufficient accuracy, and sequencing cost; the best choice nonetheless depends on the intended application. This recommendation was examined experimentally in two sample sets with different genetic backgrounds: 1x coverage of 1,590 individuals from 84 Australian spring-type four-parent crosses aimed at maximizing diversity, as well as one commercial F1 hybrid, and 2x coverage of 379 doubled haploids (DHs) derived from a subset of the four-parent crosses. To determine the optimal coverage in a simpler genetic background, the DH sequence data were further downsampled in silico. The flexible and cost-effective nature of the protocol makes it highly applicable across a range of species and purposes.
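
The in silico design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that skim samples are produced by keeping a uniform random fraction of reads from the 10x FASTQ files, and that the minimum read depth thresholds are the integers 2, 3, 4, and 5 (an inference that is consistent with the 28 list-based SNP sets reported). All names (`sampling_fraction`, `passes_depth_filter`, `snp_sets`) are hypothetical.

```python
from itertools import product

# Values taken from the abstract.
FULL_COVERAGE = 10.0                           # average coverage of the source FASTQ files
TARGET_COVERAGES = [0.25, 0.5, 1, 2, 3, 4, 5]  # skim coverage levels (x)

# Assumption: thresholds step through the integers from 2 (relaxed) to 5 (stringent).
MIN_READ_DEPTHS = [2, 3, 4, 5]


def sampling_fraction(target, full=FULL_COVERAGE):
    """Fraction of reads to keep so that expected coverage drops from `full` to `target`.

    Assumes uniform random subsampling of reads, which is an illustrative choice.
    """
    if not 0 < target <= full:
        raise ValueError("target coverage must be in (0, full]")
    return target / full


def passes_depth_filter(read_depth, min_depth):
    """A genotype call is retained only if at least `min_depth` reads cover the site."""
    return read_depth >= min_depth


# One read-sampling fraction per skim coverage level.
fractions = {cov: sampling_fraction(cov) for cov in TARGET_COVERAGES}

# Every (coverage, minimum depth) combination defines one list-based SNP set.
snp_sets = list(product(TARGET_COVERAGES, MIN_READ_DEPTHS))

if __name__ == "__main__":
    for cov, frac in fractions.items():
        print(f"{cov}x: keep {frac:.1%} of reads")
    print(f"{len(snp_sets)} list-based SNP sets")
```

Under these assumptions, seven coverage levels crossed with four depth thresholds give the 28 SNP sets. A 1x skim sample corresponds to keeping roughly one read in ten from the 10x data.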