AUTHOR=Deng Chao, Peng Wenzhu, Ma Zhi, Ke Caihuan, You Weiwei, Wang Ying TITLE=AquaGWAS: A Genome-Wide Association Study Pipeline for Aquatic Animals and Its Application to Reference-Required and Reference-Free Genome-Wide Association Study for Abalone JOURNAL=Frontiers in Marine Science VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2022.841561 DOI=10.3389/fmars.2022.841561 ISSN=2296-7745 ABSTRACT=

Aquaculture is a rapidly growing industry that brings huge economic benefits. Genome-wide association studies (GWAS) are critical for improving the productivity, sustainability, and product quality of aquaculture species. Current integrated GWAS pipelines either cover only a limited set of steps or require a complex prerequisite environment and configuration. In this study, we developed AquaGWAS, a highly user-friendly graphical user interface (GUI) GWAS pipeline that integrates four well-known GWAS models. AquaGWAS is a complete GWAS pipeline, covering preprocessing, a choice among multiple GWAS models, postprocessing, and visualization. AquaGWAS offers a GUI that runs easily on Linux and automatically generates command lines for running on high-performance computing (HPC) or non-GUI servers. AquaGWAS requires no installation, no configuration, and no complicated argument inputs. It offers complete packages of the required reference files for 27 common aquatic species. Furthermore, because the availability of genomic reference sequences limits single-nucleotide polymorphism (SNP) detection, we attempted to detect SNPs in Pacific abalone using a classical alignment-based reference-required strategy and a k-mer-based reference-free strategy, each combined with downstream AquaGWAS. On 222 resequencing data sets of Pacific abalone, the two strategies detected 221,061 and 230,213 variants, respectively, with 180,161 variants in common. The two strategies emphasize different variant situations: the k-mer-based strategy captures variants missed because of an incomplete or inaccurate reference genomic sequence, whereas the alignment-based strategy captures indel variants defined against the baseline of the genomic sequence. Combining the two strategies offers a complementary framework for obtaining an accurate and complete GWAS analysis for non-model organisms. AquaGWAS is available at https://github.com/Ying-Lab/AquaGWAS.
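
The reported variant counts imply how much each strategy contributes on its own. The following is a minimal sketch of that arithmetic only, not part of AquaGWAS. It assumes that "respectively" maps 221,061 to the alignment-based strategy and 230,213 to the k-mer-based strategy, following the order in which the two are named; the function name `overlap_summary` and the Jaccard index are illustrative additions and do not come from the study.

```python
# Variant counts as reported in the abstract. Assigning each count to a
# strategy follows the order in which the strategies are named (an assumption).
ALIGNMENT_BASED = 221_061   # reference-required strategy
KMER_BASED = 230_213        # reference-free strategy
COMMON = 180_161            # variants detected by both strategies


def overlap_summary(count_a: int, count_b: int, shared: int) -> dict:
    """Summarize the overlap of two variant sets from their sizes alone.

    Hypothetical helper for illustration; it is not an AquaGWAS function.
    """
    union = count_a + count_b - shared  # inclusion-exclusion
    return {
        "only_a": count_a - shared,     # found by strategy A only
        "only_b": count_b - shared,     # found by strategy B only
        "union": union,                 # found by at least one strategy
        "jaccard": shared / union,      # shared fraction of the union
    }


summary = overlap_summary(ALIGNMENT_BASED, KMER_BASED, COMMON)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions, 40,900 variants are specific to the alignment-based strategy and 50,052 to the k-mer-based strategy, for 271,113 distinct variants when the two are combined.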