AUTHOR=Li Chang, Ma Kevin, Xu Nicole, Fu Chenjian, He Andrew, Liu Xiaoming, Bai Yongsheng

TITLE=SNPAAMapper-Python: A highly efficient genome-wide SNP variant analysis pipeline for Next-Generation Sequencing data

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=5

YEAR=2022

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.991733

DOI=10.3389/frai.2022.991733

ISSN=2624-8212

ABSTRACT=Currently, there are many publicly available NGS tools developed for variant annotation and classification. However, as modern sequencing technology produces ever more sequencing data, a more efficient analysis program is desired, especially for variant analysis. In this study, we updated SNPAAMapper, a variant annotation pipeline, by converting its Perl code to Python, generating annotation output with improved computational efficiency and updated information for broader applicability. The new pipeline written in Python can classify variants by region (CDS, UTRs, upstream, downstream, intron), predict amino acid change type (missense, nonsense, etc.), and prioritize mutation effects (e.g., non-synonymous > synonymous) while being faster and more efficient. Our new pipeline works in five steps. First, exon annotation files are generated. Next, the exon annotation files are processed, and gene mapping and feature information files are produced. Afterward, the Python scripts classify the variants based on genomic regions and predict the amino acid change category. Lastly, another Python script prioritizes and ranks the mutation effects of variants to output the result file.