AUTHOR=Van Poelvoorde Laura A. E., Delcourt Thomas, Coucke Wim, Herman Philippe, De Keersmaecker Sigrid C. J., Saelens Xavier, Roosens Nancy H. C., Vanneste Kevin
TITLE=Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing
JOURNAL=Frontiers in Microbiology
VOLUME=Volume 12 - 2021
YEAR=2021
URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2021.747458
DOI=10.3389/fmicb.2021.747458
ISSN=1664-302X
ABSTRACT=The ongoing COVID-19 pandemic constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the spread. Additionally, emerging variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of clinical and wastewater samples has been explored as an epidemiological surveillance method for the competent authorities. Currently, only the consensus genome sequence of the most abundant strain is taken into consideration for analysis, but multiple variant strains are now circulating in the population. Consequently, in clinical samples, potential coinfection(s) by several different variants can occur or quasispecies can develop during an infection in an individual. In wastewater samples, multiple variant strains will often be simultaneously present. Presently, quality criteria are mainly available for constructing the consensus genome sequence, and some guidelines exist for the detection of coinfections and quasispecies in clinical samples. The performance of detection and quantification of low-frequency variants (LFV) using whole-genome sequencing (WGS) of SARS-CoV-2 remains largely unknown.
Here, we evaluated the detection and quantification of mutations present at low abundances using mutations defining the SARS-CoV-2 lineage B.1.1.7 as a case study. Real sequencing data were modified in silico, either by introducing mutations of interest into raw wild-type sequencing data or by mixing wild-type and mutant raw sequencing data, to construct mixed samples subjected to WGS using a tiling amplicon-based targeted metagenomics approach and Illumina sequencing. As anticipated, higher variation and lower sensitivity were observed at lower coverages and allelic frequencies. We found that detection of all LFV at abundances of 10%, 5%, 3%, and 1% requires sequencing coverages of at least 250X, 500X, 1,500X, and 10,000X, respectively. Although the variability of estimated allelic frequencies increased at decreasing coverages and lower allelic frequencies, its impact on reliable quantification was limited. This study provides a highly sensitive LFV detection approach (https://galaxy.sciensano.be) and specific recommendations for minimum coverages to detect clade-defining mutations at certain allelic frequencies. This approach will be useful to detect and quantify LFV in both clinical and wastewater samples.
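
The coverage recommendations in the abstract can be read as a simple lookup. The sketch below is illustrative only and is not the authors' pipeline: the names `MIN_COVERAGE_BY_AF`, `min_coverage_for`, and `expected_variant_reads` are hypothetical, and the rule for allelic frequencies that fall between the reported values (use the threshold of the next lower reported frequency) is a conservative assumption, not something stated in the abstract.

```python
# Minimum coverages reported in the abstract for detecting all LFV
# at a given allelic frequency (AF). All names here are hypothetical.
MIN_COVERAGE_BY_AF = {0.10: 250, 0.05: 500, 0.03: 1500, 0.01: 10_000}


def min_coverage_for(af: float) -> int:
    """Return the reported minimum coverage for the largest tabulated AF <= af.

    Assumption: for an AF between two reported values, the coverage of the
    next lower reported AF is used, which errs on the side of more coverage.
    """
    eligible = [t for t in MIN_COVERAGE_BY_AF if t <= af]
    if not eligible:
        raise ValueError("AF below 1% is outside the range reported in the abstract")
    return MIN_COVERAGE_BY_AF[max(eligible)]


def expected_variant_reads(coverage: int, af: float) -> float:
    """Expected number of reads carrying the variant at a position (coverage * AF)."""
    return coverage * af


if __name__ == "__main__":
    for af in (0.10, 0.07, 0.05, 0.03, 0.01):
        cov = min_coverage_for(af)
        print(f"AF {af:.0%}: coverage >= {cov}X, "
              f"about {expected_variant_reads(cov, af):.0f} variant reads expected")
```

The expected-read count is plain arithmetic and says nothing about caller sensitivity or sequencing error, both of which the study evaluates empirically.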