AUTHOR=Pollard Rebecca D., Wilkerson Matthew D., Rajagopal Padma Sheila TITLE=Identification of germline population variants misclassified as cancer-associated somatic variants JOURNAL=Frontiers in Medicine VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1361317 DOI=10.3389/fmed.2024.1361317 ISSN=2296-858X ABSTRACT=

Introduction

Databases used for clinical interpretation in oncology rely on genetic data derived primarily from patients of European ancestry, which biases both cancer genetics research and clinical practice. One practical consequence is that multi-ancestral population variants may be misclassified as tumor-associated because they are not represented in the reference genomes against which tumor sequencing data are aligned.

Methods

To systematically find misclassified variants, we compared somatic variants in census genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) v99 with multi-ancestral population variants from the linkage disequilibrium data of the Genome Aggregation Database (GnomAD). A COSMIC variant was flagged as potentially misclassified when its genomic coordinates, reference allele, and alternate allele all matched those of a GnomAD population variant, which let us identify misclassified variants in genes associated with cancer.
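
The comparison described in this paragraph can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variant` type, the helper names, and the toy coordinates are hypothetical, and the sketch assumes both variant sets use the same genome build and the same allele representation. A somatic variant is reported only when chromosome, position, reference allele, and alternate allele all match a population variant.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a COSMIC or GnomAD schema)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def normalize_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so 'chr17' and '17' compare equal."""
    c = chrom.strip()
    if c.lower().startswith("chr"):
        c = c[3:]
    return c.upper()


def variant_key(v: Variant) -> Tuple[str, int, str, str]:
    """Build the comparison key: coordinates plus reference and alternate alleles."""
    return (normalize_chrom(v.chrom), v.pos, v.ref.upper(), v.alt.upper())


def find_shared_variants(
    somatic: Iterable[Variant], population: Iterable[Variant]
) -> List[Variant]:
    """Return somatic variants whose full key also occurs in the population set."""
    population_keys = {variant_key(v) for v in population}
    return [v for v in somatic if variant_key(v) in population_keys]


# Toy data with made-up coordinates, for illustration only.
somatic = [
    Variant("chr17", 1000, "C", "T"),  # exact match after normalization
    Variant("13", 2000, "G", "A"),     # same position, different alternate allele
    Variant("7", 3000, "T", "G"),      # absent from the population set
]
population = [
    Variant("17", 1000, "c", "t"),
    Variant("13", 2000, "G", "C"),
]

shared = find_shared_variants(somatic, population)
print(shared)
```

Holding the population keys in a set keeps each lookup at constant time on average, which matters when the somatic list runs to millions of entries. The second toy variant shows why position alone is not sufficient: it shares a coordinate with a population variant but carries a different alternate allele, so it is not reported.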

Results

We found that 192 of the 208 genes in COSMIC’s cancer-associated census genes (92.31%) were associated with variant misclassifications. Of the 1,906,732 variants in COSMIC, 6,957 (0.36%) matched normal population variants in GnomAD, raising concern for misclassification. The African / African American ancestral population had both the greatest number of misclassified variants and the greatest number of unique misclassified variants.
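
The two percentages reported here follow directly from the stated counts. As a quick arithmetic check (the `percent` helper is a hypothetical name used only for this illustration):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Counts taken from the Results text above.
gene_pct = percent(192, 208)            # affected census genes
variant_pct = percent(6957, 1906732)    # COSMIC variants matching GnomAD

print(gene_pct, variant_pct)
```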

Conclusion

The direct, systematic comparison of variants from COSMIC for co-occurrence in GnomAD supports a more accurate interpretation of tumor sequencing data and reduces bias related to genomic ancestry.