AUTHOR=Choy Chi Tung, Wong Chi Hang, Chan Stephen Lam TITLE=Embedding of Genes Using Cancer Gene Expression Data: Biological Relevance and Potential Application on Biomarker Discovery JOURNAL=Frontiers in Genetics VOLUME=9 YEAR=2019 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2018.00682 DOI=10.3389/fgene.2018.00682 ISSN=1664-8021 ABSTRACT=
Artificial neural networks (ANNs) have been used for classification and prediction tasks with remarkable accuracy. However, their application to unsupervised data mining of molecular data remains under-explored. We found that embedding can extract biologically relevant information from The Cancer Genome Atlas (TCGA) gene expression dataset by learning a vector representation through gene co-occurrence. Ground-truth relationships, such as the cancer types of the input samples and the semantic meaning of genes, were shown to be retained in the resulting entity matrices. We also demonstrated the interpretability and use of these matrices in shortlisting candidates from a long gene list, as in the case of immunotherapy response. Seventy-three related genes were singled out, and the relatedness of 55 genes to immune checkpoint proteins (PD-1, PD-L1, and CTLA-4) is supported by the literature. 16 novel genes (