AUTHOR=Deprez Marie , Moreira Julien , Sermesant Maxime , Lorenzi Marco TITLE=Decoding Genetic Markers of Multiple Phenotypic Layers Through Biologically Constrained Genome-To-Phenome Bayesian Sparse Regression JOURNAL=Frontiers in Molecular Medicine VOLUME=2 YEAR=2022 URL=https://www.frontiersin.org/journals/molecular-medicine/articles/10.3389/fmmed.2022.830956 DOI=10.3389/fmmed.2022.830956 ISSN=2674-0095 ABSTRACT=

The applicability of multivariate approaches for the joint analysis of genomics and phenomics information is currently limited by the lack of scalability, and by the difficulty of interpreting the related findings from a biological perspective. To tackle these limitations, we present Bayesian Genome-to-Phenome Sparse Regression (G2PSR), a novel multivariate regression method based on sparse SNP-gene constraints. The statistical framework of G2PSR is based on a Bayesian neural network, were constraints on SNPs-genes associations are integrated by incorporating a priori knowledge linking variants to their respective genes, to then reconstruct the phenotypic data in the output layer. Interpretability is promoted by inducing sparsity on the genes through variational dropout, allowing to estimate the uncertainty associated with each gene, and related SNPs, in the reconstruction task. Ultimately, G2PSR is conceived to prevent multiple testing correction and to assess the combined effect of SNPs, thus increasing the statistical power in detecting genome-to-phenome associations. The effectiveness of G2PSR was demonstrated on synthetic and real data, with respect to state-of-the-art methods based on group-wise sparsity constraints. The application on real data consisted in an imaging-genetics analysis on the Alzheimer’s Disease Neuroimaging Initiative data, relating SNPs from more than 3,500 genes to clinical and multi-variate brain volumetric information. The experimental results show that our method can provide accurate selection of relevant genes in dataset with large SNPs-to-samples ratio, thus overcoming the main limitations of current genome-to-phenome association methods.