With rapid advances in microarray and sequencing technologies, high-throughput omics data (e.g., SNPs, RNA expression and DNA methylation) can be generated from multiple platforms and used to study the biological mechanisms underlying complex diseases at different levels. For example, genome-wide association studies (GWASs) have been widely used to identify common SNPs associated with complex human diseases. In a typical GWAS, large numbers of SNPs are genotyped in hundreds or thousands of subjects, and each SNP is then tested, one by one, for association with the phenotype of interest. Similarly, transcriptome-wide association studies (TWASs) and epigenome-wide association studies (EWASs) can be conducted using traditional single-variant tests. However, such single-variant tests have limited power, especially after adjusting for multiple testing. These association studies usually use continuous or binary phenotypes, but many studies collect more complex phenotypes, such as multiple correlated, familial, longitudinal and survival phenotypes, which pose statistical and computational challenges.
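The single-variant strategy described above, and the reason its power drops as the number of tests grows, can be illustrated with a minimal sketch. It assumes a binary case/control phenotype, genotypes coded as minor-allele counts (0, 1 or 2), a simple 1-degree-of-freedom allelic chi-square test, and a Bonferroni correction. The function names are hypothetical and chosen for illustration only; real analyses typically use regression models with covariate adjustment.

```python
import math


def allelic_chi2_pvalue(genotypes, phenotype):
    """P-value of a 1-df chi-square test on the 2x2 allele-by-status table.

    genotypes: minor-allele counts (0, 1 or 2), one per subject.
    phenotype: 1 for cases, 0 for controls, one per subject.
    """
    n_case = sum(phenotype)
    n_ctrl = len(phenotype) - n_case
    case_minor = sum(g for g, y in zip(genotypes, phenotype) if y == 1)
    ctrl_minor = sum(g for g, y in zip(genotypes, phenotype) if y == 0)
    # Each subject carries two alleles, so each row sums to 2 * group size.
    table = [
        [case_minor, 2 * n_case - case_minor],
        [ctrl_minor, 2 * n_ctrl - ctrl_minor],
    ]
    total = 2 * len(phenotype)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / total
            if expected == 0:
                # Monomorphic SNP or empty group: no evidence of association.
                return 1.0
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def single_variant_scan(genotype_matrix, phenotype, alpha=0.05):
    """Test each SNP one by one; return indices passing the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(genotype_matrix))
    return [
        idx
        for idx, snp in enumerate(genotype_matrix)
        if allelic_chi2_pvalue(snp, phenotype) < cutoff
    ]
```

Because the per-test cutoff is the overall level divided by the number of SNPs, a scan over one million SNPs at an overall level of 0.05 requires each individual p-value to fall below 5e-8, which is why modest effects are easily missed.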
In addition, the association between a complex disease and each type of omics data is usually tested separately, but this strategy is suboptimal. Researchers often find that only a small proportion of disease variation can be explained by one type of omics (e.g., genetic) data, leading to “missing heritability”. Moreover, molecular variants identified by different studies often reproduce poorly. Most importantly, each analysis uses only part of the available information. Therefore, to better characterize biological processes and increase the consistency of variant identification, omics data from separate platforms need to be integrated and analyzed jointly. Integrating information from different biological datasets has the potential to yield better insight into the causal mechanisms of complex diseases than analyzing individual omics datasets alone.
Moreover, although analyses of both a single type of omics data and integrated multi-omics data can help identify susceptibility genes, the mechanistic pathways linking omics data, clinical variables and environmental variables may still be missing. Therefore, novel approaches for causal relationship analysis and for causal mediation analysis of omics data are needed.
This research topic aims to document sophisticated statistical approaches for omics data analysis. We welcome mini-reviews, full-length reviews, and original research manuscripts on, but not limited to, the following areas:
• Novel statistical approaches for omics data association studies
• Novel statistical approaches for omics data with complex phenotypes
• Novel statistical approaches for multi-omics data analysis in an integrative manner
• Novel statistical approaches for causal inference using omics data
• Novel statistical approaches for mediation analysis using omics data