With the arrival of the single-cell RNA sequencing era, many bulk high-throughput technologies such as microarray and RNA-Seq have come to be seen as old-fashioned and conventional. Nevertheless, the majority of data in publicly accessible databases (e.g., GEO, ArrayExpress, and TCGA) are still of the bulk type, and a tremendous amount of valuable biological information remains hidden in them. Advanced machine learning methods such as deep neural networks facilitate exploration and pattern discovery in single-cell RNA-Seq data. By contrast, these methods are under-utilized or rarely developed for bulk data, possibly because single-cell RNA-Seq data are regarded as “big-big data”, meaning that both the number of cells and the number of genes are large, and are therefore considered more suitable for such methods. However, with additional measures to address small sample size and biological interpretation, e.g., feature selection or data augmentation, the adoption of deep learning methods for conventional omics data is promising.
To address the underutilization of advanced artificial intelligence methods on classic bulk data, we propose this research topic, which aims to boost the development and application of novel machine learning methods for analyzing conventional omics data, including microarray, DNA Bisulfite-Seq, RNA-Seq, and ChIP-Seq data, among others, for pattern discovery and information mining. Here, the machine learning methods include, but are not limited to, convolutional neural networks, recurrent neural networks, BERT-based models, graph methods, and Bayesian networks. We believe that developing novel machine learning methods or adapting existing algorithms, especially deep learning methods, to analyze bulk omics data can help uncover the biological information embedded in them and make them shine again.
Themes include:
1) The development and application of deep learning methods, especially for longitudinal bulk data;
2) Development or reuse of advanced methods (such as deep graph networks) to explore the interactions between features and to carry out feature selection, thus enabling better biological interpretation;
3) Methods that potentially make causal inference possible, and their applications, including the application of existing methods such as Bayesian networks to old-fashioned bulk data with the possibility of addressing causality (e.g., which genes are true drivers and which are passengers that lie downstream on a causal path);
4) Integrative analysis based on multiple omics data, especially methods using deep learning.
Please note that, in addition to microarray, RNA-seq, and methylation data, the data types under consideration for this research topic include classic clinical data (such as data collected from epidemiological studies, surveys, or laboratory data) and experimental data (e.g., PCR).