Next-generation sequencing (NGS) has revolutionized biomedical research, enabling genome-wide screening of genetic defects. NGS-based tests have many applications, including Non-Invasive Prenatal Testing (NIPT), early detection of diseases, targeted therapy of various cancers, and the etiology of rare diseases. As the volume of genomic data grows, identifying genetic patterns with traditional sampling-based statistical methods will become a challenge. Advanced machine learning methods such as deep learning, and Artificial Intelligence (AI) more broadly, can therefore be very beneficial. As an end-to-end method, a deep neural network can extract complex feature patterns automatically and construct predictive models with little manual feature engineering.
Another change brought about by big data is the comeback of instance-based, or data-driven, methods. Unlike model-based learning or principle-driven methods, instance-based learning, such as K-nearest neighbors, is easy to use and easy to interpret, and it achieves high accuracy when the sample size is large enough to guarantee its performance and the system is too complex for principle-driven models to be built. With clinical NGS big data, the genetic causes of various hereditary diseases can be revealed and the shared genetic relationships between diseases can be investigated. Some very different diseases may share similar genetic causes and should be treated with similar approaches. Some similar diseases may have different genetic causes and should be treated accordingly.
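To illustrate why instance-based learning is considered easy to interpret, the following is a minimal K-nearest neighbors sketch in plain Python. The feature vectors, the labels `disease_A` and `disease_B`, and the function name `knn_predict` are hypothetical and serve only as an illustration; they do not come from any study in this Research Topic.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Returns the predicted label together with the neighbors that produced it,
    so the prediction can be explained by pointing at similar known cases.
    """
    # Rank all training samples by Euclidean distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    neighbors = ranked[:k]
    # Majority vote over the labels of the k closest samples.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical toy data: each tuple is a two-feature genetic profile.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.95)]
train_y = ["disease_A", "disease_A", "disease_A",
           "disease_B", "disease_B", "disease_B"]

label, evidence = knn_predict(train_X, train_y, (0.12, 0.18), k=3)
print(label)
for features, neighbor_label in evidence:
    print(features, neighbor_label)
```

The returned neighbors are the explanation: a prediction is justified by the most similar previously labeled samples, which is only reliable when enough samples are available.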
An interpretable model with simple rules is what we need most in order to transform information extracted from big data into knowledge that we can master and apply in medical practice. A black-box AI algorithm cannot reassure a worried patient. The interpretable model is therefore not only good for genetic counseling but also essential for knowledge validation and formation. It can also be used to check the accuracy of models and to avoid misleading information caused by bias in big data.
Last but not least, the analysis methods for NGS panel data in clinical practice are quite different from the analysis methods for WGS/WES data that are widely used in the research community. Take copy number variation (CNV) as an example: with paired WGS or WES data, it is easy to see the CNV peak, but it is difficult to determine the CNV region, i.e. its start and end positions. Furthermore, clinical panels sequence only a few genes in tumor tissue, so there are no other regions for comparison.
To call CNVs from such panels, one needs to establish a baseline and use more complex methods that infer CNV status based only on the sequencing data within a small region in tumor tissue. Most scientists have not faced such challenges and are not aware of such problems. For clinical panels, most NGS analysis methods and tools need to be re-invented. This Research Topic will focus on the challenges of clinical big data analysis in complex genetic diseases by introducing the latest interpretable machine learning algorithms.
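One common way to obtain a baseline when a panel covers only a few regions is to compare the tumor sample against a set of normal samples sequenced with the same panel. The sketch below assumes that approach; it is not a method described in this text. The region names, depth values, function names, and the gain/loss thresholds of ±0.4 are all hypothetical choices for illustration, and all depths are assumed to be positive.

```python
from math import log2
from statistics import median


def normalize(depths):
    """Scale per-region depths by the sample's median depth."""
    m = median(depths.values())
    return {region: d / m for region, d in depths.items()}


def log2_ratios(tumor_depth, normal_depths):
    """Per-region log2 copy ratio of a tumor against a baseline of normals."""
    tumor = normalize(tumor_depth)
    normals = [normalize(n) for n in normal_depths]
    ratios = {}
    for region, t in tumor.items():
        # Baseline: median normalized depth of this region across normals.
        baseline = median(n[region] for n in normals)
        ratios[region] = log2(t / baseline)
    return ratios


def call_cnv(ratios, gain=0.4, loss=-0.4):
    """Label each region with simple, human-readable threshold rules."""
    calls = {}
    for region, r in ratios.items():
        if r >= gain:
            calls[region] = "gain"
        elif r <= loss:
            calls[region] = "loss"
        else:
            calls[region] = "neutral"
    return calls


# Hypothetical mean read depths per panel region.
tumor = {"EGFR_ex19": 400, "KRAS_ex2": 200, "TP53_ex5": 200,
         "BRAF_ex15": 200, "PTEN_ex3": 100}
normals = [
    {"EGFR_ex19": 100, "KRAS_ex2": 100, "TP53_ex5": 100,
     "BRAF_ex15": 100, "PTEN_ex3": 100},
    {"EGFR_ex19": 150, "KRAS_ex2": 150, "TP53_ex5": 150,
     "BRAF_ex15": 150, "PTEN_ex3": 150},
]

ratios = log2_ratios(tumor, normals)
print(call_cnv(ratios))
```

Normalizing by the within-sample median assumes that most panel regions are copy-neutral, which may not hold for small panels enriched in frequently altered genes; this is one reason more complex methods are needed in practice.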
In the first volume, we gathered insights on the multi-omics differences between lung adenocarcinoma (LUAD) and squamous cell lung carcinoma (SCLC); the underlying molecular perturbations and their phenotypic impact in patients across the broad spectrum of intellectual disability (ID); the miRNA expression profiles and clinical data of esophageal carcinoma (EC) patients; the tumor environment of glioblastoma (GBM) revealed by single-cell sequencing; the methylation and gene expression patterns of atrial fibrillation; latent disease-lncRNA association prediction (FRMCLDA); the Molecular Prognostic Indicators in Cirrhosis (MPIC) database; and probability matrix factorization (PMFMDA) for discovering potential disease-related miRNAs.
With this Volume II Research Topic, we aim to build on the progress demonstrated in the first volume. We hope to gather contributions on the application of novel interpretable classification algorithms in clinical medicine, multi-omics big data integration analysis for genetic diseases, disease gene identification based on network analysis, eQTL associations between SNPs and genes, optimization theory based on targeted therapy for cancer, development of new NGS-based tests for genetic diseases, and heterogeneous network construction of diseases, genes, proteins, and drugs.