About this Research Topic
Big Data is often regarded as a massive volume of information stored in multifaceted and varied structures. In the modern genomic era, the volume of biological data has grown exponentially as measurements have moved from tissue-level assays of single genes to single-cell measurements of entire genomes or microbiomes. Further, the structural complexity of biological data ranges from simple strings (nucleotide and amino acid sequences) to complex graphs (biochemical networks). AI has become popular in bioinformatics research, integrating biological knowledge with computational techniques to extract relevant biological features from the thousands that are measured. The adoption of AI in biomedical research provides unprecedented opportunities to improve outcomes for patients and clinical teams, reduce treatment costs, and positively impact overall population health. Clinical researchers employ AI techniques to interpret large, complex biomedical datasets (e.g., multi-omics datasets).
Different types of AI algorithms are available to help researchers classify and mine databases. Three branches of AI techniques are symbolic Machine Learning (ML), Neural Networks (NN) / Deep Learning (DL), and genetic algorithms (GAs). ML is the form of AI that enables machines to learn from data and make decisions with minimal or no human input; it provides the algorithms that govern the learning process. ML methods are usually classified into three categories: Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL). Of the other two branches, DL is often employed to classify diseases, while GAs are extensively used for optimization and search; both have been applied to large biomedical datasets with strong results.
Cells, the fundamental units of life and the building blocks of both healthy and diseased tissues, are characterized by genomic factors that shape their identity and function. In a heterogeneous cell population, single-cell RNA sequencing (scRNA-seq) measures gene expression in each cell individually, in contrast to bulk RNA sequencing, which measures the average expression across the whole population. While most AI techniques are too “data-hungry” for standard bulk sequencing problems (in which the number of features measured far exceeds the number of samples), scRNA-seq data provide sample sizes (numbers of cells) large enough to take full advantage of these powerful and flexible AI tools. As single-cell sequencing becomes less expensive and data become more readily available, researchers will need a next generation of AI techniques designed for scRNA-seq data, and those techniques must be developed today.
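The difference between bulk and single-cell measurement can be illustrated with a toy example. The sketch below assumes a hypothetical population of two cell types with made-up, noise-free expression values (not real data). It shows that a bulk profile is the per-gene mean across cells, and that this average can describe no actual cell, whereas per-cell profiles keep the two subpopulations separable.

```python
import numpy as np

# Hypothetical toy expression matrix (illustrative values, not real data):
# rows are cells, columns are genes g0, g1, g2.
# Cell type A expresses g0 highly; cell type B expresses g1 highly; both express g2.
type_a = np.tile([20.0, 0.0, 5.0], (50, 1))
type_b = np.tile([0.0, 20.0, 5.0], (50, 1))
cells = np.vstack([type_a, type_b])  # 100 cells x 3 genes

# A bulk measurement reports one population-averaged value per gene.
bulk_profile = cells.mean(axis=0)

# Single-cell data keep one profile per cell, so we can ask how many cells
# actually resemble the bulk average for g0 (within +/- 25%).
g0 = cells[:, 0]
near_bulk = np.abs(g0 - bulk_profile[0]) <= 0.25 * bulk_profile[0]

# Label each cell by the gene it expresses most
# (a crude stand-in for cell clustering, not a real clustering method).
dominant_gene = cells.argmax(axis=1)

print("bulk profile:", bulk_profile)                      # [10. 10.  5.]
print("cells near bulk g0 value:", near_bulk.sum())      # 0
print("cells per dominant gene:", np.bincount(dominant_gene, minlength=3))  # [50 50  0]
```

The bulk profile suggests every cell expresses g0 and g1 at moderate levels, yet no single cell does; only the per-cell view reveals the two distinct populations.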
To showcase AI methods and their applications in computational genomics, the scope of this Research Topic broadly covers (but is not limited to) the following areas:
• AI algorithms [viz., DL, ML, GA, ensemble learning (EL)] applied to scRNA-seq data for purposes such as signature detection, cell clustering, and rare cell detection.
• DL, ML, GA, and EL algorithms for predicting phenotypic traits and diseases from various omics profiles.
• Classification models (e.g., support vector machines, SVMs) for physiological data segmentation and analysis, disease progression prediction, and diagnosis.
• Pattern identification and disease transmission tracking using ML techniques.
• The role of AI in precision medicine, such as combining ML and DL with human pathologists to improve diagnostic accuracy.
• Multi-omics data integration using ML, DL, and EL strategies (e.g., non-negative matrix factorization, multiple kernel fusion).
• Translating cancer genomics into precision medicine using AI methods.
• Identification of circRNAs, lncRNAs, and piRNAs using ML and DL methods.
• Prediction of the sequence specificities of DNA- and RNA-binding proteins, enhancers, and other regulatory regions using ML, DL, and EL.
• Genetic or epigenetic marker discovery, module detection, hub gene finding, and differential expression/methylation analysis using feature selection, ML, DL, GA, and EL.
• Multi-objective optimization, regression-based GAs, and ML algorithms applied to multi-omics scRNA-seq data.
• AI analysis of patterns within population subgroups that present similar clinical phenotypes of complex and deadly diseases.
Keywords: machine learning, deep learning, regulatory network, miRNA, transcription factor, scRNA sequencing, single cell, multi-omics, multiple kernel fusion, graph theory, disease subtyping, big data, artificial intelligence, genetic algorithms, ensemble learning
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.