Innovative in Silico Approaches for Characterization of Genes and Proteins

Bhat, Gh. Rasool; Sethi, Itty; Rah, Bilal; Kumar, Rakesh; Afroze, Dil

doi:10.3389/fgene.2022.865182

REVIEW article

Front. Genet., 18 May 2022

Sec. Computational Genomics

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.865182

This article is part of the Research Topic: Integrated Systems Genomic Approaches for Characterizing Uncharacterized Proteins


Dil Afroze¹*

¹Advanced Centre for Human Genetics, Sher-I- Kashmir Institute of Medical Sciences, Soura, India
²Institute of Human Genetics, University of Jammu, Jammu, India
³School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India

Bioinformatics is an amalgamation of biology, mathematics and computer science. It gathers biological information at the molecular level and applies informatics techniques to understand and organise that information in a useful manner. With the help of bioinformatics, experimental data are stored in several online databases, such as nucleotide databases, protein databases, GenBank and others. The data stored in these databases serve as a reference for experimental evaluation and validation. Several online tools have been developed to analyse genomic, transcriptomic, proteomic, epigenomic and metabolomic data. Some of them include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in non-coding, intronic regions and play a role in gene regulation, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting existing splice sites or creating new ones. HSF was developed to predict the effect of a mutation (SNP) on splicing signals. The tool can therefore provide data for a better understanding of intronic mutations, which can then be validated experimentally. Additionally, rapid advances in proteomics have steered researchers to organise the study of protein structure, function, relationships, and dynamics in space and time. The effective integration of all of these technologies will eventually lead to next-generation systems biology, which will provide valuable biological insights for research, diagnostics, therapeutics and the development of personalised medicine.

Introduction

The emergence of “innovative biology” has been accompanied by the birth of other sciences, such as computational biology and bioinformatics, which share an interface with molecular biology. Because of the large datasets generated, data management and storage have become critically important. Different databases therefore came into existence to organise, store and process large amounts of biological information and make it accessible to the scientific community (Ritchie et al., 2015). The growing amount of data has been matched by an increase in the number of biological databases (Pevsner, 2015). Public databases usually accumulate large amounts of information and are categorised into primary and secondary databases. Primary databases contain experimental findings that are reported without any critical analysis in relation to previous publications (Luscombe et al., 2001; Prosdocimi, 2010). In secondary databases, by contrast, data are collected and interpreted, a process called content curation. There are also various functional databases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome, that allow the analysis and interpretation of metabolic maps. Primary databases such as the DNA Database of Japan (DDBJ), GenBank at the National Center for Biotechnology Information (NCBI), and the European Molecular Biology Laboratory (EMBL) database remain the main databases of nucleotide sequences and proteins. The International Nucleotide Sequence Database Collaboration (INSDC) is the parent organisation of these databases, which share deposited information with each other daily (Prosdocimi et al., 2002; Amaral et al., 2007; Pevsner, 2015).

The last two decades have witnessed great advances in molecular biology. Data analysis procedures were established at a fast pace to enable the interpretation of the large amount of information produced, mainly by DNA sequencing technologies, which drove exponential growth in genomic, transcriptomic and proteomic information. Although genomics and proteomics are considered recent domains, they have emerged interdependently and, coupled with innovations in computational resources, have produced huge volumes of biological data and data analysis that can enhance and intensify developments in medical science (Verli, 2014). Today, the ‘-omics’ suffix covers genomics, transcriptomics, proteomics, phylogenomics, metabolomics and metagenomics, all associated with large-scale biological data and the allied bioinformatics analysis. The emergence of the newest high-throughput sequencing innovations, starting with improvements in Sanger sequencing and continuing with NGS technologies and next-generation proteomics, has led to novel findings in clinical settings (Zhou et al., 2010).

Genome-Wide Approach—From Genome to Proteome

DNA sequencing plays a crucial role in the progression of molecular biology, not only changing the genetic landscape of genome designs but also opening up new opportunities in the therapeutic arena and in personalised medicine.

Genomics

Genomics is the domain that aims to uncover and explore the structure, function, and evolution of genomes, applying bioinformatics tools to explore sequenced genomes (Altmann et al., 2012).

Paul Berg’s (Jackson et al., 1972), Frederick Sanger’s (Sanger and Coulson, 1975), and Walter Gilbert’s (Maxam and Gilbert, 1977) pioneering work on DNA sequencing enabled several developments, including Sanger’s ‘chain-termination’ sequencing technology, more commonly known as Sanger sequencing (Sanger et al., 1977), which opened up completely new possibilities for DNA analysis. Further technological advances led to the development of the first automated DNA sequencer (ABI PRISM AB370A), released in 1986, which allowed a draft of the human genome to be completed in the years that followed (Venter et al., 2001). Newer methods, meant to supplement and eventually replace Sanger sequencing, have since emerged (Figure 1). This technology is commonly known as next-generation sequencing (NGS) or massively parallel sequencing (MPS), and it encompasses a wide range of methodologies. With this technology, huge amounts of data can be generated in each instrument run, faster and more cost-effectively. The NGS market is currently developing and expanding, with the world-wide market expected to reach 21.62 billion US dollars by 2025, with growth of around 20% from 2017 (BCC Research, 2019). As a result, multiple brands are currently competing in this business, including BGI Genomics, Illumina, Ion Torrent (Thermo Fisher Scientific), PacBio and Oxford Nanopore Technologies. All of them provide distinct approaches to the same problem: the generation of sequencing data. Second-generation sequencing relies on massively parallel, clonal amplification of molecules (PCR, polymerase chain reaction) (Shendure and Ji, 2008), whereas third-generation sequencing depends on the sequencing of single molecules without a preceding clonal amplification (Schadt et al., 2010; van Dijk et al., 2018; Ameur et al., 2019). The NGS process includes several steps:

FIGURE 1. Timeline of DNA sequencing, showing some of the most significant and ground-breaking developments. NG, next generation; PCR, polymerase chain reaction; SMS, single-molecule sequencing; SeqLL, sequence the lower limit.

1) NGS library preparation: In next-generation sequencing, a library comprises DNA/RNA fragments that represent the full genome/transcriptome or a region of interest. Although each NGS platform has its own unique features, the production of an NGS library generally begins with fragmentation of the DNA/RNA, followed by the ligation of sequencing adaptors to the fragments to permit their enrichment. A good library should have high sensitivity and specificity. This implies that all relevant fragments should be properly represented in the library and that there should be no random errors (non-specific products). This is easier said than done, because genomic regions are not all equally amenable to sequencing, which makes the creation of a sensitive and specific library difficult and cumbersome (Aird et al., 2011).

2) NGS Platforms

Platforms for Second-Generation Sequencing

Second-generation systems belong to the category of cyclic-array sequencing technologies (Amaral et al., 2007). The core workflow for second-generation platforms comprises library preparation and amplification (from RNA/DNA samples), clonal expansion, sequencing, and analysis. Ion Torrent and Illumina are the two best-known providers of second-generation sequencing systems (Kircher et al., 2011; Quail et al., 2012).

3) Platforms for Third-Generation Sequencing:

Third-generation NGS technology made it possible to avoid the limitations of PCR-based methods, such as nucleotide misincorporation by a polymerase, chimaera formation and allelic drop-out resulting in a false homozygosity call (Thompson and Steinmann, 2010). The Helicos Genetic Analysis System was the first commercial third-generation sequencer (Pushkarev et al., 2009). Pacific Biosciences (PacBio RS II sequencer) established the notion of single-molecule real-time (SMRT) sequencing in 2011 (McCarthy, 2010). This method allows the sequencing of long reads (up to 30 kb on average). Individual DNA polymerases are attached to zero-mode waveguide (ZMW) wells, which are nanoholes in which a single DNA polymerase enzyme molecule can be placed directly (McCarthy, 2010). PacBio has since released the Sequel II System, which claims to cut project costs and timelines compared to previous versions, with highly accurate individual long reads (HiFi reads) of up to 175 kb (Pereira et al., 2020).

Merker and co-workers were the first to demonstrate the use of a PacBio system for long-read genome sequencing to find a pathogenic variant in patients with Mendelian disease, indicating that this method has considerable potential for identifying structural variation (Merker et al., 2018). The Chromium instrument, which uses gel beads in emulsion (GEMs) technology, was released by 10X Genomics in 2016 (Pereira et al., 2020). The benefit of GEMs technology is that it reduces time, starting material, and cost (Zheng et al., 2016; Zheng et al., 2017; Pereira et al., 2020). With low false-positive rates and high throughput, the Chromium system can also perform genomic and transcriptional profiling, immunological profiling, and chromatin accessibility studies at single-cell resolution. As a result, intriguing new applications are emerging, particularly in the areas of epigenetics research, de novo genome assembly, and long sequencing reads (Delaneau et al., 2019; Laurentino et al., 2019; Wang et al., 2019).

4) Innovative Bioinformatics approach: Sequencing platforms are improving, and it is now possible to sequence the human genome in as little as a week or two. The huge volume of data generated therefore requires bioinformatics and computational expertise to organise, analyse, and interpret NGS data. As a result, NGS bioinformatics is undergoing significant development, which depends on improving computational capabilities (hardware) as well as algorithms and applications (software) to streamline all required steps, from the processing of raw data to detailed data analysis and variant interpretation in a clinical setting.

Analysis of the NGS data: NGS bioinformatics is usually classified into three categories: primary, secondary, and tertiary analysis (Pereira et al., 2020).

Primary data analysis includes the detection and evaluation of raw data (signal analysis), the generation of readable sequencing reads (base calling), and the estimation of base quality (Ledergerber and Dessimoz, 2011). This primary analysis typically produces a FASTQ file (Illumina) or an unmapped binary alignment map (uBAM) file (Ion Torrent).
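To make the output of primary analysis more concrete, the following minimal Python sketch reads FASTQ records and converts the quality string of each read into per-base Phred scores. It assumes the common four-line record layout and the Phred+33 (Sanger) quality encoding; the function names are illustrative and are not taken from any specific pipeline.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines."""
    # Drop line endings and skip blank lines so trailing newlines are harmless.
    cleaned = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(cleaned), 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Decode a quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q into the base-calling error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "II5!"]
    for read_id, seq, qual in parse_fastq(example):
        scores = phred_scores(qual)
        print(read_id, seq, scores, [error_probability(q) for q in scores])
```

In practice, a read-level quality filter could discard reads whose mean Phred score falls below a chosen cutoff before alignment; the cutoff itself is a project-specific choice.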

Secondary analysis, which involves read alignment against the reference human genome (usually hg19 or hg38) and variant calling, is the next step in the NGS data analysis workflow.

Sequencing reads can be mapped in two ways: read alignment, which aligns sequenced fragments (processed data) against a reference genome, or de novo assembly, which constructs a genome from scratch without the use of external data. The availability or absence of a reference genome may be enough to decide between the two techniques. Nonetheless, reference sequence mapping is the preferred method for most NGS applications, particularly in clinical genetics (Flicek and Birney, 2009). De novo assembly, on the other hand, is primarily limited to more focused tasks, such as correcting flaws in the reference genome and improving the detection of structural variants (SVs) and other complex rearrangements, as well as newer findings (Ameur et al., 2018).
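As a rough illustration of reference-based read mapping, the toy Python sketch below indexes the k-mers of a reference sequence, uses exact k-mer matches as seeds, and then checks each candidate position by counting mismatches. This is a simplified seed-and-verify scheme without gaps or base qualities, written only to convey the idea; it is not the algorithm of any particular aligner, and all names and parameters are assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return (start, mismatches) pairs where the read aligns without gaps."""
    # Seeding: every k-mer of the read proposes candidate start positions.
    candidates = set()
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verification: count mismatches over the full read length.
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACG"  # toy reference, not real genomic sequence
    index = build_kmer_index(reference, k=4)
    print(map_read("GCAAGG", reference, index, k=4))
    print(map_read("TGCAAGGA", reference, index, k=4))
```

A read with no seed match is simply reported as unmapped (an empty list), which is one reason de novo assembly remains useful for sequence that is missing from the reference.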

In the context of human clinical genetics, the third main phase of the NGS analysis pipeline addresses the essential issue of “making sense” or data interpretation, which requires finding the basic link between variant data and the observed phenotype in a patient. The tertiary analysis starts with variant annotation, which adds a fresh layer of data to predict the functional impact of all variants found during the variant calling procedure. Variant filtering, prioritisation, and data visualisation approaches are utilised after variant annotation. These procedures can be carried out utilising a number of software suites, which must be updated on a regular basis to reflect the most recent scientific findings, necessitating ongoing maintenance and development on the part of the developers. The generalised workflow of NGS is shown in Figure 2.

FIGURE 2. Bioinformatics workflow for next-generation sequencing (NGS), showing the main steps: raw data quality control, alignment, post-alignment processing, variant filtration, annotation, and reporting of variants.

Variant annotation is a crucial first step in the assessment of sequencing variants. As previously indicated (Scherer et al., 2007), variant calling generates a VCF file. Each line in such a file contains high-level information about a variant, such as genomic position and reference and alternate bases, but no information about its biological implications. Variant annotation provides biological context for all discovered variants. Because of the large amount of NGS data, annotation is performed automatically. Several programmes are currently available for variant annotation, each of which uses distinct approaches and databases. Predictors such as Sorting Intolerant from Tolerant (SIFT) (Ng and Henikoff, 2003), PolyPhen-2 (Adzhubei et al., 2010), Combined Annotation Dependent Depletion (CADD) (Kircher et al., 2014) and Condel (González-Pérez and López-Bigas, 2011), which compute impact scores for each variant based on criteria such as sequence homology, conservation of amino acid residues, evolutionary conservation, protein structure, or statistical prediction based on known mutations, are integrated into such annotation tools. Furthermore, annotation can be used to search disease variant databases such as ClinVar and HGMD for information on clinical associations. Annotate Variation (ANNOVAR) (Yang and Wang, 2015), Variant Effect Predictor (VEP) (McLaren et al., 2010), single nucleotide polymorphism effect (SnpEff) (Cingolani et al., 2012), and SeattleSeq (Ng et al., 2009) are the most extensively used of the many annotation tools available. ANNOVAR is a command-line tool that can annotate SNPs, INDELs, and copy number variations (CNVs). It compares variants and explains their functional consequences on genes and other genomic elements (Wang et al., 2010a). The overall number of variants obtained after analysis of a VCF file from whole-exome sequencing (WES) may range between 30,000 and 50,000.

Filtering algorithms are required to find the variant(s) responsible for a particular disorder; some examples are listed in Table 1. When beginning the third level of NGS analysis, it is strongly advised to eliminate false-positive calls and variant call errors, based on quality parameters or prior knowledge of artefacts. The population frequency filter is one of the most widely used NGS filters. Filtering on minor allele frequency (MAF) sorts variants into different categories: rare variants (MAF < 0.5%, usually selected for Mendelian diseases), low-frequency variants (MAF between 0.5 and 5%), and common variants (MAF > 5%) (Consortium et al., 2010). It aids not only in better identifying disease alleles, but also in understanding population migrations, relationships, origins, admixtures, and population size changes, which may be useful in understanding various disease patterns (Stoneking and Krause, 2011). The most extensively used databases are the 1000 Genomes Project (Siva, 2008), the Exome Aggregation Consortium (ExAC) (Lek et al., 2016), and the Genome Aggregation Database (gnomAD; http://gnomad.broadinstitute.org/). This filter, however, has limits and may result in incorrect exclusion.
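The population frequency filter can be sketched in a few lines of Python. The example below splits one tab-separated VCF data line into its fixed fields, reads a population allele frequency from the INFO column, and classifies the variant using the MAF categories given above. The INFO key `POP_AF` is a hypothetical placeholder, since annotation tools use their own field names, and the helper functions are illustrative only.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its fixed fields plus an INFO dictionary."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True  # flags carry no value
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alt": alt.split(","), "filter": flt, "info": info_dict}


def classify_by_maf(maf):
    """Assign the frequency category used in the text (MAF given as a fraction)."""
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "common"


def keep_candidate(variant, af_key="POP_AF", max_maf=0.005):
    """Keep variants that passed caller filters and are rare in the population.

    af_key is a hypothetical INFO field assumed to hold a population allele
    frequency; only the first value is used for multi-allelic sites.
    """
    if variant["filter"] not in ("PASS", "."):
        return False
    raw = variant["info"].get(af_key)
    if raw is None or raw is True:
        return True  # absent from the population resource: keep for review
    af = float(raw.split(",")[0])
    return min(af, 1 - af) < max_maf


if __name__ == "__main__":
    line = "1\t12345\trs1\tA\tG\t50\tPASS\tDP=100;POP_AF=0.002"
    variant = parse_vcf_line(line)
    print(variant["chrom"], variant["pos"], classify_by_maf(0.002), keep_candidate(variant))
```

Keeping variants that are absent from the population resource is a deliberate choice in this sketch: a novel variant has no recorded frequency, and dropping it would be one form of the incorrect exclusion mentioned above.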

TABLE 1. Commonly used tools for performing an NGS functional filter, along with examples.

Even though functional annotation offers significant information for filtering, the most critical question to answer, especially in the context of gene discovery, is whether a given variant or mutant gene is the disease-causing one, and what its frequency is in the different population sets studied globally. To address this difficult issue, a new generation of tools is being created that, rather than just discarding information, rank variants and allow them to be prioritised (MacArthur et al., 2012; Lelieveld et al., 2016; Harper, 2017). Various approaches have been suggested. For example, PHIVE investigates the similarities between human disease phenotypes and those derived from knockout experiments in animal model organisms (Robinson et al., 2014). Other methods handle the problem in a different way, by computing a lethal score (also known as a burden score) for each gene using data from population variation databases (Eilbeck et al., 2017).

Phevor, which uses data from other relevant ontologies, such as Gene Ontology (GO), to propose novel gene–disease connections, can also be employed for the identification of novel genes (Singleton et al., 2014). The fundamental purpose of these tools is to provide a small number of variants that can be validated using molecular techniques (Pereira et al., 2019a; Pereira et al., 2019b). VarSeq/VSClinical (Golden Helix), Ingenuity Variant Analysis (Qiagen), Alamut® software (Interactive Biosoftware), and VarElect are recently developed commercial software packages for the interpretation and prioritisation of variants in a clinical context, intended for clinicians, geneticists, and researchers (Stelzer et al., 2016). Apart from the tools that aid in variant analysis and interpretation, clinicians now have access to medical genetics firms such as Invitae (https://www.invitae.com/en/) and CENTOGENE (https://www.centogene.com/) that provide a precise medical diagnosis.

5) Third-generation sequencing technologies are capable of sequencing single molecules with average read lengths of 10,000 bp to 100,000 bp or even more. The advent of this technology has eliminated the need for DNA amplification (PCR), and it provides real-time results (Pereira et al., 2020). Third-generation sequencing is offered by Pacific Biosciences (PacBio), which uses the single-molecule real-time (SMRT) platform and fluorescent nucleotide detection, and by Oxford Nanopore Technologies (MinION), which uses nanopore methodology: an ionic current passes through the flow cell, and nucleotide bases are determined from the changes they produce in the current as they pass through the nanopores (Xiao and Zhou, 2020).

The bioinformatic tools required to analyse data from third-generation sequencing technologies need to be more specific and tolerant of sequencing errors. Some tools are listed in Table 2.

TABLE 2. Software used in third-generation sequencing.

Limitations: Although third-generation sequencing technology is fast and provides real-time results, NGS is still preferred because its error rate is lower than that of third-generation sequencing, which is ∼15%. Because of this high error rate, the technology can miss SNPs/point mutations and is not best suited for mutational analyses. The methodology requires improvement. Moreover, more bioinformatic tools and algorithms need to be developed for downstream data analysis, which remains a challenge for researchers for the time being (Ozsolak, 2012).

Transcriptomics

cDNA sequencing, or RNA-seq, allows more accurate read mapping and transcript-level quantification than other methods. High-throughput applications include differential expression analysis, identification of isoforms arising from mRNA splicing, NGS of small non-coding RNA, and the discovery and characterisation of novel transcripts (Marioni et al., 2008; Wang, 2009; Montgomery et al., 2010).

Small non-coding RNA NGS: Research on biomarkers, which aid in the prediction, early detection and prevention of disease, has increased significantly. Biomarker research helps the scientific and clinical community to improve clinical outcomes (Lopez et al., 2015). Non-coding RNAs (ncRNAs) have become a hotspot of biomarker research in the field of disease identification and treatment. MicroRNAs (miRNAs) are the type of ncRNA most explored for their potential role as biomarkers (Lopez et al., 2015). To date, ncRNA studies have been performed mainly by qRT-PCR, in situ hybridization, or microarray techniques. NGS has opened a new way to detect and analyse the RNA molecules present in biological samples. NGS offers several methodological advantages over other technologies, such as increased throughput, decreased RNA input, good consistency and quality of data, higher detection depth, analysis of all RNA populations, and the discovery of novel molecules (Liu et al., 2021). A typical RNA-sequencing experiment consists of the following steps:

Together, these possibilities have allowed us to learn more about the genome’s organisation, the molecular constituents of cells and tissues, and the complexities of regulatory systems (Zhou et al., 2010; Sims et al., 2014). Many investigations, both fundamental and applied, have focused on mRNA splicing. Splicing occurs in every eukaryotic cell, between transcription and translation. Pre-mRNA transcripts may be spliced differently depending on tissue and/or stage of development, allowing multiple transcripts to be generated and hence distinct proteins to be made from the same gene (Burge et al., 1999; Nilsen, 2003). The divergence of splice site sequences from the consensus has been linked to the generation of alternative transcripts. Furthermore, these highly degenerate motifs can be observed in most introns of higher eukaryotes. Pseudo-exons are intronic sequences of standard exon size that outnumber real exons and are flanked by sequences that fit the 5′ and 3′ splicing signal requirements of an exon, but are never recognized as proper exons by the spliceosome. To distinguish true exons and splice sites from pseudo-exons, the splicing machinery must rely on auxiliary sequence features such as intronic and exonic cis-elements (Jacob and Gallinaro, 1989).

Exonic Splicing Enhancers (ESEs) are the most researched and best explored among them. They are short nucleotide sequences that are primarily targeted by Serine/Arginine-rich (SR) proteins, which then help to define exons (Blencowe, 2000). Exonic Splicing Silencers (ESSs), on the other hand, help the spliceosome to ignore pseudo-exons and decoy splice sites. They serve as binding sites for proteins that promote exon exclusion (mostly hnRNP proteins) (Zhu et al., 2001). Several bioinformatics approaches have been created and are now accessible to examine or predict splice signals (Zhang et al., 2005). One of the most essential of these tools is the Human Splicing Finder (HSF). HSF was built with the 4D package (4D S.A.) for data management, algorithm design and the online interface. The HSF database was created with all human genes containing introns and exons, using an Ensembl dataset that included about 22 000 genes and 46 000 transcripts from Homo sapiens. Because matrices and methods were specifically built for the human genome, the HSF database exclusively contains human genes (Flicek et al., 2008). HSF also has data taken from the Ensembl Variation Database (EVD), which can be used to investigate the impact of SNPs on splicing. A Perl script was written using the Ensembl Perl API to allow HSF to access the EVD directly and retrieve SNPs in human genes.

On the other hand, exonic splicing enhancers (ESEs) can be disrupted by nonsense, missense, and even translationally silent mutations, causing the splicing machinery to skip the mutant exon, with significant consequences for gene structure. The frequency of mutations whose major consequence is aberrant splicing has been significantly underestimated, because the effects of mutations are most often predicted purely from genomic sequence information (Cartegni et al., 2002). ESEs are found in both alternative and constitutive exons, where they serve as binding sites for Ser/Arg-rich proteins (SR proteins), a family of conserved splicing factors involved in a variety of splicing stages (Graveley, 2000). SR proteins bind ESEs through their RNA-binding domain and promote exon definition by attracting spliceosomal components via protein–protein interactions mediated by their RS domain and/or by antagonizing the function of surrounding splicing silencers. Multiple categories of ESE consensus motifs have been described, and different SR proteins have varying substrate specificities (Graveley, 2000; Cartegni et al., 2002; Fairbrother et al., 2002). ESEfinder searches query sequences for potential ESEs using weight matrices for four different human SR proteins. The matrices are based on frequency values derived from the alignment of winning sequences obtained through functional SELEX studies, corrected for the background nucleotide frequency of the initial SELEX library, which was created by chemical synthesis (Liu et al., 1998; Liu et al., 2000).

The query sequences can be entered directly into the input box or submitted as a text file. Multiple sequences can be processed at the same time if each is preceded by a FASTA-format description line (starting with ‘>’). Although ESEfinder is a tool for RNA analysis, it only accepts normal DNA nomenclature (A, C, G, and T, not U). Any character other than the letters A, C, G, and T, including spaces and paragraph breaks, is ignored by the programme. Both upper and lower case are acceptable, but the output lines are written in upper case. The user can choose from one to four matrices to be used at the same time. The result for each matrix is a series of scores computed at 1-nt increments along the sequence. Only the ‘hits’ or ‘high score motifs’ are displayed in the initial output window (Figure 3), which lists the position of the first nucleotide, the motif match sequence, and the calculated score. A score is deemed a high score when it exceeds the threshold value set on the input page.
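The matrix-based scoring described here can be illustrated with a small Python sketch. It cleans a query sequence (upper case, A/C/G/T only), slides a weight matrix along it one nucleotide at a time, and reports windows whose summed score exceeds a threshold, together with the 1-based position of the first nucleotide. The matrix values below are invented for illustration and are not ESEfinder’s published matrices; the function names are likewise assumptions rather than the tool’s implementation.

```python
# Hypothetical 6-position weight matrix (one dict of base -> score per position).
TOY_MATRIX = [
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 0.0, "T": -0.5},
    {"A": 0.5, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -0.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.0},
]


def clean_sequence(raw):
    """Upper-case the input and drop every character other than A, C, G, T."""
    return "".join(ch for ch in raw.upper() if ch in "ACGT")


def scan_motifs(sequence, matrix, threshold):
    """Return (position, motif, score) for windows scoring above the threshold.

    Positions are 1-based and refer to the first nucleotide of the window.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if score > threshold:
            hits.append((start + 1, window, round(score, 3)))
    return hits


if __name__ == "__main__":
    wild_type = clean_sequence("tt cacgac\ntt")
    mutant = clean_sequence("tt catgac\ntt")  # single C>T substitution
    print(scan_motifs(wild_type, TOY_MATRIX, threshold=5.0))
    print(scan_motifs(mutant, TOY_MATRIX, threshold=5.0))
```

Comparing the hits for a wild-type and a mutant sequence, as in the usage example, mirrors how such scores might be used to ask whether a variant weakens a predicted enhancer; any conclusion of that kind would still need experimental validation.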

FIGURE 3. Output window with the complete list of scores. High scores are represented as color-coded bars. The height of each bar indicates the score value (motif score), and its width and placement on the x-axis represent the length of the motif (6–8 nt) and its position along the sequence.

By choosing the ‘custom’ button and entering the required value into the box, any score can be used as the cutoff threshold. As a result, ESEfinder may be used to identify potential ESEs, and the prime application is the accurate interpretation of the impact of disease-associated variants. It has been previously demonstrated that ESEs predicted by this matrix-based method cluster in places where natural enhancers have been empirically localized and are more common in exons than in introns (Cartegni et al., 2003).

CircRNAs: In contrast to messenger RNAs, circular RNAs (circRNAs) are physiologically active nucleic acid molecules that occur as closed RNA loops and do not have polyadenylated tails. CircRNAs are classified as non-coding RNA (ncRNA), yet some circRNAs have the ability to code for proteins. CircRNAs were originally discovered in plant viroids in the 1970s, and then in the cytoplasm of eukaryotic cells in the 1980s. Because of the prevalence of linear RNAs, early progress in this field was modest, and circRNAs were thought to be a by-product of RNA splicing. Recent advances in next-generation sequencing and related bioinformatics technologies, on the other hand, have sped up research, and humans, mice, nematodes, plants, and archaea have all been found to have these molecules (Chen et al., 2021). Various tools employed for the analysis of circRNAs are summarized in Table 3.

TABLE 3. Bioinformatic software tools used in circRNA analysis.

Proteomics

Understanding the molecular processes that mediate cellular physiology requires the identification, quantification, and characterization of a cell’s whole protein content (Schmidt et al., 2014; Jensen et al., 2006). Rapid advances in proteomics have steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. The groundbreaking revelation that DNA contains all of the genetic instructions required to build an organism gave rise to molecular biology’s central dogma, which described a one-way flow of information from DNA to RNA to proteins. This belief has been challenged by recent discoveries. Epigenetic marks, alternative splicing, non-coding RNAs (including microRNAs), protein–protein interaction (PPI) networks, and post-translational modifications (PTMs) are only a few examples of how phenotype is not determined solely by the information in the genome (Nagaraj et al., 2011; Beck et al., 2011; Baker, 2012). Proteomics is the global study of proteins, which are the key functional entities in the cell. This analysis is arguably the most important level of information required to understand how cells work. Compared with data collection at the genomic and transcriptomic levels, proteomic data acquisition has proven difficult. Global protein analysis is a difficult analytical task, in part because amino acids, the building blocks of proteins, have such a wide range of physicochemical properties. Furthermore, in comparison to the genome, the proteome is enriched by alternative splicing and a wide range of protein modifications and degradation, and the complexity is heightened by the interconnectivity of proteins into complexes and signaling networks that are highly divergent in time and space (Figure 4; Cox and Mann, 2011).

A decade ago, sequencing and identifying a single protein was a major problem; today’s high-throughput technology, however, allows the identification and quantification of essentially all expressed proteins in a single experiment. Similarly, 10 years ago, MS-based phosphoproteomics could only identify a few hundred phosphosites, whereas more than 30,000 phosphosites can currently be monitored quantitatively. This current approach is referred to as “next-generation proteomics” to reflect its ability to characterize practically the whole proteome as a result of advances in technology. Proteomics technologies, particularly MS-based protein identification, have advanced tremendously in recent years as a result of cumulative technological breakthroughs in instrumentation, sample preparation and computational analysis (Ficarro et al., 2002; Lemeer and Heck, 2009; Lundby et al., 2012).

FIGURE 4. The diverse and dynamic modes of proteome regulation add a higher level of complexity to the human genome. There are roughly 20,300 genes in the human genome. The molecular basis of the cellular phenotype (that is, the tissue cell types) is determined by the specific expression of a subset of the genome (11,000 genes). The sophisticated modes of protein regulation, such as splice variants, post-translational modifications (PTMs), protein–protein interactions (PPIs), and subcellular localization, add a considerably higher order of complexity. This results in tissue- and organelle-specific protein networks that respond to perturbations differently over time (for example, ageing or drug treatment).

Proteomics using mass spectrometry (MS) generates a large quantity of information about the expression, post-translational modifications (PTMs), and interactions of thousands of proteins. The obtained data must be supplied to the scientific community in a format that is suitable and curated, as well as retrievable and interpretable. Proteomics data should be made freely available to the public, which helps ensure that quality standards are maintained in the field. The long-term storage of unprocessed raw data is a first level of distribution for proteomics data. Understanding the proteome’s complex and dynamic interactions also necessitates the creation of physical interaction maps.

Proteins frequently interact with one another in stable or transient multi-protein complexes of varying composition, with the human interactome containing an estimated 130,000 binary interactions, the majority of which have yet to be mapped. Proteins can also interact with other molecules such as RNA, DNA and metabolites. These complexes play crucial roles in regulatory processes, signalling cascades, and cellular functions, and a failure of the proteins to interact can result in loss of function (Altelaar et al., 2012; Ma and Johnson, 2012). Tranche is one of the few public repositories that can manage raw proteomics data at the moment, and it is based on an encrypted peer-to-peer system that stores data on numerous servers across the world. Raw data, on the other hand, is in a closed format, which makes it difficult to share. As a result, attempts are being undertaken to standardise formats that preserve all necessary information (Smith et al., 2011). The European Bioinformatics Institute’s PRIDE database reflects this effort, as it enables the storage of both conventional MS data formats (XML) and the associated peptide and protein identifications. Furthermore, including additional data (such as species, fragmentation procedures, and proteases) allows for a global meta-analysis of proteomic data sets (Perez-Riverol et al., 2019).

Moreover, protein sequence alignment compares two or more sequences and aids in identifying homologous regions and in visualizing the evolutionary and structural relationships among sequences. It plays a crucial role in bioinformatics and supports the querying and construction of databases and the prediction of a protein’s primary, secondary and tertiary structure and biological function, among other tasks. Many platforms have been developed for sequence alignment and sequence analysis, including PROSITE, Pfam, BLAST, FASTA, Clustal Omega, T-Coffee, MUSCA, ALIGN, DIALIGN, ProbCons, HMMER3 phmmer and many more (Pruess and Apweiler, 2003; Sievers et al., 2011; Singh et al., 2016a).
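
To make the idea of pairwise alignment concrete, the following is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). It is not the algorithm used by any of the tools named above, which rely on substitution matrices, heuristics or profile models; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b). The scoring scheme is a toy
    assumption; real protein alignments typically use a substitution
    matrix and affine gap penalties.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("HEAGAWGHEE", "PAWHEAE"))
```

Aligned positions that share a residue suggest candidate homologous regions; gaps mark putative insertions or deletions.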

Physicochemical characterization of a protein from its primary structure can be done using the ProtParam tool from ExPASy (Expert Protein Analysis System) (Gasteiger et al., 2005). The parameters that can be computed for a given sequence include molecular weight, amino acid and atomic composition, isoelectric point, estimated half-life, grand average of hydropathicity (GRAVY) and more. To predict secondary structure, many tools have been developed, including the Chou-Fasman algorithm (a statistical approach based on the propensity of each residue to form an α-helix or β-strand), GOR, Jpred, etc. Similarly, for tertiary protein structure prediction, PHYRE2 (Protein Homology/analogY Recognition Engine) (Kelley et al., 2015) and I-TASSER are available (Yang et al., 2015).
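
Two of the sequence-derived parameters mentioned above can be sketched in a few lines. The example below computes amino acid composition and GRAVY, taking GRAVY as the sum of Kyte-Doolittle hydropathy values divided by the sequence length. It is an illustrative sketch rather than ProtParam's own code; the function names are hypothetical, and the input is assumed to contain only the 20 standard one-letter residue codes.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the fraction of each residue type present in seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in sorted(counts)}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues.

    Raises KeyError for non-standard residue codes (an assumption of this
    sketch; production tools handle ambiguity codes explicitly).
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # arbitrary example sequence
    print(amino_acid_composition(peptide))
    print(round(gravy(peptide), 3))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value an overall hydrophilic one.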

Apart from the above-mentioned software suites, there are other tools that are helpful for protein analysis. Some of them are listed in Table 4.

TABLE 4. Protein sequence analysis tools.

To study post-translational modifications, tools like GlycoMod (Cooper et al., 2001), NetPhos (Trost and Kusalik, 2011), NetPicoRNA (Smits et al., 2013), FindMod (Gasteiger et al., 2003), ScanProsite (De Castro et al., 2006) and others are available online. For protein interaction analyses, STRING can be used (Szklarczyk et al., 2021). To visualize the 3-D structure of proteins, tools like PyMOL and Jmol can be used. PyMOL is also used to visualize protein-ligand docking, predicted binding sites, protein interactions and more (DeLano, 2002; Herráez, 2006).

The identification of protein biomarkers with prognostic or diagnostic significance is one of the most difficult applications of proteomics right now (Figure 5).

FIGURE 5. Typical workflow for identifying, validating, and stratifying protein-based biomarker signatures. Proteomics based on mass spectrometry (MS) is used for in-depth quantitative characterization of the proteome of a disease model and its appropriate controls. Following the application of strict statistics, a list of candidate proteins that can be used as a phenotypic signature is defined. These markers are verified in large patient cohorts using more specific methodologies, such as MS-based (for example, selected reaction monitoring (SRM)) or antibody-based approaches. To confirm that the biomarker has a direct mechanistic involvement in the disease, the biological connections between the signature proteins and the disease phenotype should be biochemically confirmed.

As previously mentioned, recent technical advancements have resulted in the development of comprehensive pipelines that incorporate the discovery and validation phases, allowing plasma biomarkers to be identified for many diseases (Addona et al., 2011; Whiteaker et al., 2011). Despite the introduction of some successful biomarkers for clinical application, many (if not most) claimed biomarkers have weak reliability or lack rigorous confirmation, leading to scepticism among clinicians. The primary flaws in many biomarker studies are the lack of proper controls in the discovery phase, the lack of appropriate statistical tools for biomarker definition, and the absence of independent validation steps in large patient cohorts to certify the legitimacy of the biomarker unambiguously; such flaws lead to claimed biomarkers that are rarely directly related to disease biology (Poste, 2011).

Metabolomics: Beacon for the 21st Century

After genomics, transcriptomics and proteomics, metabolomics is the newest of the “omics” sciences, combining high-throughput analytical techniques with bioinformatics. It is concerned with the quantitative and qualitative evaluation of metabolites, which are key metabolic intermediates and end products (Zhang et al., 2014b). The purpose of this scientific method is not only to determine which pathological processes or disturbances are at the root of a specific disease entity, but also to anticipate how those conditions will respond to treatment interventions. Metabolomic analysis helps discriminate between normal and abnormal pathways, which aids in disease diagnosis and prognosis prediction (Zhang et al., 2015). A noteworthy benefit of the metabolome over the genome is its potential to reflect environmental effects and to provide a snapshot of the individual’s pathophysiological status at a certain point in time (Shah et al., 2015; Zhang et al., 2015). The prime concern of researchers and clinicians is to better understand the disrupted biochemical and pathological processes, and to inform the creation of more effective therapeutic medicines for the treatment of those disease states in humans. Metabolomic tools have the benefits of being quick, inexpensive, and sensitive. Metabolomics can be studied using a variety of techniques, including mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy, and Fourier-transform infrared (FTIR) spectroscopy. Metabolomic fingerprinting, metabolic profiling, metabolic footprinting, target analysis, and flux analysis are examples of approaches that all play important roles in understanding toxicological mechanisms and disease processes in live organisms (Tripathi et al., 2013; Zhang et al., 2013; Zhang et al., 2014b).
Metabolomics is also critical in discovering new drugs and biomarkers for the early diagnosis of diseases such as rheumatoid arthritis or osteoarthritis (Carlson et al., 2018; Takahashi et al., 2019; Dudka et al., 2021), osteoporosis, cardiovascular disease, and Alzheimer’s disease (AD); in cancer prognosis, diagnosis, and treatment (Pushkarev et al., 2009; McCarthy, 2010; Thompson and Steinmann, 2010; Kircher et al., 2011; Quail et al., 2012; Zheng et al., 2016; Zheng et al., 2017; Merker et al., 2018; Pereira et al., 2020); in inborn errors of metabolism (IEM); and in a variety of other applications (Carlson et al., 2018).

Pharmacogenomics/Pharmacogenetics: in-Silico Approach

Pharmacogenomics is described as the study of how an individual’s genes affect their response to medications. It is an emerging discipline that combines pharmacology (the branch of science that studies drugs) with genomics (the branch of science that studies genes) to develop effective doses and safe pharmaceuticals tailored to an individual patient’s genetic makeup. The Human Genome Project is one of the most important programs through which researchers are learning about genetic relationships and their impact on the body’s reaction to drugs. Differences in genetic makeup influence pharmaceutical effectiveness, making it possible to anticipate medication effectiveness for an individual and to investigate the likelihood of adverse drug reactions in the future (Caldwell et al., 2007).

Because of the wide range of individual responses to drug therapy, predicting the degree of effectiveness of a medication for a certain patient is difficult. Along with clinical aspects, pharmacological factors such as variations in metabolism, drug distribution, and drug target proteins play a significant role (Wattanachai et al., 2017). Table 5 describes various software tools employed in pharmacogenomics.

TABLE 5. In silico approaches used in pharmacogenomics.

Epigenomics—complex diseases: An enigma

Understanding the causes and mechanisms of complex non-Mendelian diseases remains a major challenge, despite substantial effort. Although various molecular genetic linkage and association studies have been carried out to explain the heritable predisposition to complex disorders, the results are sometimes inconclusive and even contentious. Similarly, determining the environmental factors that cause a disorder is difficult (Singh Nanda et al., 2016). Shifting the emphasis to epigenetic misregulation as a primary etiopathogenic element presents a novel interpretation of the paradigm of “genes plus environment”.

Various non-Mendelian irregularities of complex diseases, such as the presence of clinically indistinguishable sporadic and familial cases, sexual dimorphism, relatively late age of onset and peaks of susceptibility to some diseases, discordance of monozygotic twins, and major fluctuations in the course of disease severity, are consistent with epigenetic mechanisms. It has also been claimed that stochastic epigenetic processes in the cell may account for a significant percentage of the phenotypic diversity formerly attributed to environmental factors. It is proposed that using epigenetic strategies in conjunction with traditional genetic strategies can greatly speed up the discovery of etiopathogenic processes in complex disorders (Lacal and Ventura, 2018). Epigenetic microarray technologies and in silico approaches, such as those shown in Table 6, will considerably enhance epigenetic investigations of complex disorders.

TABLE 6. In silico approaches in epigenomics.

Pathway/Enrichment Analysis framework: omics Data

Comprehensive DNA, RNA, and protein quantification in biological materials is now commonplace. The generated data is rapidly accumulating, and its analysis aids researchers in discovering new biological functions, genotype–phenotype correlations, and disease causes (Lander, 2011; Stephens et al., 2015). Many researchers, however, find that analysing and interpreting these data is a major challenge. Long lists of genes often emerge from analyses, requiring an impractically large amount of manual literature research to interpret.

Scientists can use pathway enrichment analysis to acquire mechanistic insight into gene lists generated by genome-scale (omics) investigations. This approach finds biological pathways that are more enriched in a gene list than is expected by chance (Nguyen et al., 2019). Innovative pathway enrichment analysis methodologies provide step-by-step guidance for interpreting gene lists generated by RNA-seq and genome-sequencing research. The approach proceeds in three stages: defining a gene list using omics data, determining statistically enriched pathways, and visualizing and interpreting the results. This technique can be applied to differentially expressed genes and to genes that are altered in cancer; however, the idea can be extended to a wide range of omics data (Paczkowska et al., 2020). There are various enrichment tools, a few of which are summarized in Table 7.
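
The statement "more enriched than is expected by chance" can be made concrete with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is one common way to score a single pathway; it is not claimed to be the method of any specific tool in Table 7, and the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value.

    Probability of observing at least n_overlap pathway genes when
    n_list genes are drawn without replacement from n_background genes,
    of which n_pathway belong to the pathway.
    """
    total = comb(n_background, n_list)
    upper = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment(gene_list, pathway_genes, background):
    """Score one pathway against a gene list, restricted to the background."""
    background = set(background)
    genes = set(gene_list) & background
    pathway = set(pathway_genes) & background
    overlap = genes & pathway
    return enrichment_p_value(
        len(background), len(pathway), len(genes), len(overlap)
    )


if __name__ == "__main__":
    # Toy data: 10 background genes, 5 in the pathway, 2 in the gene list.
    background = [f"g{i}" for i in range(1, 11)]
    pathway = ["g1", "g2", "g3", "g4", "g5"]
    print(pathway_enrichment(["g1", "g2"], pathway, background))
```

When many pathways are tested, the resulting p-values need a multiple-testing correction before any pathway is reported as enriched.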

TABLE 7. Pathway enrichment tools.

Single-Cell Genomics “Cancer Research/Pan-Cancer Biomarkers”

Single-cell sequencing refers to the sequencing of a single-cell genome or transcriptome in order to gather genomic, transcriptomic, or other multi-omics information that can reveal differences between cell populations and evolutionary relationships between cells, as in many cancers. Traditional sequencing methods can only obtain an average over many cells, making it impossible to study a small number of cells and resulting in the loss of information on cellular heterogeneity (Wen and Tang, 2018).

Compared with classical sequencing technology, single-cell methods have the advantages of detecting variability among individual cells, differentiating a small number of cells, and outlining cell maps (Pennisi, 2012).

Multimodal analysis, the ability to measure several data types simultaneously from the same cell, is a new and exciting frontier in single-cell genomics that needs novel computational methods to describe biological states based on multiple data sources. Weighted nearest neighbor (WNN) analysis, an unsupervised technique for learning the information content of each modality in each cell and defining cellular state based on a weighted combination of both modalities, is introduced in Seurat v4. Recent research has used WNN to create a multimodal reference of human PBMC from a CITE-seq dataset with matched transcriptome and 228 surface protein measurements. WNN can be used to analyse multimodal data from several technologies, such as CITE-seq, ASAP-seq, 10X Genomics ATAC + RNA, and SHARE-seq (Ensslin, 2008) (Tables 8, 9).

TABLE 8. Different omics levels of gene-function relationship.

TABLE 9. Single-cell sequencing technologies.

Deep Learning in Genomics

Genomics generates large amounts of data, and most bioinformatics algorithms now use machine learning and, more recently, deep learning to discover patterns, make predictions, and model disease progression or treatment. Deep learning (DL) advances have sparked a surge of interest in biomedical informatics, spawning new bioinformatics and computational biology research areas. Deep learning models are anticipated to deliver higher accuracies in specific genomics tasks than current state-of-the-art methods and, given the growing trend of using deep learning architectures in genomics research, to accelerate improvements in the field. Deep learning is a type of AI technique that is used to process vast and complicated genomic datasets in particular fields, such as clinical genomics (Koumakis, 2020). Various deep learning architectures have been designed to date, including Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

Artificial Neural Networks (ANN): The neurons and networks that make up human brains served as inspiration for Artificial Neural Networks (ANN). The ANN is made up of a set of fully connected nodes (neurons) that simulate the transmission of stimuli across brain synapses, with each node either firing or not depending on its inputs. These DL architectures can be used for feature selection, classification, dimensionality reduction, or as a submodule of a more complex design such as a convolutional neural network (Zurada, 1992).
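
As a minimal sketch of what "fully connected" means in practice, the example below passes an input vector through a stack of dense layers in plain Python. The sigmoid activation, the function names and the weights are assumptions chosen for illustration; real models are trained with a deep learning framework rather than written by hand.

```python
import math


def sigmoid(x):
    """Smooth firing function that maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation=sigmoid):
    """One fully connected layer.

    weights[j] is the list of weights feeding output node j, so every
    output node is connected to every input node.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases) layers."""
    for weights, biases in layers:
        inputs = dense_layer(inputs, weights, biases)
    return inputs


if __name__ == "__main__":
    # Hypothetical 3-input, 2-hidden-node, 1-output network.
    layers = [
        ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
        ([[1.0, -1.0]], [0.0]),
    ]
    print(forward([0.2, 0.7, 0.1], layers))
```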

Convolutional Neural Networks (CNN): The Convolutional Neural Network (CNN) is a deep neural network architecture that is most typically used to analyse visual images. It was intended as a completely automated image analysis network for classifying handwritten characters. CNNs are regularized variants of the multilayer perceptron: whereas in a multilayer perceptron each node/neuron in one layer is fully connected to all nodes in the following layer, a CNN restricts each node to a local region of its input and shares the same weights across positions (LeCun et al., 1998).
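
The convolution operation itself can be sketched on sequence data. The example below slides a single filter along a one-hot-encoded DNA sequence and applies a ReLU, which is how a first convolutional layer can act as a motif detector. Applying the idea to DNA rather than images, the hand-set "TATA" filter and the function names are all assumptions of this sketch; in a trained CNN the filter weights are learned.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Valid 1-D convolution of an encoded sequence with one filter.

    kernel has the same layout as a short encoded sequence. Each output
    value is the ReLU of the dot product between the filter and one
    window, so the same weights are reused at every position.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        total = sum(
            w * x
            for k_row, x_row in zip(kernel, window)
            for w, x in zip(k_row, x_row)
        )
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


if __name__ == "__main__":
    motif_filter = one_hot("TATA")  # hand-set filter, for illustration only
    print(conv1d(one_hot("GGTATAGG"), motif_filter))
```

The position with the highest activation marks the window that best matches the filter.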

Recurrent neural networks (RNN): Recurrent neural networks (RNN) extend normal feedforward neural networks (FNN) with connections between nodes that form a directed graph along a temporal sequence. This allows RNNs to exhibit temporal dynamic behavior while also integrating internal memory. Recurrent networks can remember information from previously seen states thanks to their short-term memory, making them well suited to sequential signal processing and prediction models. The ability of RNNs to relate information from a previous step to the current task is one of their strengths (Williams and Zipser, 1989a). Table 10 lists various deep learning (AI) tools in genomics.
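
The "internal memory" of a recurrent network is simply a hidden state that is fed back at every time step. The sketch below implements a basic (Elman-style) recurrent update, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), in plain Python; the function names and weights are hypothetical, and gated variants such as LSTM or GRU, which are often preferred in practice, are not shown.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """Compute the next hidden state from the input x and previous state."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def run_rnn(sequence, w_xh, w_hh, b):
    """Run the recurrence over a sequence and return every hidden state."""
    h = [0.0] * len(b)  # initial memory is empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states


if __name__ == "__main__":
    # One input feature and one hidden unit, with made-up weights.
    states = run_rnn([[1.0], [0.0], [0.0]], w_xh=[[1.0]], w_hh=[[1.0]], b=[0.0])
    print(states)
```

In the usage example only the first input is non-zero, yet the later hidden states remain non-zero, which shows information from an earlier step being carried forward.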

TABLE 10. Deep learning techniques in genomics.

Conclusion and Future Perspectives

The introduction of massively parallel sequencing has changed genetics and genomics research forever. Because of its widespread adoption and numerous applications, massively parallel sequencing is projected to play a vital role in the medical industry in the coming years. It is worth noting that NGS as a research tool still faces major challenges in terms of manufacturing, data management and downstream analysis.

➢ Thus, in the past decade, rapid advancements in high-throughput technologies, backed by lower costs, have opened up new pathways for interrogating a biological system at several regulatory levels, while also providing us with an unprecedented picture of it. Integrating data from any one level with relevant information obtained at the others (genome, transcriptome, proteome, epigenome and metabolome) remains difficult.

➢ Nonetheless, new high-throughput technologies addressing genomic, proteomic, transcriptomic, metabolomic, and epigenomic data clearly have tremendous research potential; in the hands of researchers, their capabilities will surely speed our understanding of genomics, medical science and allied domains.

➢ Advances in data generation, analysis, and the interpretation of outcomes point to a bright future, and rapid advancement in all fields of science has resulted in the introduction of novel analytical methodologies. As we continue to learn more about how the body functions, we should shift our focus from molecular to systemic and analytic techniques, which have the potential to revolutionize our understanding of how complex biological systems are regulated.

➢ Data integration, on the other hand, is not the end. Although, the bioinformatics challenges posed by NGS are significant, a variety of software tools and algorithms have been created to aid data management, short-read alignment, and sequence variant identification. The high throughput of NGS necessitates the use of automated pipelines, which aid in the transition from novel sequencing technology

➢ Thus the scenario emphasizes the necessity for scientists who are experts in a variety of fields, as well as the effectiveness of multidisciplinary research groups, in which the complementarity of varied abilities will allow for significant scientific advancements and contributions. Addressing system-wide biological questions necessitates the use of integrated biology techniques. Routine integration, on the other hand, will necessitate the maturation and alignment of various post-genome technologies, as well as cross-communication across various scientific communities. The effective integration of all of these technologies will eventually lead to next-generation systems biology, which will provide valuable biological insights and adoption to high-throughput research and publication.

Author Contributions

DA and GRB conceived the concept. GRB, IS, and DA wrote the manuscript. BR and RK technically refined the MS. All the authors finally approved the MS.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Addona, T. A., Shi, X., Keshishian, H., Mani, D. R., Burgess, M., Gillette, M. A., et al. (2011). A Pipeline that Integrates the Discovery and Verification of Plasma Protein Biomarkers Reveals Candidate Markers for Cardiovascular Disease. Nat. Biotechnol. 29 (7), 635–643. doi:10.1038/nbt.1899

Adzhubei, I. A., Schmidt, S., Peshkin, L., Ramensky, V. E., Gerasimova, A., Bork, P., et al. (2010). A Method and Server for Predicting Damaging Missense Mutations. Nat. Methods 7 (4), 248–249. doi:10.1038/nmeth0410-248

Aird, D., Ross, M. G., Chen, W. S., Danielsson, M., Fennell, T., Russ, C., et al. (2011). Analyzing and Minimizing PCR Amplification Bias in Illumina Sequencing Libraries. Genome Biol. 12 (2), R18–R14. doi:10.1186/gb-2011-12-2-r18

Altelaar, A. F. M., Navarro, D., Boekhorst, J., van Breukelen, B., Snel, B., Mohammed, S., et al. (2012). Database Independent Proteomics Analysis of the Ostrich and Human Proteome. Proc. Natl. Acad. Sci. U.S.A. 109 (2), 407–412. doi:10.1073/pnas.1108399108

Altmann, A., Weber, P., Bader, D., Preuß, M., Binder, E. B., and Müller-Myhsok, B. (2012). A Beginners Guide to SNP Calling from High-Throughput DNA-Sequencing Data. Hum. Genet. 131 (10), 1541–1554. doi:10.1007/s00439-012-1213-z

Amaral, A., Reis, M., and Silva, F. (2007). O programa BLAST: guia prático de utilização. Lisboa, Portugal: EMBRAPA. Documentos, 224.

Ameur, A., Che, H., Martin, M., Bunikis, I., Dahlberg, J., Höijer, I., et al. (2018). De Novo assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes 9 (10), 486. doi:10.3390/genes9100486

Ameur, A., Kloosterman, W. P., and Hestand, M. S. (2019). Single-molecule Sequencing: towards Clinical Applications. Trends Biotechnology 37 (1), 72–85. doi:10.1016/j.tibtech.2018.07.013

Andrés-León, E., Núñez-Torres, R., and Rojas, A. M. (2016). miARma-Seq: a Comprehensive Tool for miRNA, mRNA and circRNA Analysis. Scientific Rep. 6 (1), 1–8.

Angermueller, C., Lee, H. J., Reik, W., and Stegle, O. (2017). Erratum to: DeepCpG: Accurate Prediction of Single-Cell DNA Methylation States Using Deep Learning. Genome Biol. 18 (1), 90–13. doi:10.1186/s13059-017-1233-z

Aryee, M. J., Jaffe, A. E., Corrada-Bravo, H., Ladd-Acosta, C., Feinberg, A. P., Hansen, K. D., et al. (2014). Minfi: a Flexible and Comprehensive Bioconductor Package for the Analysis of Infinium DNA Methylation Microarrays. Bioinformatics 30 (10), 1363–1369. doi:10.1093/bioinformatics/btu049

Azarkhalili, B., Saberi, A., Chitsaz, H., and Sharifi-Zarchi, A. (2019). DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome. Sci. Rep. 9 (1), 16526–16614. doi:10.1038/s41598-019-52937-5

Baker, M. (2012). The Interaction Map. Nature 484 (7393), 271–275. doi:10.1038/484271a

Bao, L., Zhou, M., and Cui, Y. (2005). nsSNPAnalyzer: Identifying Disease-Associated Nonsynonymous Single Nucleotide Polymorphisms. Nucleic Acids Res. 33 (Suppl. l_2), W480–W482. doi:10.1093/nar/gki372

Beck, M., Schmidt, A., Malmstroem, J., Claassen, M., Ori, A., Szymborska, A., et al. (2011). The Quantitative Proteome of a Human Cell Line. Mol. Syst. Biol. 7 (1), 549. doi:10.1038/msb.2011.82

Berlin, K., Koren, S., Chin, C.-S., Drake, J. P., Landolin, J. M., and Phillippy, A. M. (2015). Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing. Nat. Biotechnol. 33 (6), 623–630. doi:10.1038/nbt.3238

Berman, H., Henrick, K., Nakamura, H., and Markley, J. L. (2007). The Worldwide Protein Data Bank (wwPDB): Ensuring a Single, Uniform Archive of PDB Data. Nucleic Acids Res. 35, D301–D303. doi:10.1093/nar/gkl971

Bernhofer, M., Dallago, C., Karl, T., Satagopam, V., Heinzinger, M., Littmann, M., et al. (2021). PredictProtein - Predicting Protein Structure and Function for 29 Years. Nucleic Acids Res. 49 (W1), W535–W540. doi:10.1093/nar/gkab354

Blencowe, B. J. (2000). Exonic Splicing Enhancers: Mechanism of Action, Diversity and Role in Human Genetic Diseases. Trends Biochemical Sciences 25 (3), 106–110. doi:10.1016/s0968-0004(00)01549-8

Boža, V., Brejová, B., and Vinař, T. (2017). DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads. PloS one 12 (6), e0178751. doi:10.1371/journal.pone.0178751

Brierley, A. S., Fernandes, P. G., Brandon, M. A., Armstrong, F., Millard, N. W., McPhail, S. D., et al. (2002). Antarctic Krill under Sea Ice: Elevated Abundance in a Narrow Band Just South of Ice Edge. Science 295 (5561), 1890–1892. doi:10.1126/science.1068574

Bru, C., Courcelle, E., Carrère, S., Beausse, Y., Dalmar, S., and Kahn, D. (2005). The ProDom Database of Protein Domain Families: More Emphasis on 3D. Nucleic Acids Res. 33, D212–D215. doi:10.1093/nar/gki034

Burge, C. B., Tuschl, T., and Sharp, P. A. (1999). Splicing of Precursors to mRNAs by the Spliceosomes. Cold Spring Harbor Monogr. Ser. 37, 525–560.

Caldwell, M. D., Berg, R. L., Zhang, K. Q., Glurich, I., Schmelzer, J. R., Yale, S. H., et al. (2007). Evaluation of Genetic Factors for Warfarin Dose Prediction. Clin. Med. Res. 5 (1), 8–16. doi:10.3121/cmr.2007.724

Carlson, A. K., Rawle, R. A., Adams, E., Greenwood, M. C., Bothner, B., and June, R. K. (2018). Application of Global Metabolomic Profiling of Synovial Fluid for Osteoarthritis Biomarkers. Biochem. biophysical Res. Commun. 499 (2), 182–188. doi:10.1016/j.bbrc.2018.03.117

Cartegni, L., Chew, S. L., and Krainer, A. R. (2002). Listening to Silence and Understanding Nonsense: Exonic Mutations that Affect Splicing. Nat. Rev. Genet. 3 (4), 285–298. doi:10.1038/nrg775

Cartegni, L., Wang, J., Zhu, Z., Zhang, M. Q., and Krainer, A. R. (2003). ESEfinder: A Web Resource to Identify Exonic Splicing Enhancers. Nucleic Acids Res. 31 (13), 3568–3571. doi:10.1093/nar/gkg616

Chaisson, M. J., and Tesler, G. (2012). Mapping Single Molecule Sequencing Reads Using Basic Local Alignment with Successive Refinement (BLASR): Application and Theory. BMC bioinformatics 13 (1), 238–318. doi:10.1186/1471-2105-13-238

Chen, L., Wang, C., Sun, H., Wang, J., Liang, Y., Wang, Y., et al. (2021). The Bioinformatics Toolbox for circRNA Discovery and Analysis. Brief. Bioinformatics 22 (2), 1706–1728. doi:10.1093/bib/bbaa001

Chen, Y., Li, Y., Narayan, R., Subramanian, A., and Xie, X. (2016). Gene Expression Inference with Deep Learning. Bioinformatics 32 (12), 1832–1839. doi:10.1093/bioinformatics/btw074

Chin, C.-S., Alexander, D. H., Marks, P., Klammer, A. A., Drake, J., Heiner, C., et al. (2013). Nonhybrid, Finished Microbial Genome Assemblies from Long-Read SMRT Sequencing Data. Nat. Methods 10 (6), 563–569. doi:10.1038/nmeth.2474

Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., et al. (2012). A Program for Annotating and Predicting the Effects of Single Nucleotide Polymorphisms, SnpEff. Fly 6 (2), 80–92. doi:10.4161/fly.19695

Consortium, I. H., Altshuler, D. M., Gibbs, R. A., Peltonen, L., Altshuler, D. M., Gibbs, R. A., et al. (2010). Integrating Common and Rare Genetic Variation in Diverse Human Populations. Nature 467 (7311), 52–58. doi:10.1038/nature09298

Cooper, C. A., Gasteiger, E., and Packer, N. H. (2001). GlycoMod - A Software Tool for Determining Glycosylation Compositions from Mass Spectrometric Data. Proteomics 1 (2), 340–349. doi:10.1002/1615-9861(200102)1:2<340::aid-prot340>3.0.co;2-b

Cox, J., and Mann, M. (2011). Quantitative, High-Resolution Proteomics for Data-Driven Systems Biology. Annu. Rev. Biochem. 80, 273–299. doi:10.1146/annurev-biochem-061308-093216

David, M., Dursi, L. J., Yao, D., Boutros, P. C., and Simpson, J. T. (2017). Nanocall: an Open Source Basecaller for Oxford Nanopore Sequencing Data. Bioinformatics 33 (1), 49–55. doi:10.1093/bioinformatics/btw569

De Castro, E., Sigrist, C. J., Gattiker, A., Bulliard, V., Langendijk-Genevaux, P. S., Gasteiger, E., et al. (2006). ScanProsite: Detection of PROSITE Signature Matches and ProRule-Associated Functional and Structural Residues in Proteins. Nucleic Acids Res. 34 (Suppl. 2), W362–W365. doi:10.1093/nar/gkl124

Delaneau, O., Zagury, J. F., Robinson, M. R., Marchini, J. L., and Dermitzakis, E. T. (2019). Accurate, Scalable and Integrative Haplotype Estimation. Nat. Commun. 10 (1), 5436. doi:10.1038/s41467-019-13225-y

DeLano, W. L. (2002). PyMOL.

Dombi, J., Jónás, T., and Tóth, Z. E. (2017). “A Pliant Arithmetic-Based Fuzzy Time Series Model,” in International Work-Conference on Artificial Neural Networks (Springer).

Dudka, I., Chachaj, A., Sebastian, A., Tański, W., Stenlund, H., Gröbner, G., et al. (2021). Metabolomic Profiling Reveals Plasma GlycA and GlycB as a Potential Biomarkers for Treatment Efficiency in Rheumatoid Arthritis. J. Pharm. Biomed. Anal. 197, 113971. doi:10.1016/j.jpba.2021.113971

Eilbeck, K., Quinlan, A., and Yandell, M. (2017). Settling the Score: Variant Prioritization and Mendelian Disease. Nat. Rev. Genet. 18 (10), 599–612. doi:10.1038/nrg.2017.52

English, A. C., Richards, S., Han, Y., Wang, M., Vee, V., Qu, J., et al. (2012). Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One 7 (11), e47768. doi:10.1371/journal.pone.0047768

Ensslin, A. (2008). Introduction to Multimodal Analysis by David Machin. Wiley Online Library.

Fairbrother, W. G., Yeh, R.-F., Sharp, P. A., and Burge, C. B. (2002). Predictive Identification of Exonic Splicing Enhancers in Human Genes. Science 297 (5583), 1007–1013. doi:10.1126/science.1073774

Ficarro, S. B., McCleland, M. L., Stukenberg, P. T., Burke, D. J., Ross, M. M., Shabanowitz, J., et al. (2002). Phosphoproteome Analysis by Mass Spectrometry and its Application to Saccharomyces cerevisiae. Nat. Biotechnol. 20 (3), 301–305. doi:10.1038/nbt0302-301

Flicek, P., and Birney, E. (2009). Sense from Sequence Reads: Methods for Alignment and Assembly. Nat. Methods 6 (11), S6–S12. doi:10.1038/nmeth.1376

Flicek, P., Aken, B. L., Beal, K., Ballester, B., Caccamo, M., Chen, Y., et al. (2008). Ensembl 2008. Nucleic Acids Res. 36 (Database issue), D707–D714. doi:10.1093/nar/gkm988

Freshour, S. L., Kiwala, S., Cotto, K. C., Coffman, A. C., McMichael, J. F., Song, J. J., et al. (2020). Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with Open Crowdsource Efforts. Nucleic Acids Res. 49 (D1), D1144–D1151. doi:10.1093/nar/gkaa1084

Gao, Y., Wang, J., and Zhao, F. (2015). CIRI: an Efficient and Unbiased Algorithm for De Novo Circular RNA Identification. Genome Biol. 16 (1), 4. doi:10.1186/s13059-014-0571-3

Gao, Y., Zhang, J., and Zhao, F. (2018). Circular RNA Identification Based on Multiple Seed Matching. Brief. Bioinformatics 19 (5), 803–810. doi:10.1093/bib/bbx014

Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D., and Bairoch, A. (2003). ExPASy: The Proteomics Server for In-Depth Protein Knowledge and Analysis. Nucleic Acids Res. 31 (13), 3784–3788. doi:10.1093/nar/gkg563

Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M. R., Appel, R. D., et al. (2005). “Protein Identification and Analysis Tools on the ExPASy Server,” in The Proteomics Protocols Handbook, 571–607. doi:10.1385/1-59259-890-0:571

Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., et al. (2004). Bioconductor: Open Software Development for Computational Biology and Bioinformatics. Genome Biol. 5 (10), R80. doi:10.1186/gb-2004-5-10-r80

González-Pérez, A., and López-Bigas, N. (2011). Improving the Assessment of the Outcome of Nonsynonymous SNVs with a Consensus Deleteriousness Score, Condel. Am. J. Hum. Genet. 88 (4), 440–449.

Goodwin, S., Gurtowski, J., Ethe-Sayers, S., Deshpande, P., Schatz, M. C., and McCombie, W. R. (2015). Oxford Nanopore Sequencing, Hybrid Error Correction, and De Novo Assembly of a Eukaryotic Genome. Genome Res. 25 (11), 1750–1756. doi:10.1101/gr.191395.115

Graveley, B. R. (2000). Sorting Out the Complexity of SR Protein Functions. RNA 6 (9), 1197–1211. doi:10.1017/s1355838200000960

Guo, F., Li, L., Li, J., Wu, X., Hu, B., Zhu, P., et al. (2017). Single-cell Multi-Omics Sequencing of Mouse Early Embryos and Embryonic Stem Cells. Cell Res. 27 (8), 967–988. doi:10.1038/cr.2017.82

Gupta, A., Wang, H., and Ganapathiraju, M. (2015). “Learning Structure in Gene Expression Data Using Deep Architectures, with an Application to Gene Clustering,” in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (IEEE).

Gupta, A., and Zou, J. (2019). Feedback GAN for DNA Optimizes Protein Functions. Nat. Mach. Intell. 1 (2), 105–111. doi:10.1038/s42256-019-0017-4

Habib, N., Avraham-Davidi, I., Basu, A., Burks, T., Shekhar, K., Hofree, M., et al. (2017). Massively Parallel Single-Nucleus RNA-Seq with DroNc-Seq. Nat. Methods 14 (10), 955–958. doi:10.1038/nmeth.4407

Han, X., Wang, R., Zhou, Y., Fei, L., Sun, H., Lai, S., et al. (2018). Mapping the Mouse Cell Atlas by Microwell-Seq. Cell 172 (5), 1091–1107. doi:10.1016/j.cell.2018.02.001

Harper, P. S. (2017). The European Society of Human Genetics: Beginnings, Early History and Development over its First 25 Years. United Kingdom: European Journal of Human Genetics, 1–8.

Herráez, A. (2006). Biomolecules in the Computer: Jmol to the rescue. Biochem. Mol. Biol. Educ. 34 (4), 255–261. doi:10.1002/bmb.2006.494034042644

Hoffman, G. E., Bendl, J., Girdhar, K., Schadt, E. E., and Roussos, P. (2019). Functional Interpretation of Genetic Variants Using Deep Learning Predicts Impact on Chromatin Accessibility and Histone Modification. Nucleic Acids Res. 47 (20), 10597–10611. doi:10.1093/nar/gkz808

Huang, D. W., Sherman, B. T., and Lempicki, R. A. (2009). Bioinformatics Enrichment Tools: Paths toward the Comprehensive Functional Analysis of Large Gene Lists. Nucleic Acids Res. 37 (1), 1–13. doi:10.1093/nar/gkn923

Ishikawa, J., and Hotta, K. (1999). FramePlot: a New Implementation of the Frame Analysis for Predicting Protein-Coding Regions in Bacterial DNA with a High G+C Content. FEMS Microbiol. Lett. 174 (2), 251–253. doi:10.1111/j.1574-6968.1999.tb13576.x

Jackson, D. A., Symons, R. H., and Berg, P. (1972). Biochemical Method for Inserting New Genetic Information into DNA of Simian Virus 40: Circular SV40 DNA Molecules Containing Lambda Phage Genes and the Galactose Operon of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 69 (10), 2904–2909. doi:10.1073/pnas.69.10.2904

Jacob, M., and Gallinaro, H. (1989). The 5′ Splice Site: Phylogenetic Evolution and Variable Geometry of Association with U1RNA. Nucleic Acids Res. 17 (6), 2159–2180. doi:10.1093/nar/17.6.2159

Jensen, L. J., Saric, J., and Bork, P. (2006). Literature Mining for the Biologist: from Information Retrieval to Biological Discovery. Nat. Rev. Genet. 7 (2), 119–129. doi:10.1038/nrg1768

Jia, G.-y., Wang, D.-l., Xue, M.-z., Liu, Y.-w., Pei, Y.-c., Yang, Y.-q., et al. (2019). CircRNAFisher: a Systematic Computational Approach for De Novo Circular RNA Identification. Acta Pharmacol. Sin. 40 (1), 55–63. doi:10.1038/s41401-018-0063-1

Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N., and Sternberg, M. J. E. (2015). The Phyre2 Web portal for Protein Modeling, Prediction and Analysis. Nat. Protoc. 10 (6), 845–858. doi:10.1038/nprot.2015.053

Kircher, M., Heyn, P., and Kelso, J. (2011). Addressing Challenges in the Production and Analysis of Illumina Sequencing Data. BMC Genomics 12 (1), 382. doi:10.1186/1471-2164-12-382

Kircher, M., Witten, D. M., Jain, P., O'Roak, B. J., Cooper, G. M., and Shendure, J. (2014). A General Framework for Estimating the Relative Pathogenicity of Human Genetic Variants. Nat. Genet. 46 (3), 310–315. doi:10.1038/ng.2892

Koumakis, L. (2020). Deep Learning Models in Genomics; Are We There yet? Comput. Struct. Biotechnol. J. 18, 1466–1473. doi:10.1016/j.csbj.2020.06.017

Kuhn, M., von Mering, C., Campillos, M., Jensen, L. J., and Bork, P. (2008). STITCH: Interaction Networks of Chemicals and Proteins. Nucleic Acids Res. 36, D684–D688. doi:10.1093/nar/gkm795

Kuhn, M., Letunic, I., Jensen, L. J., and Bork, P. (2016). The SIDER Database of Drugs and Side Effects. Nucleic Acids Res. 44 (D1), D1075–D1079. doi:10.1093/nar/gkv1075

Lacal, I., and Ventura, R. (2018). Epigenetic Inheritance: Concepts, Mechanisms and Perspectives. Front. Mol. Neurosci. 11, 292. doi:10.3389/fnmol.2018.00292

Lander, E. S. (2011). Initial Impact of the Sequencing of the Human Genome. Nature 470 (7333), 187–197. doi:10.1038/nature09792

Laufer, B. I., Hwang, H., Jianu, J. M., Mordaunt, C. E., Korf, I. F., Hertz-Picciotto, I., et al. (2020). Low-pass Whole Genome Bisulfite Sequencing of Neonatal Dried Blood Spots Identifies a Role for RUNX1 in Down Syndrome DNA Methylation Profiles. Hum. Mol. Genet. 29 (21), 3465–3476. doi:10.1093/hmg/ddaa218

Laufer, B. I., Neier, K., Valenzuela, A. E., Yasui, D. H., Schmidt, R. J., Lein, P. J., et al. (2022). Placenta and Fetal Brain Share a Neurodevelopmental Disorder DNA Methylation Profile in a Mouse Model of Prenatal PCB Exposure. Cell Rep. 38 (9), 110442. doi:10.1016/j.celrep.2022.110442

Laurentino, S., Heckmann, L., Di Persio, S., Li, X., Meyer Zu Hörste, G., Wistuba, J., et al. (2019). High-resolution Analysis of Germ Cells from Men with Sex Chromosomal Aneuploidies Reveals Normal Transcriptome but Impaired Imprinting. Clin. Epigenetics 11 (1), 127. doi:10.1186/s13148-019-0720-3

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based Learning Applied to Document Recognition. Proc. IEEE 86 (11), 2278–2324. doi:10.1109/5.726791

Ledergerber, C., and Dessimoz, C. (2011). Base-calling for Next-Generation Sequencing Platforms. Brief. Bioinformatics 12 (5), 489–497. doi:10.1093/bib/bbq077

Lee, B. (2016). “deepTarget: End-To-End Learning Framework for microRNA Target Prediction Using Deep Recurrent Neural Networks,” in Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics.

Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T., et al. (2016). Analysis of Protein-Coding Genetic Variation in 60,706 Humans. Nature 536 (7616), 285–291. doi:10.1038/nature19057

Lelieveld, S. H., Veltman, J. A., and Gilissen, C. (2016). Novel Bioinformatic Developments for Exome Sequencing. Hum. Genet. 135 (6), 603–614. doi:10.1007/s00439-016-1658-6

Lemeer, S., and Heck, A. J. (2009). The Phosphoproteomics Data Explosion. Curr. Opin. Chem. Biol. 13 (4), 414–420. doi:10.1016/j.cbpa.2009.06.022

Li, H. (2016). Minimap and Miniasm: Fast Mapping and De Novo Assembly for Noisy Long Sequences. Bioinformatics 32 (14), 2103–2110. doi:10.1093/bioinformatics/btw152

Li, X., and Wu, Y. (2020). Detecting circular RNA from high-throughput sequence data with de Bruijn graph. BMC genomics 21 (1), 749–811. doi:10.1186/s12864-019-6154-7

Lienhard, M., Grimm, C., Morkel, M., Herwig, R., and Chavez, L. (2014). MEDIPS: Genome-wide Differential Coverage Analysis of Sequencing Data Derived from DNA Enrichment Experiments. Bioinformatics 30 (2), 284–286. doi:10.1093/bioinformatics/btt650

Liu, H.-X., Chew, S. L., Cartegni, L., Zhang, M. Q., and Krainer, A. R. (2000). Exonic Splicing Enhancer Motif Recognized by Human SC35 under Splicing Conditions. Mol. Cell. Biol. 20 (3), 1063–1071. doi:10.1128/mcb.20.3.1063-1071.2000

Liu, H. X., Zhang, M., and Krainer, A. R. (1998). Identification of Functional Exonic Splicing Enhancer Motifs Recognized by Individual SR Proteins. Genes Dev. 12 (13), 1998–2012. doi:10.1101/gad.12.13.1998

Liu, Q., Ding, C., Lang, X., Guo, G., Chen, J., and Su, X. (2021). Small Noncoding RNA Discovery and Profiling with sRNAtools Based on High-Throughput Sequencing. Brief. Bioinformatics 22 (1), 463–473. doi:10.1093/bib/bbz151

Loman, N. J., Quick, J., and Simpson, J. T. (2015). A Complete Bacterial Genome Assembled De Novo Using Only Nanopore Sequencing Data. Nat. Methods 12 (8), 733–735. doi:10.1038/nmeth.3444

Lopez, J. P., Diallo, A., Cruceanu, C., Fiori, L. M., Laboissiere, S., Guillet, I., et al. (2015). Biomarker Discovery: Quantification of microRNAs and Other Small Non-coding RNAs Using Next Generation Sequencing. BMC Med. Genomics 8 (1), 35. doi:10.1186/s12920-015-0109-x

Lundby, A., Secher, A., Lage, K., Nordsborg, N. B., Dmytriyev, A., Lundby, C., et al. (2012). Quantitative Maps of Protein Phosphorylation Sites across 14 Different Rat Organs and Tissues. Nat. Commun. 3 (1), 876. doi:10.1038/ncomms1871

Luscombe, N. M., Greenbaum, D., and Gerstein, M. (2001). What Is Bioinformatics? an Introduction and Overview. Yearb. Med. Inform. 10 (01), 83–100. doi:10.1055/s-0038-1638103

Ma, B., and Johnson, R. (2012). De Novo Sequencing and Homology Searching. Mol. Cell. Proteomics 11 (2), O111.014902. doi:10.1074/mcp.O111.014902

MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., et al. (2012). A Systematic Survey of Loss-Of-Function Variants in Human Protein-Coding Genes. Science 335 (6070), 823–828. doi:10.1126/science.1215040

Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., and Gilad, Y. (2008). RNA-seq: an Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays. Genome Res. 18 (9), 1509–1517. doi:10.1101/gr.079558.108

Martin, T. C. (2014). The coMET User Guide.

Maxam, A. M., and Gilbert, W. (1977). A New Method for Sequencing DNA. Proc. Natl. Acad. Sci. U.S.A. 74 (2), 560–564. doi:10.1073/pnas.74.2.560

McCarthy, A. (2010). Third Generation DNA Sequencing: Pacific Biosciences' Single Molecule Real Time Technology. Chem. Biol. 17 (7), 675–676. doi:10.1016/j.chembiol.2010.07.004

McLaren, W., Pritchard, B., Rios, D., Chen, Y., Flicek, P., and Cunningham, F. (2010). Deriving the Consequences of Genomic Variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26 (16), 2069–2070. doi:10.1093/bioinformatics/btq330

Meng, X., Chen, Q., Zhang, P., and Chen, M. (2017). CircPro: an Integrated Tool for the Identification of circRNAs with Protein-Coding Potential. Bioinformatics 33 (20), 3314–3316. doi:10.1093/bioinformatics/btx446

Merker, J. D., Wenger, A. M., Sneddon, T., Grove, M., Zappala, Z., Fresard, L., et al. (2018). Long-read Genome Sequencing Identifies Causal Structural Variation in a Mendelian Disease. Genet. Med. 20 (1), 159–163. doi:10.1038/gim.2017.86

Montgomery, S. B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R. P., Ingle, C., Nisbett, J., et al. (2010). Transcriptome Genetics Using Second Generation Sequencing in a Caucasian Population. Nature 464 (7289), 773–777. doi:10.1038/nature08903

Müller, F., Scherer, M., Assenov, Y., Lutsik, P., Walter, J., Lengauer, T., et al. (2019). RnBeads 2.0: Comprehensive Analysis of DNA Methylation Data. Genome Biol. 20 (1), 55. doi:10.1186/s13059-019-1664-9

Nagaraj, N., Wisniewski, J. R., Geiger, T., Cox, J., Kircher, M., Kelso, J., et al. (2011). Deep Proteome and Transcriptome Mapping of a Human Cancer Cell Line. Mol. Syst. Biol. 7 (1), 548. doi:10.1038/msb.2011.81

Ng, P. C., and Henikoff, S. (2003). SIFT: Predicting Amino Acid Changes that Affect Protein Function. Nucleic Acids Res. 31 (13), 3812–3814. doi:10.1093/nar/gkg509

Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., et al. (2009). Targeted Capture and Massively Parallel Sequencing of 12 Human Exomes. Nature 461 (7261), 272–276. doi:10.1038/nature08250

Nguyen, T. M., Shafi, A., Nguyen, T., and Draghici, S. (2019). Correction to: Identifying Significantly Impacted Pathways: a Comprehensive Review and Assessment. Genome Biol. 20 (1), 234. doi:10.1186/s13059-019-1882-1

Nilsen, T. W. (2003). The Spliceosome: the Most Complex Macromolecular Machine in the Cell? Bioessays 25 (12), 1147–1149. doi:10.1002/bies.10394

Ozsolak, F. (2012). Third-generation Sequencing Techniques and Applications to Drug Discovery. Expert Opin. Drug Discov. 7 (3), 231–243. doi:10.1517/17460441.2012.660145

Paczkowska, M., Barenboim, J., Sintupisut, N., Fox, N. S., Zhu, H., Abd-Rabbo, D., et al. (2020). Integrative Pathway Enrichment Analysis of Multivariate Omics Data. Nat. Commun. 11 (1), 735. doi:10.1038/s41467-019-13983-9

Park, S. (2016). deepMiRGene: Deep Neural Network Based Precursor Microrna Prediction. arXiv preprint arXiv:1605.00017.

Pennisi, E. (2012). Single-cell Sequencing Tackles Basic and Biomedical Questions. American Association for the Advancement of Science.

Pereira, R., Barbosa, T., Gales, L., Oliveira, E., Santos, R., Oliveira, J., et al. (2019). Clinical and Genetic Analysis of Children with Kartagener Syndrome. Cells 8 (8), 900. doi:10.3390/cells8080900

Pereira, R., Oliveira, J., and Sousa, M. (2020). Bioinformatics and Computational Tools for Next-Generation Sequencing Analysis in Clinical Genetics. J. Clin. Med. 9 (1), 132. doi:10.3390/jcm9010132

Pereira, R., Oliveira, M. E., Santos, R., Oliveira, E., Barbosa, T., Santos, T., et al. (2019). Characterization of CCDC103 Expression Profiles: Further Insights in Primary Ciliary Dyskinesia and in Human Reproduction. J. Assist. Reprod. Genet. 36 (8), 1683–1700. doi:10.1007/s10815-019-01509-7

Perez-Riverol, Y., Csordas, A., Bai, J., Bernal-Llinares, M., Hewapathirana, S., Kundu, D. J., et al. (2019). The PRIDE Database and Related Tools and Resources in 2019: Improving Support for Quantification Data. Nucleic Acids Res. 47 (D1), D442–D450. doi:10.1093/nar/gky1106

Peters, T. J., Buckley, M. J., Statham, A. L., Pidsley, R., Samaras, K., V Lord, R., et al. (2015). De Novo Identification of Differentially Methylated Regions in the Human Genome. Epigenetics Chromatin 8 (1), 6. doi:10.1186/1756-8935-8-6

Pevsner, J. (2015). Bioinformatics and Functional Genomics. John Wiley & Sons.

Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., and Siepel, A. (2010). Detection of Nonneutral Substitution Rates on Mammalian Phylogenies. Genome Res. 20 (1), 110–121. doi:10.1101/gr.097857.109

Poste, G. (2011). Bring on the Biomarkers. Nature 469 (7329), 156–157. doi:10.1038/469156a

Prosdocimi, F. (2010). Introdução à Bioinformática. Curso Online.

Prosdocimi, F., Cerqueira, G. C., Binneck, E., Silva, A. F., Reis, A. N., Junqueira, A. C. M., et al. (2002). Bioinformatics: User Manual - Biotechnology Science & Development.

Pruess, M., and Apweiler, R. (2003). Bioinformatics Resources for In Silico Proteome Analysis. J. Biomed. Biotechnol. 2003 (4), 231–236. doi:10.1155/s1110724303209219

Pushkarev, D., Neff, N. F., and Quake, S. R. (2009). Single-molecule Sequencing of an Individual Human Genome. Nat. Biotechnol. 27 (9), 847–850. doi:10.1038/nbt.1561

Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R., et al. (2012). A Tale of Three Next Generation Sequencing Platforms: Comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq Sequencers. BMC Genomics 13 (1), 341. doi:10.1186/1471-2164-13-341

Quang, D., and Xie, X. (2016). DanQ: a Hybrid Convolutional and Recurrent Deep Neural Network for Quantifying the Function of DNA Sequences. Nucleic Acids Res. 44 (11), e107. doi:10.1093/nar/gkw226

Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A., and Kim, D. (2015). Methods of Integrating Data to Uncover Genotype-Phenotype Interactions. Nat. Rev. Genet. 16 (2), 85–97. doi:10.1038/nrg3868

Robinson, P. N., Köhler, S., Oellrich, A., Wang, K., Mungall, C. J., Lewis, S. E., et al. (2014). Improved Exome Prioritization of Disease Genes through Cross-Species Phenotype Comparison. Genome Res. 24 (2), 340–348. doi:10.1101/gr.160325.113

Rosenberg, A. B., Roco, C. M., Muscat, R. A., Kuchina, A., Sample, P., Yao, Z., et al. (2018). Single-cell Profiling of the Developing Mouse Brain and Spinal Cord with Split-Pool Barcoding. Science 360 (6385), 176–182. doi:10.1126/science.aam8999

Sanger, F., and Coulson, A. R. (1975). A Rapid Method for Determining Sequences in DNA by Primed Synthesis with DNA Polymerase. J. Mol. Biol. 94 (3), 441–448. doi:10.1016/0022-2836(75)90213-2

Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA Sequencing with Chain-Terminating Inhibitors. Proc. Natl. Acad. Sci. U.S.A. 74 (12), 5463–5467. doi:10.1073/pnas.74.12.5463

Schadt, E. E., Turner, S., and Kasarskis, A. (2010). A Window into Third-Generation Sequencing. Hum. Mol. Genet. 19 (R2), R227–R240. doi:10.1093/hmg/ddq416

Scherer, S. W., Lee, C., Birney, E., Altshuler, D. M., Eichler, E. E., Carter, N. P., et al. (2007). Challenges and Standards in Integrating Surveys of Structural Variation. Nat. Genet. 39 (7), S7–S15. doi:10.1038/ng2093

Schmidt, A., Forne, I., and Imhof, A. (2014). Bioinformatic Analysis of Proteomics Data. BMC Syst. Biol. 8 (Suppl. 2), S3. doi:10.1186/1752-0509-8-S2-S3

Schwarz, J. M., Rödelsperger, C., Schuelke, M., and Seelow, D. (2010). MutationTaster Evaluates Disease-Causing Potential of Sequence Alterations. Nat. Methods 7 (8), 575–576. doi:10.1038/nmeth0810-575

Shah, N. J., Sureshkumar, S., and Shewade, D. G. (2015). Metabolomics: a Tool Ahead for Understanding Molecular Mechanisms of Drugs and Diseases. Ind. J. Clin. Biochem. 30 (3), 247–254. doi:10.1007/s12291-014-0455-z

Shendure, J., and Ji, H. (2008). Next-generation DNA Sequencing. Nat. Biotechnol. 26 (10), 1135–1145. doi:10.1038/nbt1486

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., et al. (2011). Fast, Scalable Generation of High‐quality Protein Multiple Sequence Alignments Using Clustal Omega. Mol. Syst. Biol. 7, 539. doi:10.1038/msb.2011.75

Sims, D., Sudbery, I., Ilott, N. E., Heger, A., and Ponting, C. P. (2014). Sequencing Depth and Coverage: Key Considerations in Genomic Analyses. Nat. Rev. Genet. 15 (2), 121–132. doi:10.1038/nrg3642

Singh Nanda, J., Kumar, R., and Raghava, G. P. (2016). dbEM: A Database of Epigenetic Modifiers Curated from Cancerous and Normal Genomes. Sci. Rep. 6 (1), 19340. doi:10.1038/srep19340

Singh, N., Upadhyay, S., Jaiswar, A., and Mishra, N. (2016a). In Silico Analysis of Protein. J. Bioinform Genomics Proteomics 1 (2), 1007.

Singh, R., Lanchantin, J., Robins, G., and Qi, Y. (2016). DeepChrome: Deep-Learning for Predicting Gene Expression from Histone Modifications. Bioinformatics 32 (17), i639–i648. doi:10.1093/bioinformatics/btw427

Singleton, M. V., Guthery, S. L., Voelkerding, K. V., Chen, K., Kennedy, B., Margraf, R. L., et al. (2014). Phevor Combines Multiple Biomedical Ontologies for Accurate Identification of Disease-Causing Alleles in Single Individuals and Small Nuclear Families. Am. J. Hum. Genet. 94 (4), 599–610. doi:10.1016/j.ajhg.2014.03.010

Siva, N. (2008). 1000 Genomes Project. Nat. Biotechnol. 26 (3), 256. doi:10.1038/nbt0308-256b

Smith, B. E., Hill, J. A., Gjukich, M. A., and Andrews, P. C. (2011). Tranche Distributed Repository and ProteomeCommons.Org. Methods Mol. Biol. 696, 123–145. doi:10.1007/978-1-60761-987-1_8

Smits, S. L., Raj, V. S., Oduber, M. D., Schapendonk, C. M. E., Bodewes, R., Provacia, L., et al. (2013). Metagenomic Analysis of the Ferret Fecal Viral flora. PLoS One 8 (8), e71595. doi:10.1371/journal.pone.0071595

Snider, C., Jayasinghe, S., Hristova, K., and White, S. H. (2009). MPEx: a Tool for Exploring Membrane Proteins. Protein Sci. 18 (12), 2624–2628. doi:10.1002/pro.256

Sović, I., Šikić, M., Wilm, A., Fenlon, S. N., Chen, S., and Nagarajan, N. (2016). Fast and Sensitive Mapping of Nanopore Sequencing Reads with GraphMap. Nat. Commun. 7, 11307. doi:10.1038/ncomms11307

Stelzer, G., Plaschkes, I., Oz-Levi, D., Alkelai, A., Olender, T., Zimmerman, S., et al. (2016). VarElect: the Phenotype-Based Variation Prioritizer of the GeneCards Suite. BMC Genomics 17 (Suppl. 2), 444. doi:10.1186/s12864-016-2722-2

Stephens, Z. D., Lee, S. Y., Faghri, F., Campbell, R. H., Zhai, C., Efron, M. J., et al. (2015). Big Data: Astronomical or Genomical? PLoS Biol. 13 (7), e1002195. doi:10.1371/journal.pbio.1002195

Stitziel, N. O., Binkowski, T. A., Tseng, Y. Y., Kasif, S., and Liang, J. (2004). topoSNP: a Topographic Database of Non-synonymous Single Nucleotide Polymorphisms with and without Known Disease Association. Nucleic Acids Res. 32 (Suppl. 1), D520–D522. doi:10.1093/nar/gkh104

Stoneking, M., and Krause, J. (2011). Learning about Human Population History from Ancient and Modern Genomes. Nat. Rev. Genet. 12 (9), 603–614. doi:10.1038/nrg3029

Stothard, P. (2000). The Sequence Manipulation Suite: JavaScript Programs for Analyzing and Formatting Protein and DNA Sequences. Biotechniques 28 (6), 1102–1104. doi:10.2144/00286ir01

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene Set Enrichment Analysis: a Knowledge-Based Approach for Interpreting Genome-wide Expression Profiles. Proc. Natl. Acad. Sci. U.S.A. 102 (43), 15545–15550. doi:10.1073/pnas.0506580102

PubMed Abstract | CrossRef Full Text | Google Scholar

Szalay, T., and Golovchenko, J. A. (2015). De Novo sequencing and Variant Calling with Nanopores Using PoreSeq. Nat. Biotechnol. 33 (10), 1087–1091. doi:10.1038/nbt.3360

PubMed Abstract | CrossRef Full Text | Google Scholar

Szklarczyk, D., Gable, A. L., Nastou, K. C., Lyon, D., Kirsch, R., Pyysalo, S., et al. (2021). Correction to 'The STRING Database in 2021: Customizable Protein-Protein Networks, and Functional Characterization of User-Uploaded Gene/measurement Sets'. Nucleic Acids Res. 49 (18), 10800. doi:10.1093/nar/gkab835

PubMed Abstract | CrossRef Full Text | Google Scholar

Tabas-Madrid, D., Nogales-Cadenas, R., and Pascual-Montano, A. (2012). GeneCodis3: a Non-redundant and Modular Enrichment Analysis Tool for Functional Genomics. Nucleic Acids Res. 40, W478–W483. doi:10.1093/nar/gks402

PubMed Abstract | CrossRef Full Text | Google Scholar

Takahashi, S., Saegusa, J., Onishi, A., and Morinobu, A. (2019). Biomarkers Identified by Serum Metabolomic Analysis to Predict Biologic Treatment Response in Rheumatoid Arthritis Patients. Rheumatology 58 (12), 2153–2161. doi:10.1093/rheumatology/kez199

PubMed Abstract | CrossRef Full Text | Google Scholar

Thompson, J. F., and Steinmann, K. E. (2010). Single Molecule Sequencing with a HeliScope Genetic Analysis System. Curr. Protoc. Mol. Biol. Chapter 7 (1), Unit7–10. doi:10.1002/0471142727.mb0710s92

PubMed Abstract | CrossRef Full Text | Google Scholar

Thorn, C. F., Klein, T. E., and Altman, R. B. (2013). PharmGKB: the Pharmacogenomics Knowledge Base. Methods Mol. Biol. (Clifton, N.J.) 1015, 311–320. doi:10.1007/978-1-62703-435-7_20

PubMed Abstract | CrossRef Full Text | Google Scholar

Tripathi, P., Somashekar, B. S., Ponnusamy, M., Gursky, A., Dailey, S., Kunju, P., et al. (2013). HR-MAS NMR Tissue Metabolomic Signatures Cross-Validated by Mass Spectrometry Distinguish Bladder Cancer from Benign Disease. J. Proteome Res. 12 (7), 3519–3528. doi:10.1021/pr4004135

PubMed Abstract | CrossRef Full Text | Google Scholar

Trost, B., and Kusalik, A. (2011). Computational Prediction of Eukaryotic Phosphorylation Sites. Bioinformatics 27 (21), 2927–2935. doi:10.1093/bioinformatics/btr525

PubMed Abstract | CrossRef Full Text | Google Scholar

van Dijk, E. L., Jaszczyszyn, Y., Naquin, D., and Thermes, C. (2018). The Third Revolution in Sequencing Technology. Trends Genet. 34 (9), 666–681. doi:10.1016/j.tig.2018.05.008

PubMed Abstract | CrossRef Full Text | Google Scholar

Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., et al. (2001). The Sequence of the Human Genome. science 291 (5507), 1304–1351. doi:10.1126/science.1058040

PubMed Abstract | CrossRef Full Text | Google Scholar

Verli, H. (2014). Bioinformática: da biologia à flexibilidade molecular.

Google Scholar

Vitak, S. A., Torkenczy, K. A., Rosenkrantz, J. L., Fields, A. J., Christiansen, L., Wong, M. H., et al. (2017). Sequencing Thousands of Single-Cell Genomes with Combinatorial Indexing. Nat. Methods 14 (3), 302–308. doi:10.1038/nmeth.4154

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, J. (2009). Protein Structure Prediction by Comparative Modeling: An Analysis of Methodology.

Google Scholar

Wang, K., Li, M., and Hakonarson, H. (2010). ANNOVAR: Functional Annotation of Genetic Variants from High-Throughput Sequencing Data. Nucleic Acids Res. 38 (16), e164. doi:10.1093/nar/gkq603

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, K., Singh, D., Zeng, Z., Coleman, S. J., Huang, Y., Savich, G. L., et al. (2010). MapSplice: Accurate Mapping of RNA-Seq Reads for Splice junction Discovery. Nucleic Acids Res. 38 (18), e178. doi:10.1093/nar/gkq622

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, X., Xiong, X., Cao, W., Zhang, C., Werren, J. H., and Wang, X. (2019). Genome Assembly of the A-Group Wolbachia in Nasonia oneida Using Linked-Reads Technology. Genome Biol. Evol. 11 (10), 3008–3013. doi:10.1093/gbe/evz223

PubMed Abstract | CrossRef Full Text | Google Scholar

Wattanachai, N., Kaewmoongkun, S., Pussadhamma, B., Makarawate, P., Wongvipaporn, C., Kiatchoosakun, S., et al. (2017). The Impact of Non-genetic and Genetic Factors on a Stable Warfarin Dose in Thai Patients. Eur. J. Clin. Pharmacol. 73 (8), 973–980. doi:10.1007/s00228-017-2265-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Wen, L., and Tang, F. (2018). Boosting the Power of Single-Cell Analysis. Nat. Biotechnol. 36 (5), 408–409. doi:10.1038/nbt.4131

PubMed Abstract | CrossRef Full Text | Google Scholar

Westholm, J. O., Miura, P., Olson, S., Shenker, S., Joseph, B., Sanfilippo, P., et al. (2014). Genome-wide Analysis of drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-dependent Neural Accumulation. Cel Rep. 9 (5), 1966–1980. doi:10.1016/j.celrep.2014.10.062

PubMed Abstract | CrossRef Full Text | Google Scholar

Whiteaker, J. R., Lin, C., Kennedy, J., Hou, L., Trute, M., Sokal, I., et al. (2011). A Targeted Proteomics-Based Pipeline for Verification of Biomarkers in Plasma. Nat. Biotechnol. 29 (7), 625–634. doi:10.1038/nbt.1900

PubMed Abstract | CrossRef Full Text | Google Scholar

Williams, R., and Zipser, D. D. (1989a). A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Comput. 1. doi:10.1162/neco.1989.1.2.270

CrossRef Full Text | Google Scholar

Wishart, D. S., Feunang, Y. D., Guo, A. C., Lo, E. J., Marcu, A., Grant, J. R., et al. (2018). DrugBank 5.0: a Major Update to the DrugBank Database for 2018. Nucleic Acids Res. 46 (D1), D1074–D1082. doi:10.1093/nar/gkx1037

PubMed Abstract | CrossRef Full Text | Google Scholar

Xiao, T., and Zhou, W. (2020). The Third Generation Sequencing: the Advanced Approach to Genetic Diseases. Transl Pediatr. 9 (2), 163–173. doi:10.21037/tp.2020.03.06

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, H., and Wang, K. (2015). Genomic Variant Annotation and Prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 10 (10), 1556–1566. doi:10.1038/nprot.2015.105

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., and Zhang, Y. (2015). The I-TASSER Suite: Protein Structure and Function Prediction. Nat. Methods 12 (1), 7–8. doi:10.1038/nmeth.3213

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, W., Soares, J., Greninger, P., Edelman, E. J., Lightfoot, H., Forbes, S., et al. (2013). Genomics of Drug Sensitivity in Cancer (GDSC): a Resource for Therapeutic Biomarker Discovery in Cancer Cells. Nucleic Acids Res. 41, D955–D961. doi:10.1093/nar/gks1111

PubMed Abstract | CrossRef Full Text | Google Scholar

You, X., and Conrad, T. O. (2016). Acfs: Accurate circRNA Identification and Quantification from RNA-Seq Data. Sci. Rep. 6 (1), 38820–38911. doi:10.1038/srep38820

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, A., Sun, H., and Wang, X. (2014). Urinary Metabolic Profiling of Rat Models Revealed Protective Function of Scoparone against Alcohol Induced Hepatotoxicity. Sci. Rep. 4 (1), 6768–8. doi:10.1038/srep06768

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, A., Sun, H., Yan, G., Wang, P., and Wang, X. (2015). Metabolomics for Biomarker Discovery: Moving to the Clinic. Biomed. Res. Int. 2015, 354671. doi:10.1155/2015/354671

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, A.-h., Sun, H., Han, Y., Yan, G.-l., Yuan, Y., Song, G.-c., et al. (2013). Ultraperformance Liquid Chromatography-Mass Spectrometry Based Comprehensive Metabolomics Combined with Pattern Recognition and Network Analysis Methods for Characterization of Metabolites and Metabolic Pathways from Biological Data Sets. Anal. Chem. 85 (15), 7606–7612. doi:10.1021/ac401793d

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, X.-O., Wang, H.-B., Zhang, Y., Lu, X., Chen, L.-L., and Yang, L. (2014). Complementary Sequence-Mediated Exon Circularization. Cell 159 (1), 134–147. doi:10.1016/j.cell.2014.09.001

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, X. H.-F., Leslie, C. S., and Chasin, L. A. (2005). Computational Searches for Splicing Signals. Methods 37 (4), 292–305. doi:10.1016/j.ymeth.2005.07.011

PubMed Abstract | CrossRef Full Text | Google Scholar

Zheng, G. X., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., et al. (2017). Massively Parallel Digital Transcriptional Profiling of Single Cells. Nat. Commun. 8 (1), 14049–14112. doi:10.1038/ncomms14049

PubMed Abstract | CrossRef Full Text | Google Scholar

Zheng, G. X. Y., Lau, B. T., Schnall-Levin, M., Jarosz, M., Bell, J. M., Hindson, C. M., et al. (2016). Haplotyping Germline and Cancer Genomes with High-Throughput Linked-Read Sequencing. Nat. Biotechnol. 34 (3), 303–311. doi:10.1038/nbt.3432

PubMed Abstract | CrossRef Full Text | Google Scholar

Zheng, Y., Ji, P., Chen, S., Hou, L., and Zhao, F. (2019). Reconstruction of Full-Length Circular RNAs Enables Isoform-Level Quantification. Genome Med. 11 (1), 2–20. doi:10.1186/s13073-019-0614-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, X., Ren, L., Meng, Q., Li, Y., Yu, Y., and Yu, J. (2010). The Next-Generation Sequencing Technology and Application. Protein Cell 1 (6), 520–536. doi:10.1007/s13238-010-0065-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, J., Mayeda, A., and Krainer, A. R. (2001). Exon Identity Established through Differential Antagonism between Exonic Splicing Silencer-Bound hnRNP A1 and Enhancer-Bound SR Proteins. Mol. Cel. 8 (6), 1351–1361. doi:10.1016/s1097-2765(01)00409-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Zurada, J. (1992). Introduction to Artificial Neural Systems. Wuhan , China: West Publishing Co.

Google Scholar

Keywords: single nucleotide polymorphisms (SNPs), Human Splicing Finder (HSF), next-generation sequencing (NGS), in silico, bioinformatics

Citation: Bhat GR, Sethi I, Rah B, Kumar R and Afroze D (2022) Innovative in Silico Approaches for Characterization of Genes and Proteins. Front. Genet. 13:865182. doi: 10.3389/fgene.2022.865182

Received: 29 January 2022; Accepted: 11 April 2022;
Published: 18 May 2022.

Edited by:

Prashanth N Suravajhala, Amrita Vishwa Vidyapeetham University, India

Reviewed by:

Christos K. Kontos, National and Kapodistrian University of Athens, Greece
George Potamias, Foundation for Research and Technology Hellas (FORTH), Greece
Indra Mani, University of Delhi, India

Copyright © 2022 Bhat, Sethi, Rah, Kumar and Afroze. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dil Afroze, afrozedil@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.