PERSPECTIVE article

Front. Genet., 06 January 2012

Sec. Computational Genomics

volume 2 - 2011 | https://doi.org/10.3389/fgene.2011.00105

When One and One Gives More than Two: Challenges and Opportunities of Integrative Omics

    Hyungwon Choi 1*

    Norman Pavelka 2*

  • 1. Saw Swee Hock School of Public Health, National University of Singapore, Singapore

  • 2. Singapore Immunology Network, Agency for Science, Technology and Research, Singapore

Abstract

Since the dawn of the post-genomic era, a myriad of novel high-throughput technologies have been developed that are capable of measuring thousands of biological molecules at once, giving rise to various “omics” platforms. These advances offer the unique opportunity to study how individual parts of a biological system work together to produce emerging phenotypes. Today, many research laboratories are moving toward applying multiple omics platforms to analyze the same biological samples. In addition, network information on interacting molecules is increasingly being incorporated into the analysis and interpretation of these multiple omics datasets, which provides novel ways to integrate multiple layers of heterogeneous biological information into a single coherent picture. Here, we provide a perspective on how such recent “integrative omics” efforts are likely to shift biological paradigms once again, and what challenges lie ahead.

Introduction

The first generation of whole-genome sequencing projects has inspired the development of technologies aimed at comprehensively characterizing various types of biological molecules, opening up entirely new fields such as genomics, transcriptomics, proteomics, metabolomics, and so forth. Thanks to these technological advances, one can now routinely sequence the entire genome of an organism to scan for genetic polymorphisms, measure the abundance of genes and their products, map epigenetic modifications and transcriptional regulation, chart the global networks of genetic interactions or protein–protein interactions (PPI), and comprehensively measure sugars, lipids, and metabolites in virtually any biological specimen. The systems-level information provided by each omics platform offers a unique insight into the complexity of a biological system and, as a consequence, scientific discoveries and their clinical applications have benefited immensely from omics data over the past decade (Van de Vijver et al., 2002; Van ’t Veer et al., 2002; Hanash et al., 2008; Stratton et al., 2009; 1000 Genomes Project Consortium, 2010; Hudson et al., 2010; Meyerson et al., 2010; Pang et al., 2010; Solit and Mellinghoff, 2010).

Microarrays were among the first omics platforms to be developed, and from their first appearance it was clear that microarray data would have to be integrated with other levels of biological information in order to allow researchers to see the “big picture” (Kohane et al., 2002). As experimental protocols evolve and costs decline, scientists are now starting to apply multiple omics platforms to analyze the same biological samples (Ideker et al., 2001; Joyce and Palsson, 2006; Zhang et al., 2010). Studies of this type will be critically useful for biologists, since they can measure molecular changes at multiple levels simultaneously and get one step closer to understanding how biological systems work as a whole, which is one of the primary goals of “systems biology” (Kitano, 2002; Ge et al., 2003; Fukushima et al., 2009). As such, combining multiple omics, or “integrative omics,” holds great potential to revolutionize the systems-level analysis of complex biological phenomena, and several efforts are already ongoing in various directions.

Given the enormous promise of integrative omics, questions regarding how to design experiments and jointly analyze the heterogeneous data are quickly becoming of interest. Indeed, these new technologies generate an unprecedentedly large amount of data and, ironically, the sheer volume makes it difficult to find a reasonable interpretation of the data. Successful application will therefore depend on properly designed experiments, statistically sound data analysis, and appropriate interpretation of the data. In this Perspective, we review both the challenges and the opportunities encountered by systems biologists, bioinformaticians, and statisticians undertaking the exciting and daunting task of integrating multiple heterogeneous omics datasets.

Opportunities of Integrative Omics

Biological opportunities

Many problems in systems biology can be addressed only by integrating multiple layers of biological information. For example, genetic studies using single-nucleotide polymorphism (SNP) microarrays or high-throughput sequencing often report hundreds of point mutations above a minimal allele frequency threshold as potential disease markers (Carlson et al., 2004; Manolio et al., 2009). However, many of these markers lack predictive power and their results fail to reproduce across different study populations (Altshuler et al., 2008). This implies that these candidate markers must be further prioritized with additional information such as transcriptional or translational regulation of the gene products affected by the mutations. Accordingly, recent genetics research frequently explores the “genetical genomics” approach (Li and Burmeister, 2005) to integrate population-wide SNP data and transcriptomics data, aiming to identify expression quantitative trait loci (eQTL; Cheung and Spielman, 2009; Cookson et al., 2009; Montgomery et al., 2011). The paired genotype and gene expression data reveal the impact of genetic mutations on transcriptional expression, which is the major mechanism to channel genetic abnormalities into phenotypes. On a similar front, many research articles have reported integration of copy number data and gene expression data in cancer or adaptive evolution studies (Pollack et al., 2002; Chin et al., 2006; Gresham et al., 2008; Rancati et al., 2008). The resulting data explain how various forms of copy number aberration, such as point amplification/deletion, segmental changes, and aneuploidy, induce gene expression changes (Bussey et al., 2006; Stranger et al., 2007). Besides the integration of genomic datasets, advances in tandem mass spectrometry have gradually allowed us to integrate transcriptomics data with quantitative proteomics data (Griffin et al., 2002; Cox et al., 2005; Lu et al., 2007; Fournier et al., 2010; Pavelka et al., 2010), where proteomics data provide direct information to assess the impact of transcriptional changes on the gene products.
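
As a minimal sketch of the eQTL idea described above, the following Python snippet regresses the expression of one transcript on genotype dosage (0, 1, or 2 copies of one allele) across individuals. The function name eqtl_slope and the toy data are illustrative assumptions and are not taken from any of the cited studies; a real analysis would add covariates, a significance test, and a multiple-testing correction.

```python
from statistics import mean

def eqtl_slope(genotypes, expression):
    """Least-squares slope and Pearson r of expression on allele dosage."""
    if len(genotypes) != len(expression) or len(genotypes) < 3:
        raise ValueError("need paired observations for at least 3 samples")
    g_bar, e_bar = mean(genotypes), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    slope = sxy / sxx               # expression change per extra allele copy
    r = sxy / (sxx * syy) ** 0.5    # strength of the linear association
    return slope, r

# Hypothetical toy data: allele dosage and log2 expression in six individuals.
dosage = [0, 0, 1, 1, 2, 2]
log_expr = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, r = eqtl_slope(dosage, log_expr)
```

A slope far from zero together with a strong correlation would flag the SNP as a candidate eQTL for that transcript, which is one way to prioritize markers as discussed above.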

So far we have reviewed the opportunities that arise when the same genes are profiled at different levels of the primary omics. However, there exists additional network information generated using other high-throughput technologies, where the correlation between interacting molecules can be explicitly modeled. These include various assays for screening PPI (Rual et al., 2005; Gingras et al., 2007; Costanzo et al., 2010), protein–DNA interaction data for mapping transcriptional regulation and epigenetics (Ren et al., 2000; Johnson et al., 2007), post-transcriptional regulation mediated by microRNAs (Bartel, 2009; Hafner et al., 2010), and so forth. Using this information, the association between different molecules, or the lack thereof, can be adjusted for other interacting molecules that are causally linked across the available omics datasets. For instance, transcriptomics and metabolomics data were integrated to identify clusters of genes and metabolites that were coordinately modulated in response to specific nutritional stresses in the model plant Arabidopsis thaliana (Hirai et al., 2004). In addition, transcriptomics data were coupled with PPI networks to determine under which circumstances protein hubs are co-expressed with their respective interacting partners (Taylor et al., 2009) and to use joint expression levels of genes belonging to interaction subnetworks to establish more predictive breast cancer biomarkers (Chuang et al., 2007). Transcriptomics data were also combined with protein–DNA interaction data to infer gene regulatory networks (Lin et al., 2009; Ouyang et al., 2009).

Statistical opportunities

Integrative omics also opens an opportunity for improved statistical analysis. For one, parallel omics datasets can help implement procedures to infer missing data. Many omics platforms are known to be subject to missing observations due to limited depth, exemplified by the poor coverage of next-generation sequencing (NGS) in repeat-rich regions and the unreliable peptide identification of tandem mass spectrometry for low-abundance proteins. Some transcriptomic platforms such as microarrays are also subject to the limitation that only a fixed form of each transcript can be measured while other isoforms present in the sample go undetected. By generating both transcriptomic and proteomic data, however, one can perform statistical inference on the missing observations in one platform using the observations in the other platform, since the two datasets are expected to be correlated within the same biological sample. Recent endeavors to improve peptide sequencing in tandem mass spectrometry (MS/MS) using parallel transcriptomic data are good examples of this kind (Ramakrishnan et al., 2009; Ning and Nesvizhskii, 2010), but a more sophisticated treatment of missing data using external sources is yet to be developed.
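
The following Python sketch illustrates the simplest version of this idea: protein abundances that are missing in one sample are predicted from the mRNA abundances of the same genes, using a straight line fitted to the genes observed on both platforms. The linear relationship on a log scale, the function name impute_from_parallel, and the toy values are assumptions made for illustration only; this is not the method of the cited MS/MS studies.

```python
from statistics import mean

def impute_from_parallel(mrna, protein):
    """Fill None entries in `protein` with a linear prediction from `mrna`."""
    pairs = [(m, p) for m, p in zip(mrna, protein) if p is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two genes observed on both platforms")
    m_bar = mean(m for m, _ in pairs)
    p_bar = mean(p for _, p in pairs)
    sxx = sum((m - m_bar) ** 2 for m, _ in pairs)
    if sxx == 0:
        raise ValueError("observed mRNA values have no variation")
    slope = sum((m - m_bar) * (p - p_bar) for m, p in pairs) / sxx
    intercept = p_bar - slope * m_bar
    # Observed values are kept as they are; only the gaps are predicted.
    return [p if p is not None else intercept + slope * m
            for m, p in zip(mrna, protein)]

# Hypothetical log-scale abundances for four genes; the third protein is missing.
mrna_log = [1.0, 2.0, 3.0, 4.0]
protein_log = [2.1, 3.9, None, 8.0]
filled = impute_from_parallel(mrna_log, protein_log)
```

A point prediction of this kind ignores the uncertainty of the imputed value, which is one reason why the more sophisticated treatments called for above would be preferable in practice.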

Another important problem in omics data analysis is the control or estimation of false positives and false negatives, which are incurred when many statistical decisions are to be made simultaneously, i.e., the multiple testing problem. As simultaneous hypothesis testing typically leads to excessively many selections in omics data, currently existing multiple hypothesis testing methods are geared toward controlling the number of false positives, as evidenced by the development of false discovery rate estimation procedures (Benjamini and Hochberg, 1995; Efron et al., 2001). Although these procedures were developed for the analysis of a single omics platform, they are easily generalizable to the multivariate case for more sophisticated hypothesis testing when data are available from more than one omics platform. Suppose that differential expression is tested at the mRNA and protein level simultaneously. Then the hypothesis testing can be performed using bivariate statistics, which is expected to be more powerful than using two independent univariate statistics, since the correlation between the two datasets can be explicitly accounted for. In addition, the added complexity in the joint testing allows differentiation of the genes differentially expressed at both levels from the genes regulated at only one of the two levels, providing additional information to infer the underlying regulatory mechanism. Unfortunately, such routines using correlated statistics have rarely been implemented in integrative omics data analysis so far, but we can envision that as the number of such integrated datasets increases, so will the level of sophistication in the statistical analysis.
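
One textbook way to make the bivariate test concrete is sketched below in Python. It assumes that each gene has a z-score at the mRNA level and one at the protein level, that both are standard normal under the null hypothesis, and that their correlation rho is known; these assumptions, and the function names, are ours and are not prescribed by the cited papers. The joint p-values are then adjusted with the Benjamini and Hochberg (1995) step-up procedure.

```python
import math

def bivariate_pvalue(z_mrna, z_protein, rho):
    """P-value for H0: no change at either level, from two correlated z-scores.

    Under H0 the quadratic form below follows a chi-square distribution with
    2 degrees of freedom, whose upper tail is exactly exp(-t / 2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    t = (z_mrna ** 2 - 2.0 * rho * z_mrna * z_protein + z_protein ** 2) / (1.0 - rho ** 2)
    return math.exp(-t / 2.0)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (mRNA z, protein z) pairs for three genes, with assumed rho = 0.3.
z_pairs = [(3.0, 2.5), (0.2, -0.1), (2.8, 0.1)]
joint_p = [bivariate_pvalue(zm, zp, 0.3) for zm, zp in z_pairs]
joint_q = benjamini_hochberg(joint_p)
```

Genes passing the joint test could then be inspected level by level to separate those changing at both levels from those changing at only one, as described in the paragraph above.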

More importantly, the ultimate statistical opportunity in integrative omics data is the possibility of systems-level probabilistic modeling of multiple data types. In practice, one may well perform a crude form of integrative analysis, i.e., analyze each molecular level separately and aggregate the results in a post hoc manner (Figure 1A). This approach, however, fails to capitalize on the power of the correlated data, especially for detecting weak yet consistent signals from multiple data sources (Ideker et al., 2011). Hence one can start using a slightly more sophisticated approach where the data measured at different molecular levels are modeled using multivariate probability models (Figure 1B). As the bivariate example above showed, incorporating data from multiple molecular levels can strengthen the statistical power, since the effects we aim to measure at one molecular level can be adjusted by the data at the other levels. Furthermore, the new threads of network-level information that are becoming increasingly available (such as transcriptional regulatory networks, genetic interaction networks, PPI networks, signal transduction pathways, and metabolic networks) allow computational biologists to integrate omics datasets at the level of nodes and edges of biological networks (Figure 1C) and to move beyond statistical analysis under the assumption of full independence among the different molecules. For instance, versatile statistical techniques such as graphical models can be used in conjunction with experimentally validated networks, which provide the underlying backbone of the correlation structure. Such models give an efficient probabilistic representation of the complex, systems-wide molecular profiles and can considerably improve the statistical power of the analysis.

Figure 1

Challenges of Integrative Omics

Bioinformatics challenges

The first problem bioinformaticians face when asked to integrate, for instance, a transcriptomics dataset and a proteomics dataset is how to map transcript identifiers to protein identifiers. While the one-gene-one-protein hypothesis still holds relatively well in prokaryotes and some lower eukaryotes, the same is certainly not true in higher organisms: genes often encode multiple transcripts by means of alternative splicing (Graveley, 2001) and transcripts can be translated into multiple protein isoforms by means of alternative translation initiation sites (Cavener and Ray, 1991) and post-translational modifications (Mann and Jensen, 2003). A partial solution to this problem is provided by genome-centric databases such as Ensembl (Hubbard et al., 2002), protein-centric databases such as UniProt (Apweiler et al., 2004), or more general-purpose web services such as Babelomics (Al-Shahrour et al., 2005), which provide coherent mappings between gene, transcript, and protein identifiers. The challenge becomes even more daunting when one starts to venture outside the central dogma of molecular biology and attempts to integrate a transcriptomics or proteomics dataset with a metabolomic, glycomic, or lipidomic dataset. Here, one could take advantage of the knowledge of metabolic networks to map enzymes involved in the synthesis or chemical conversion of metabolites (e.g., as provided by KEGG, Kanehisa and Goto, 2000, or Reactome, Joshi-Tope et al., 2005) to establish links between the two types of datasets (Antonov et al., 2010). To this end, the systems biology markup language (SBML) represents one of the first and most successful efforts in developing a unified language to represent complex models of interacting biological molecules (Hucka et al., 2003) and has been widely implemented by several software tools. However, only a fraction of the genes in a genome typically encode metabolic enzymes, the rest being structural, regulatory, or signal transduction proteins. Unfortunately, it is not immediately obvious how to close these gaps. It is thus expected that integrative omics data analysis methods will have to deal with the existence of “orphan” molecules that cannot be directly mapped between the two types of datasets.
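
The identifier-mapping problem, including the handling of orphans, can be sketched in a few lines of Python. The mapping table here is a plain dictionary standing in for what a resource such as Ensembl or UniProt would provide; the identifiers (T1, P1, and so on), the function name link_datasets, and the data values are hypothetical.

```python
def link_datasets(transcript_values, protein_values, transcript_to_protein):
    """Pair measurements through an ID mapping and report orphans on each side."""
    linked = []
    matched_proteins = set()
    orphan_transcripts = []
    for t_id, t_val in transcript_values.items():
        # One transcript may map to several protein isoforms (one-to-many).
        targets = [p for p in transcript_to_protein.get(t_id, [])
                   if p in protein_values]
        if not targets:
            orphan_transcripts.append(t_id)
        for p_id in targets:
            linked.append((t_id, p_id, t_val, protein_values[p_id]))
            matched_proteins.add(p_id)
    orphan_proteins = [p for p in protein_values if p not in matched_proteins]
    return linked, orphan_transcripts, orphan_proteins

# Hypothetical identifiers and measurements.
transcripts = {"T1": 8.2, "T2": 5.1, "T3": 6.7}
proteins = {"P1": 3.4, "P1b": 2.9, "P9": 4.0}
mapping = {"T1": ["P1", "P1b"], "T2": ["P2"]}  # P2 was not measured; T3 is unmapped

linked, orphan_t, orphan_p = link_datasets(transcripts, proteins, mapping)
```

Returning the orphans explicitly, rather than silently dropping them, keeps the unmapped molecules visible to downstream analysis, in line with the expectation stated above that integrative methods will have to deal with them.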

Another bioinformatic issue is the existence of heterogeneous repositories of primary data sources. Due to the different nature of omics platforms, databases of microarray, NGS, proteomics, or metabolomics experiments have been designed according to different schemes. While it is true that each omics domain has developed its own standards (such as MIAME, Brazma et al., 2001, and MAGE-ML, Spellman et al., 2002, for microarray data, or mzXML, Pedrioli et al., 2004, and HUPO-PSI, Orchard et al., 2003, for proteomics data), the lack of well-defined data standards and of standardized nomenclature across different data repositories makes the coherent retrieval and assembly of integrated datasets a non-trivial task. One way to address this issue is the development of so-called “data warehouses,” in which developers invest significant effort a priori to store and integrate heterogeneous primary databases into a coherent scheme by making use of intermediate abstraction layers between the raw data layer and the user access layer (Rhodes et al., 2004; Chen et al., 2010). An alternative promising approach to data integration in the life sciences is offered by Semantic Web technologies (Splendiani et al., 2011). These technologies enable an immediate “connection” between data, which can be easily queried across different databases. At the same time they allow a precise characterization of the “semantics” of the data, i.e., which entities are represented and what their relations are (Berners-Lee and Hendler, 2001). Such semantic characterization can then provide an integration of information across different databases, which can easily cope with a variety of rapidly evolving data sources and types (Cheung et al., 2005; Smith et al., 2007). How widely this technology will be adopted is likely tied to how well developers of primary omics databases implement such data representation methods.
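
To give a flavor of how entities and their relations can be queried across sources, the Python sketch below stores subject-predicate-object statements from two hypothetical databases in one list and answers a query that spans both. It is a toy stand-in for the Semantic Web stack and implements neither RDF nor SPARQL; all identifiers, predicates, and function names are invented for illustration.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Statements from two hypothetical sources that share identifiers.
expression_db = [
    ("gene:G1", "encodes", "protein:P1"),
    ("gene:G1", "upregulated_in", "condition:stress"),
]
interaction_db = [
    ("protein:P1", "interacts_with", "protein:P2"),
]
merged = expression_db + interaction_db

def partners_of_upregulated_genes(triples, condition):
    """Follow gene -> protein -> interaction partner across both sources."""
    partners = set()
    for gene, _, _ in match(triples, predicate="upregulated_in", obj=condition):
        for _, _, protein in match(triples, subject=gene, predicate="encodes"):
            for _, _, partner in match(triples, subject=protein,
                                       predicate="interacts_with"):
                partners.add(partner)
    return partners

result = partners_of_upregulated_genes(merged, "condition:stress")
```

The query works only because both sources use the same identifier for the protein, which is the practical reason why shared nomenclature matters so much for this approach.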

Statistical challenges

In addition to the bioinformatics issues, there are important statistical challenges in integrative omics analysis. As we build more complex models such as multivariate or inter-molecular models, we must revisit some limitations that have plagued single-source omics data analysis. First, it is likely that the number of biological samples analyzed in a typical integrative study will remain limited, e.g., on the order of a few tens in case–control studies and at most several replicate experiments per comparative condition in studies using cell lines. To address this limitation, one can utilize efficient statistical methods such as hierarchical models, which are capable of pooling statistical information across different molecular levels (Parmigiani et al., 2002; Sharpf et al., 2009; Ji and Liu, 2010). Second, as we consider modeling the correlations among an increasing number of molecules in the statistical model, the model parameter space will expand in a computationally intractable manner and the limited sample size will make over-fitting even more likely. As such, although advanced statistical methods for model selection (e.g., regularization; Tibshirani, 1996) may facilitate the choice of predictive models, it must be remembered that there exists a trade-off between the gain in power from the added complexity and the loss in specificity due to a poor model fit, where the latter is mainly determined by experimental design issues such as the sample size. Therefore, when complex models are employed, the interaction between model complexity and experimental design factors must be thoroughly evaluated in terms of the sensitivity–specificity profile and the reproducibility of results. In sum, it is necessary to find the right balance between complexity and model sparsity to deliver the most reproducible system-wide models from multi-layered omics data.
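
The role of regularization in trading complexity against sparsity can be illustrated with a small lasso fit (Tibshirani, 1996) by cyclic coordinate descent, sketched below in plain Python. The objective scaling, the fixed number of iterations, and the toy design matrix are assumptions chosen for readability; production analyses would use an established library and choose the penalty by cross-validation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                # Partial residual: leave feature j out of the current fit.
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta

# Hypothetical toy data: three mutually orthogonal features, only the first matters.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0, -2.0, 2.0, -2.0]
beta_moderate = lasso_coordinate_descent(X, y, lam=0.5)
beta_strong = lasso_coordinate_descent(X, y, lam=3.0)
```

With the moderate penalty only the informative feature keeps a nonzero (shrunken) coefficient, while the strong penalty removes every feature; this is the balance between model complexity and sparsity that the paragraph above refers to.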

Conclusion

As it is becoming increasingly clear that integrating multiple omics datasets allows researchers to explore previously uncharted territories describing the functioning of biological systems, more advanced data analysis methods will be required to fully translate this enormous wave of information into biological knowledge. Will the fields of bioinformatics and computational biology be able to keep pace with the exponential development of omics technologies? While it is currently difficult to predict whether this gap will eventually be filled, we argue that if careful statistical considerations are taken into account already at the experimental design phase of a multi-omics project, then there is an opportunity to build rigorous systems-level statistical models that take full advantage of the interdependent workings of biological molecules. Finally, to foster further advancement of the field, it will be critical to build integrated multi-omics statistical models that are both reusable and easily extendable by other researchers.

Statements

Acknowledgments

The authors are grateful to Andrea Splendiani for input on Semantic Web technologies and to Giulia Rancati for critical reading of the manuscript. Hyungwon Choi is supported in part by an NUS YLLSOM grant. Norman Pavelka is funded by an A*STAR Investigatorship award.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    1000 Genomes Project Consortium. (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073. doi: 10.1038/nature09534

  • 2

    Al-Shahrour, F., Minguez, P., Vaquerizas, J. M., Conde, L., and Dopazo, J. (2005). BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res. 33, W460–W464. doi: 10.1093/nar/gki456

  • 3

    Altshuler, D., Daly, M. J., and Lander, E. S. (2008). Genetic mapping in human disease. Science 322, 881–888. doi: 10.1126/science.1156409

  • 4

    Antonov, A. V., Schmidt, E. E., Dietmann, S., Krestyaninova, M., and Hermjakob, H. (2010). R spider: a network-based analysis of gene lists by combining signaling and metabolic pathways from Reactome and KEGG databases. Nucleic Acids Res. 38, W78–W83. doi: 10.1093/nar/gkq482

  • 5

    Apweiler, R., Bairoch, A., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M. J., Natale, D. A., O’Donovan, C., Redaschi, N., and Yeh, L. S. (2004). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, D115–D119. doi: 10.1093/nar/gnh110

  • 6

    Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233. doi: 10.1016/j.cell.2009.01.002

  • 7

    Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Stat. Methodol. 57, 289–300.

  • 8

    Berners-Lee, T., and Hendler, J. (2001). Publishing on the semantic web. Nature 410, 1023–1024. doi: 10.1038/35074206

  • 9

    Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., Gaasterland, T., Glenisson, P., Holstege, F. C., Kim, I. F., Markowitz, V., Matese, J. C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 29, 365–371. doi: 10.1038/ng1201-365

  • 10

    Bussey, K. J., Chin, K., Lababidi, S., Reimers, M., Reinhold, W. C., Kuo, W. L., Gwadry, F., Kouros-Mehr, H., Fridlyand, J., Jain, A., Collins, C., Nishizuka, S., Tonon, G., Roschke, A., Gehlhaus, K., Kirsch, I., Scudiero, D. A., Gray, J. W., and Weinstein, J. N. (2006). Integrating data on DNA copy number with gene expression levels and drug sensitivities in the NCI-60 cell line panel. Mol. Cancer Ther. 5, 853–867. doi: 10.1158/1535-7163.MCT-05-0155

  • 11

    Carlson, C. S., Eberle, M. A., Kruglyak, L., and Nickerson, D. A. (2004). Mapping complex disease loci in whole-genome association studies. Nature 429, 446–452. doi: 10.1038/nature02623

  • 12

    Cavener, D. R., and Ray, S. C. (1991). Eukaryotic start and stop translation sites. Nucleic Acids Res. 19, 3185–3192. doi: 10.1093/nar/19.12.3185

  • 13

    Chen, C., McGarvey, P. B., Huang, H., and Wu, C. H. (2010). Protein bioinformatics infrastructure for the integration and analysis of multiple high-throughput “omics” data. Adv. Bioinformatics 423589. doi: 10.1093/bioinformatics/btq548

  • 14

    Cheung, K. H., Yip, K. Y., Smith, A., Deknikker, R., Masiar, A., and Gerstein, M. (2005). YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics 21(Suppl. 1), i85–i96. doi: 10.1093/bioinformatics/bti1026

  • 15

    Cheung, V. G., and Spielman, R. S. (2009). Genetics of human gene expression: mapping DNA variants that influence gene expression. Nat. Rev. Genet. 10, 595–604. doi: 10.1038/nrg2630

  • 16

    Chin, K., Devries, S., Fridlyand, J., Spellman, P. T., Roydasgupta, R., Kuo, W. L., Lapuk, A., Neve, R. M., Qian, Z., Ryder, T., Chen, F., Feiler, H., Tokuyasu, T., Kingsley, C., Dairkee, S., Meng, Z., Chew, K., Pinkel, D., Jain, A., Ljung, B. M., Esserman, L., Albertson, D. G., Waldman, F. M., and Gray, J. W. (2006). Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541. doi: 10.1016/j.ccr.2006.10.009

  • 17

    Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D., and Ideker, T. (2007). Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 3, 140. doi: 10.1038/msb4100180

  • 18

    Cookson, W., Liang, L., Abecasis, G., Moffatt, M., and Lathrop, M. (2009). Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 10, 184–194. doi: 10.1038/nrg2537

  • 19

    Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E. D., Sevier, C. S., Ding, H., Koh, J. L., Toufighi, K., Mostafavi, S., Prinz, J., St Onge, R. P., Vandersluis, B., Makhnevych, T., Vizeacoumar, F. J., Alizadeh, S., Bahr, S., Brost, R. L., Chen, Y., Cokol, M., Deshpande, R., Li, Z., Lin, Z. Y., Liang, W., Marback, M., Paw, J., San Luis, B. J., Shuteriqi, E., Tong, A. H., Van Dyk, N., Wallace, I. M., Whitney, J. A., Weirauch, M. T., Zhong, G., Zhu, H., Houry, W. A., Brudno, M., Ragibizadeh, S., Papp, B., Pal, C., Roth, F. P., Giaever, G., Nislow, C., Troyanskaya, O. G., Bussey, H., Bader, G. D., Gingras, A. C., Morris, Q. D., Kim, P. M., Kaiser, C. A., Myers, C. L., Andrews, B. J., and Boone, C. (2010). The genetic landscape of a cell. Science 327, 425–431. doi: 10.1126/science.1180823

  • 20

    Cox, B., Kislinger, T., and Emili, A. (2005). Integrating gene and protein expression data: pattern analysis and profile mining. Methods 35, 303–314. doi: 10.1016/j.ymeth.2004.08.021

  • 21

    Efron, B., Tibshirani, R., Storey, J., and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 96, 1151–1160. doi: 10.1198/016214501753382129

  • 22

    Fournier, M. L., Paulson, A., Pavelka, N., Mosley, A. L., Gaudenz, K., Bradford, W. D., Glynn, E., Li, H., Sardiu, M. E., Fleharty, B., Seidel, C., Florens, L., and Washburn, M. P. (2010). Delayed correlation of mRNA and protein expression in rapamycin-treated cells and a role for Ggc1 in cellular sensitivity to rapamycin. Mol. Cell. Proteomics 9, 271–284. doi: 10.1074/mcp.M900415-MCP200

  • 23

    Fukushima, A., Kusano, M., Redestig, H., Arita, M., and Saito, K. (2009). Integrated omics approaches in plant systems biology. Curr. Opin. Chem. Biol. 13, 532–538. doi: 10.1016/j.cbpa.2009.09.022

  • 24

    Ge, H., Walhout, A. J., and Vidal, M. (2003). Integrating “omic” information: a bridge between genomics and systems biology. Trends Genet. 19, 551–560. doi: 10.1016/j.tig.2003.08.009

  • 25

    Gingras, A. C., Gstaiger, M., Raught, B., and Aebersold, R. (2007). Analysis of protein complexes using mass spectrometry. Nat. Rev. Mol. Cell Biol. 8, 645–654. doi: 10.1038/nrm2208

  • 26

    Graveley, B. R. (2001). Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100–107. doi: 10.1016/S0168-9525(00)02176-4

  • 27

    Gresham, D., Desai, M. M., Tucker, C. M., Jenq, H. T., Pai, D. A., Ward, A., Desevo, C. G., Botstein, D., and Dunham, M. J. (2008). The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 4, e1000303. doi: 10.1371/journal.pgen.1000303

  • 28

    Griffin, T. J., Gygi, S. P., Ideker, T., Rist, B., Eng, J., Hood, L., and Aebersold, R. (2002). Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell. Proteomics 1, 323–333. doi: 10.1074/mcp.M200001-MCP200

  • 29

    Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M. Jr., Jungkamp, A. C., Munschauer, M., Ulrich, A., Wardle, G. S., Dewell, S., Zavolan, M., and Tuschl, T. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141. doi: 10.1016/j.cell.2010.03.009

  • 30

    Hanash, S. M., Pitteri, S. J., and Faca, V. M. (2008). Mining the plasma proteome for cancer biomarkers. Nature 452, 571–579. doi: 10.1038/nature06916

  • 31

    Hirai, M. Y., Yano, M., Goodenowe, D. B., Kanaya, S., Kimura, T., Awazuhara, M., Arita, M., Fujiwara, T., and Saito, K. (2004). Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proc. Natl. Acad. Sci. U.S.A. 101, 10205–10210. doi: 10.1073/pnas.0403218101

  • 32

    Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., and Clamp, M. (2002). The Ensembl genome database project. Nucleic Acids Res. 30, 38–41. doi: 10.1093/nar/30.1.38

  • 33

    Hucka, M., Finney, A., Sauro, H. M., Bolouri, H., Doyle, J. C., Kitano, H., Arkin, A. P., Bornstein, B. J., Bray, D., Cornish-Bowden, A., Cuellar, A. A., Dronov, S., Gilles, E. D., Ginkel, M., Gor, V., Goryanin, I. I., Hedley, W. J., Hodgman, T. C., Hofmeyr, J. H., Hunter, P. J., Juty, N. S., Kasberger, J. L., Kremling, A., Kummer, U., Le Novere, N., Loew, L. M., Lucio, D., Mendes, P., Minch, E., Mjolsness, E. D., Nakayama, Y., Nelson, M. R., Nielsen, P. F., Sakurada, T., Schaff, J. C., Shapiro, B. E., Shimizu, T. S., Spence, H. D., Stelling, J., Takahashi, K., Tomita, M., Wagner, J., and Wang, J. (2003). The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531. doi: 10.1093/bioinformatics/btg015

  • 34

    HudsonT. J.AndersonW.ArtezA.BarkerA. D.BellC.BernabeR. R.BhanM. K.CalvoF.EerolaI.GerhardD. S.GuttmacherA.GuyerM.HemsleyF. M.JenningsJ. L.KerrD.KlattP.KolarP.KusadaJ.LaneD. P.LaplaceF.YouyongL.NettekovenG.OzenbergerB.PetersonJ.RaoT. S.RemacleJ.SchaferA. J.ShibataT.StrattonM. R.VockleyJ. G.WatanabeK.YangH.YuenM. M.KnoppersB. M.BobrowM.Cambon-ThomsenA.DresslerL. G.DykeS. O.JolyY.KatoK.KennedyK. L.NicolasP.ParkerM. J.Rial-SebbagE.Romeo-CasabonaC. M.ShawK. M.WallaceS.WiesnerG. L.ZepsN.LichterP.BiankinA. V.ChabannonC.ChinL.ClementB.De AlavaE.DegosF.FergusonM. L.GearyP.HayesD. N.JohnsA. L.KasprzykA.NakagawaH.PennyR.PirisM. A.SarinR.ScarpaA.Van De VijverM.FutrealP. A.AburataniH.BayesM.BotwellD. D.CampbellP. J.EstivillX.GrimmondS. M.GutI.HirstM.Lopez-OtinC.MajumderP.MarraM.McphersonJ. D.NingZ.PuenteX. S.RuanY.StunnenbergH. G.SwerdlowH.VelculescuV. E.WilsonR. K.XueH. H.YangL.SpellmanP. T.BaderG. D.BoutrosP. C.FlicekP.GetzG.GuigoR.GuoG.HausslerD.HeathS.HubbardT. J.JiangT.JonesS. M.LiQ.López-BigasN.LuoR.MuthuswamyL.OuelletteB. F.PearsonJ. V.PuenteX. S.QuesadaV.RaphaelB. J.SanderC.ShibataT.SpeedT. P.SteinL. D.StuartJ. M.TeagueJ. W.TotokiY.TsunodaT.ValenciaA.WheelerD. A.WuH.ZhaoS.ZhouG.SteinL. D.GuigóR.HubbardT. J.JolyY.JonesS. M.KasprzykA.LathropM.López-BigasN.OuelletteB. F.SpellmanP. T.TeagueJ. W.ThomasG.ValenciaA.YoshidaT.KennedyK. L.AxtonM.DykeS. O.FutrealP. A.GerhardD. S.GunterC.GuyerM.HudsonT. J.McPhersonJ. D.MillerL. J.OzenbergerB.ShawK. M.KasprzykA.SteinL. D.ZhangJ.HaiderS. A.WangJ.YungC. K.CrosA.LiangY.GnaneshanS.GubermanJ.HsuJ.BobrowM.ChalmersD. R.HaselK. W.JolyY.KaanT. S.KennedyK. L.KnoppersB. M.LowranceW. W.MasuiT.NicolásP.Rial-SebbagE.RodriguezL. L.VergelyC.YoshidaT.GrimmondS. M.BiankinA. V.BowtellD. D.CloonanN.DeFazioA.EshlemanJ. R.EtemadmoghadamD.GardinerB. B.KenchJ. G.ScarpaA.SutherlandR. L.TemperoM. A.WaddellN. J.WilsonP. J.McPhersonJ. D.GallingerS.TsaoM. S.ShawP. A.PetersenG. M.MukhopadhyayD.ChinL.DePinhoR. 
A.ThayerS.MuthuswamyL.ShazandK.BeckT.SamM.TimmsL.BallinV.LuY.JiJ.ZhangX.ChenF.HuX.ZhouG.YangQ.TianG.ZhangL.XingX.LiX.ZhuZ.YuY.YuJ.YangH.LathropM.TostJ.BrennanP.HolcatovaI.ZaridzeD.BrazmaA.EgevardL.ProkhortchoukE.BanksR. E.UhlénM.Cambon-ThomsenA.ViksnaJ.PontenF.SkryabinK.StrattonM. R.FutrealP. A.BirneyE.BorgA.Børresen-DaleA. L.CaldasC.FoekensJ. A.MartinS.Reis-FilhoJ. S.RichardsonA. L.SotiriouC.StunnenbergH. G.ThomsG.van de VijverM.van’t VeerL.CalvoF.BirnbaumD.BlancheH.BoucherP.BoyaultS.ChabannonC.GutI.Masson-JacquemierJ. D.LathropM.PauportéI.PivotX.Vincent-SalomonA.TaboneE.TheilletC.ThomasG.TostJ.TreilleuxI.CalvoF.Bioulac-SageP.ClémentB.DecaensT.DegosF.FrancoD.GutI.GutM.HeathS.LathropM.SamuelD.ThomasG.Zucman-RossiJ.LichterP.EilsR.BrorsB.KorbelJ. O.KorshunovA.LandgrafP.LehrachH.PfisterS.RadlwimmerB.ReifenbergerG.TaylorM. D.von KalleC.MajumderP. P.SarinR.RaoT. S.BhanM. K.ScarpaA.PederzoliP.LawlorR. A.DelledonneM.BardelliA.BiankinA. V.GrimmondS. M.GressT.KlimstraD.ZamboniG.ShibataT.NakamuraY.NakagawaH.KusadaJ.TsunodaT.MiyanoS.AburataniH.KatoK.FujimotoA.YoshidaT.CampoE.López-OtínC.EstivillX.GuigóR.de SanjoséS.PirisM. A.MontserratE.González-DíazM.PuenteX. S.JaresP.ValenciaA.HimmelbauerH.QuesadaV.BeaS.StrattonM. R.FutrealP. A.CampbellP. J.Vincent-SalomonA.RichardsonA. L.Reis-FilhoJ. S.van de VijverM.ThomasG.Masson-JacquemierJ. D.AparicioS.BorgA.Børresen-DaleA. L.CaldasC.FoekensJ. A.StunnenbergH. G.van’t VeerL.EastonD. F.SpellmanP. T.MartinS.BarkerA. D.ChinL.CollinsF. S.ComptonC. C.FergusonM. L.GerhardD. S.GetzG.GunterC.GuttmacherA.GuyerM.HayesD. N.LanderE. S.OzenbergerB.PennyR.PetersonJ.SanderC.ShawK. M.SpeedT. P.SpellmanP. T.VockleyJ. G.WheelerD. A.WilsonR. K.HudsonT. J.ChinL.KnoppersB. M.LanderE. S.LichterP.SteinL. D.StrattonM. R.AndersonW.BarkerA. D.BellC.BobrowM.BurkeW.CollinsF. S.ComptonC. C.DePinhoR. A.EastonD. F.FutrealP. A.GerhardD. S.GreenA. R.GuyerM.HamiltonS. R.HubbardT. J.KallioniemiO. P.KennedyK. L.LeyT. J.LiuE. 
T.LuY.MajumderP.MarraM.OzenbergerB.PetersonJ.SchaferA. J.SpellmanP. T.StunnenbergH. G.WainwrightB. J.WilsonR. K.YangH. (2010). International network of cancer genome projects. Nature 464, 993–998. doi: 10.1038/nature08987

  • 35

    Ideker, T., Dutkowski, J., and Hood, L. (2011). Boosting signal-to-noise in complex biology: prior knowledge is power. Cell 144, 860–863. doi: 10.1016/j.cell.2011.03.007

  • 36

    Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R., and Hood, L. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934. doi: 10.1126/science.292.5518.929

  • 37

    Ji, H., and Liu, X. S. (2010). Analyzing omics data using hierarchical models. Nat. Biotechnol. 28, 337–340. doi: 10.1038/nbt.1619

  • 38

    Johnson, D. S., Mortazavi, A., Myers, R. M., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502. doi: 10.1126/science.1141319

  • 39

    Joshi-Tope, G., Gillespie, M., Vastrik, I., D’eustachio, P., Schmidt, E., De Bono, B., Jassal, B., Gopinath, G. R., Wu, G. R., Matthews, L., Lewis, S., Birney, E., and Stein, L. (2005). Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33, D428–D432. doi: 10.1093/nar/gki072

  • 40

    Joyce, A. R., and Palsson, B. O. (2006). The model organism as a system: integrating “omics” data sets. Nat. Rev. Mol. Cell Biol. 7, 198–210. doi: 10.1038/nrm1857

  • 41

    Kanehisa, M., and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30. doi: 10.1093/nar/28.7.e27

  • 42

    Kitano, H. (2002). Systems biology: a brief overview. Science 295, 1662–1664. doi: 10.1126/science.1069492

  • 43

    Kohane, I. S., Kho, A., and Butte, A. J. (2002). Microarrays for an Integrative Genomics. Cambridge: MIT Press.

  • 44

    Li, J., and Burmeister, M. (2005). Genetical genomics: combining genetics with gene expression analysis. Hum. Mol. Genet. 2, R163–R169. doi: 10.1093/hmg/ddi267

  • 45

    Lin, B., Wang, J., Hong, X., Yan, X., Hwang, D., Cho, J. H., Yi, D., Utleg, A. G., Fang, X., Schones, D. E., Zhao, K., Omenn, G. S., and Hood, L. (2009). Integrated expression profiling and ChIP-seq analyses of the growth inhibition response program of the androgen receptor. PLoS ONE 4, e6589. doi: 10.1371/journal.pone.0006589

  • 46

    Lu, P., Vogel, C., Wang, R., Yao, X., and Marcotte, E. M. (2007). Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat. Biotechnol. 25, 117–124. doi: 10.1038/nbt1207-1403

  • 47

    Mann, M., and Jensen, O. N. (2003). Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261. doi: 10.1038/nbt0303-255

  • 48

    Manolio, T. A., Collins, F. S., Cox, N. J., Goldstein, D. B., Hindorff, L. A., Hunter, D. J., Mccarthy, M. I., Ramos, E. M., Cardon, L. R., Chakravarti, A., Cho, J. H., Guttmacher, A. E., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C. N., Slatkin, M., Valle, D., Whittemore, A. S., Boehnke, M., Clark, A. G., Eichler, E. E., Gibson, G., Haines, J. L., Mackay, T. F., Mccarroll, S. A., and Visscher, P. M. (2009). Finding the missing heritability of complex diseases. Nature 461, 747–753. doi: 10.1038/nature08494

  • 49

    Meyerson, M., Gabriel, S., and Getz, G. (2010). Advances in understanding cancer genomes through second-generation sequencing. Nat. Rev. Genet. 11, 685–696. doi: 10.1038/nrg2841

  • 50

    Montgomery, S. B., Lappalainen, T., Gutierrez-Arcelus, M., and Dermitzakis, E. T. (2011). Rare and common regulatory variation in population-scale sequenced human genomes. PLoS Genet. 7, e1002144. doi: 10.1371/journal.pgen.1002144

  • 51

    Ning, K., and Nesvizhskii, A. I. (2010). The utility of mass spectrometry-based proteomic data for validation of novel alternative splice forms reconstructed from RNA-Seq data: a preliminary assessment. BMC Bioinformatics 11(Suppl. 11), S14. doi: 10.1186/1471-2105-11-S11-S14

  • 52

    Orchard, S., Hermjakob, H., and Apweiler, R. (2003). The proteomics standards initiative. Proteomics 3, 1374–1376. doi: 10.1002/pmic.200300496

  • 53

    Ouyang, Z., Zhou, Q., and Wong, W. H. (2009). ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc. Natl. Acad. Sci. U.S.A. 106, 21521–21526. doi: 10.1073/pnas.0904863106

  • 54

    Pang, A. W., Macdonald, J. R., Pinto, D., Wei, J., Rafiq, M. A., Conrad, D. F., Park, H., Hurles, M. E., Lee, C., Venter, J. C., Kirkness, E. F., Levy, S., Feuk, L., and Scherer, S. W. (2010). Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 11, R52. doi: 10.1186/gb-2010-11-5-r52

  • 55

    Parmigiani, G., Garrett, E. S., Anbazhaghan, R., and Gabrielson, E. (2002). A statistical framework for expression-based molecular classification in cancer. J. R. Stat. Soc. B Stat. Methodol. 64, 717–736. doi: 10.1111/1467-9868.00358

  • 56

    Pavelka, N., Rancati, G., Zhu, J., Bradford, W. D., Saraf, A., Florens, L., Sanderson, B. W., Hattem, G. L., and Li, R. (2010). Aneuploidy confers quantitative proteome changes and phenotypic variation in budding yeast. Nature 468, 321–325. doi: 10.1038/nature09529

  • 57

    Pedrioli, P. G., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R. H., Apweiler, R., Cheung, K., Costello, C. E., Hermjakob, H., Huang, S., Julian, R. K., Kapp, E., Mccomb, M. E., Oliver, S. G., Omenn, G., Paton, N. W., Simpson, R., Smith, R., Taylor, C. F., Zhu, W., and Aebersold, R. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 22, 1459–1466. doi: 10.1038/nbt1031

  • 58

    Pollack, J. R., Sorlie, T., Perou, C. M., Rees, C. A., Jeffrey, S. S., Lonning, P. E., Tibshirani, R., Botstein, D., Borresen-Dale, A. L., and Brown, P. O. (2002). Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl. Acad. Sci. U.S.A. 99, 12963–12968. doi: 10.1073/pnas.162471999

  • 59

    Ramakrishnan, S. R., Vogel, C., Prince, J. T., Li, Z., Penalva, L. O., Myers, M., Marcotte, E. M., Miranker, D. P., and Wang, R. (2009). Integrating shotgun proteomics and mRNA expression data to improve protein identification. Bioinformatics 25, 1397–1403. doi: 10.1093/bioinformatics/btp168

  • 60

    Rancati, G., Pavelka, N., Fleharty, B., Noll, A., Trimble, R., Walton, K., Perera, A., Staehling-Hampton, K., Seidel, C. W., and Li, R. (2008). Aneuploidy underlies rapid adaptive evolution of yeast cells deprived of a conserved cytokinesis motor. Cell 135, 879–893. doi: 10.1016/j.cell.2008.09.039

  • 61

    Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C. J., Bell, S. P., and Young, R. A. (2000). Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309. doi: 10.1126/science.290.5500.2306

  • 62

    Rhodes, D. R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette, T., Pandey, A., and Chinnaiyan, A. M. (2004). ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia 6, 1–6.

  • 63

    Rual, J. F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., Berriz, G. F., Gibbons, F. D., Dreze, M., Ayivi-Guedehoussou, N., Klitgord, N., Simon, C., Boxem, M., Milstein, S., Rosenberg, J., Goldberg, D. S., Zhang, L. V., Wong, S. L., Franklin, G., Li, S., Albala, J. S., Lim, J., Fraughton, C., Llamosas, E., Cevik, S., Bex, C., Lamesch, P., Sikorski, R. S., Vandenhaute, J., Zoghbi, H. Y., Smolyar, A., Bosak, S., Sequerra, R., Doucette-Stamm, L., Cusick, M. E., Hill, D. E., Roth, F. P., and Vidal, M. (2005). Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178. doi: 10.1038/nature04209

  • 64

    Sharpf, R. B., Tjelmeland, H., Parmigiani, G., and Nobel, A. B. (2009). A Bayesian model for cross-study differential gene expression. J. Am. Stat. Assoc. 104, 1295–1310. doi: 10.1198/jasa.2009.ap07611

  • 65

    Smith, A. K., Cheung, K. H., Yip, K. Y., Schultz, M., and Gerstein, M. K. (2007). LinkHub: a semantic web system that facilitates cross-database queries and information retrieval in proteomics. BMC Bioinformatics 8(Suppl. 3), S5. doi: 10.1186/1471-2105-8-S3-S5

  • 66

    Solit, D. B., and Mellinghoff, I. K. (2010). Tracing cancer networks with phosphoproteomics. Nat. Biotechnol. 28, 1028–1029. doi: 10.1038/nbt1010-1028

  • 67

    Spellman, P. T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart, D., Sherlock, G., Ball, C., Lepage, M., Swiatek, M., Marks, W. L., Goncalves, J., Markel, S., Iordan, D., Shojatalab, M., Pizarro, A., White, J., Hubley, R., Deutsch, E., Senger, M., Aronow, B. J., Robinson, A., Bassett, D., Stoeckert, C. J. Jr., and Brazma, A. (2002). Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046. doi: 10.1186/gb-2002-3-9-research0046

  • 68

    Splendiani, A., Burger, A., Paschke, A., Romano, P., and Marshall, M. S. (2011). Biomedical semantics in the semantic web. J. Biomed. Semantics 2(Suppl. 1), S1. doi: 10.1186/2041-1480-2-1

  • 69

    Stranger, B. E., Forrest, M. S., Dunning, M., Ingle, C. E., Beazley, C., Thorne, N., Redon, R., Bird, C. P., De Grassi, A., Lee, C., Tyler-Smith, C., Carter, N., Scherer, S. W., Tavare, S., Deloukas, P., Hurles, M. E., and Dermitzakis, E. T. (2007). Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848–853. doi: 10.1126/science.1136678

  • 70

    Stratton, M. R., Campbell, P. J., and Futreal, P. A. (2009). The cancer genome. Nature 458, 719–724. doi: 10.1038/nature07943

  • 71

    Taylor, I. W., Linding, R., Warde-Farley, D., Liu, Y., Pesquita, C., Faria, D., Bull, S., Pawson, T., Morris, Q., and Wrana, J. L. (2009). Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat. Biotechnol. 27, 199–204. doi: 10.1038/nbt.1522

  • 72

    Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Stat. Methodol. 58, 267–288.

  • 73

    Van de Vijver, M. J., He, Y. D., Van ’t Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., Parrish, M., Atsma, D., Witteveen, A., Glas, A., Delahaye, L., Van Der Velde, T., Bartelink, H., Rodenhuis, S., Rutgers, E. T., Friend, S. H., and Bernards, R. (2002). A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009. doi: 10.1056/NEJMoa021967

  • 74

    Van ’t Veer, L. J., Dai, H., Van De Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., Van Der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536. doi: 10.1038/415530a

  • 75

    Zhang, W., Li, F., and Nie, L. (2010). Integrating multiple “omics” analysis for microbial biology: application and methodologies. Microbiology 156, 287–301. doi: 10.1099/mic.0.034793-0

Summary

Keywords

data integration, omics, systems biology, statistical data analysis

Citation

Choi H and Pavelka N (2012) When One and One Gives More than Two: Challenges and Opportunities of Integrative Omics. Front. Genet. 2:105. doi: 10.3389/fgene.2011.00105

Received

11 October 2011

Accepted

21 December 2011

Published

06 January 2012

Volume

2 - 2011

Edited by

Thiago Motta Venancio, Universidade Estadual do Norte Fluminense, Brazil

Reviewed by

Helder I. Nakaya, Emory University, USA; Robson Francisco De Souza, National Institutes of Health, USA; Fabio Passetti, Instituto Nacional de Câncer, Brazil

Copyright

*Correspondence: Hyungwon Choi, Saw Swee Hock School of Public Health, National University of Singapore, MD3, 16 Medical Drive, Singapore 117597; Norman Pavelka, Singapore Immunology Network, Agency for Science, Technology and Research, 8A Biomedical Grove, Level 3, Immunos, Singapore 138648.

This article was submitted to Frontiers in Bioinformatics and Computational Biology, a specialty of Frontiers in Genetics.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
