AUTHOR=Schmidt Maria, Hopp Lydia, Arakelyan Arsen, Kirsten Holger, Engel Christoph, Wirkner Kerstin, Krohn Knut, Burkhardt Ralph, Thiery Joachim, Loeffler Markus, Loeffler-Wirth Henry, Binder Hans
TITLE=The Human Blood Transcriptome in a Large Population Cohort and Its Relation to Aging and Health
JOURNAL=Frontiers in Big Data
VOLUME=3
YEAR=2020
URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2020.548873
DOI=10.3389/fdata.2020.548873
ISSN=2624-909X
ABSTRACT=
Background: The blood transcriptome is expected to provide a detailed picture of an organism's physiological state, with potential applications in medical diagnostics and in molecular and epidemiological research. Here we present the analysis of blood specimens from 3,388 adult individuals, together with phenotype characteristics such as disease history, medication status, lifestyle factors, and body mass index (BMI). The size and heterogeneity of these data challenge analytics in terms of dimension reduction, knowledge mining, feature extraction, and data integration.
Methods: Self-organizing map (SOM) machine learning was applied to study transcriptional states on a population-wide scale. This method permits a detailed description and visualization of the molecular heterogeneity of transcriptomes and of their association with different phenotypic features.
Results: The diversity of transcriptomes is described by personalized SOM portraits, which characterize the samples in terms of modules of co-expressed genes from different functional contexts. We identified two major blood transcriptome types. Type 1 was found more often in men, the elderly, and overweight people, and it showed upregulation of genes associated with inflammation and increased heme metabolism. Type 2 was found predominantly in women and in younger and normal-weight participants, and it was associated with activated immune responses and with transcriptional, ribosomal, mitochondrial, and telomere-maintenance cell functions. We found a striking overlap of signatures shared by multiple diseases, aging, and obesity, driven by an underlying common pattern that was associated with the immune response and an increase in inflammatory processes.
Conclusions: Machine learning applications for large and heterogeneous omics data provide a holistic view of the diversity of the human blood transcriptome. This approach provides a tool for comparative analyses of transcriptional signatures and of associated phenotypes in population studies and medical applications.
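
Illustration: The SOM step named under Methods can be sketched in a few lines of NumPy. This is a minimal, generic SOM on synthetic data, not the authors' pipeline. It assumes that genes are the items placed on the map, so that each grid unit's weight vector acts as a "metagene" profile across samples and one slice of the weight array serves as a sample's portrait. The function names, grid size, learning-rate schedule, and toy expression matrix are all hypothetical choices made for this sketch.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used for neighborhood distances.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood shrinks toward 0.5
            bmu = best_matching_unit(weights, x)
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))    # Gaussian neighborhood on the grid
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

# Synthetic toy data (assumption): 60 genes x 6 samples with two co-expressed modules.
rng = np.random.default_rng(1)
expr = rng.normal(scale=0.1, size=(60, 6))
expr[:30, :3] += 1.0   # module A is high in samples 0-2
expr[30:, 3:] += 1.0   # module B is high in samples 3-5

weights = train_som(expr, grid=(4, 4))
portrait = weights[:, :, 0]   # metagene values of sample 0, laid out on the grid
```

Under these assumptions, `portrait` is a 4 x 4 array that could be drawn as a heat map for one sample, and `best_matching_unit(weights, expr[g])` gives the grid position to which gene `g` is assigned.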