METHODS article

Front. Genet.
Sec. Statistical Genetics and Methodology
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1489694
This article is part of the Research Topic Statistical Approaches, Applications, and Software for Longitudinal Microbiome Data Analysis and Microbiome Multi-Omics Data Integration

Structure-Adaptive Canonical Correlation Analysis for Microbiome Multi-omics Data

Provisionally accepted
  • 1 The Chinese University of Hong Kong, Shenzhen, Shenzhen, China
  • 2 School of Statistics, East China Normal University, Shanghai, China
  • 3 Department of Quantitative Health Sciences, Mayo Clinic, Rochester, United States
  • 4 Department of Statistics, College of Science, Texas A&M University College Station, College Station, Texas, United States
  • 5 Department of Statistics, Texas A&M University, College Station, United States

The final, formatted version of the article will be published soon.

    Sparse canonical correlation analysis (sCCA) is a useful approach for integrating different high-dimensional datasets by finding a subset of correlated features that explains most of the correlation in the data. In microbiome studies, investigators are often interested in how the microbiome interacts with the host at different molecular levels, such as the genome, methylome, transcriptome, metabolome, and proteome. sCCA provides a simple way to exploit the correlation structure among multiple omics datasets and to find a set of correlated omics features, which could contribute to understanding host-microbiome interactions. However, existing sCCA methods do not address compositionality, so their application to microbiome data is not optimal. This paper proposes a new sCCA framework for integrating microbiome data with other high-dimensional omics data that accounts for the compositional nature of microbiome sequencing data. It also allows prior structural information, such as the grouping structure among bacterial taxa, to be integrated by imposing a "soft" constraint on the coefficients through varying penalization strength. As a result, the method provides significant improvement when the structure is informative while maintaining robustness against a misspecified structure. Through extensive simulation studies and real data analysis, we demonstrate the superiority of the proposed framework over state-of-the-art approaches.
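To make the ingredients named in the abstract concrete, the following is a minimal, generic sketch and not the authors' algorithm. It assumes a centered log-ratio (CLR) transform as one common way to handle compositional counts, and a rank-1 sparse CCA fitted by alternating soft-thresholded power iterations on the cross-covariance matrix. The function names (`clr`, `sparse_cca`), the pseudocount, and the per-feature penalty arguments are all hypothetical choices made for illustration; the per-feature penalties only mimic the idea of "varying penalization strength" in the simplest possible form.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount (an illustrative assumption) avoids log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def soft_threshold(a, thresh):
    """Elementwise soft-thresholding; thresh is a scalar or per-feature array."""
    return np.sign(a) * np.maximum(np.abs(a) - thresh, 0.0)


def unit_norm(v):
    """Scale v to unit Euclidean norm, leaving an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca(X, Y, lam_u, lam_v, n_iter=100):
    """Rank-1 sparse CCA sketch (illustrative, not the paper's method).

    X, Y         : samples-by-features matrices with matching rows.
    lam_u, lam_v : scalar or per-feature penalties; a larger value
                   shrinks the corresponding coefficient harder.
    Returns the sparse coefficient vectors (u, v).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y / (X.shape[0] - 1)  # sample cross-covariance matrix

    # Start from the leading right singular vector of C.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]

    for _ in range(n_iter):
        u = unit_norm(soft_threshold(C @ v, lam_u))
        v = unit_norm(soft_threshold(C.T @ u, lam_v))
    return u, v
```

In this sketch, microbiome counts would be passed through `clr` before being supplied as `X`, and prior structure could be encoded by giving `lam_u` as an array with smaller penalties for taxa believed to belong to an informative group. How the proposed framework actually handles compositionality and sets its penalties is described in the full article, not here.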

    Keywords: canonical correlation analysis, compositional effect, phylogenetic tree, structural information, dimension reduction, variable selection

    Received: 01 Sep 2024; Accepted: 31 Oct 2024.

    Copyright: © 2024 Deng, Tang, Chen and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Jun Chen, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, United States
    Xianyang Zhang, Department of Statistics, College of Science, Texas A&M University College Station, College Station, 77843, Texas, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.