Skip to main content

ORIGINAL RESEARCH article

Front. Genet., 22 February 2023
Sec. Computational Genomics
This article is part of the Research Topic Artificial Intelligence-Based Omics Data Analysis View all 5 articles

Imaging genetic association analysis of triple-negative breast cancer based on the integration of prior sample information

Shipeng Ning&#x;Shipeng Ning1Juan Xie&#x;Juan Xie2Jianlan MoJianlan Mo3You PanYou Pan1Rong HuangRong Huang1Qinghua Huang
Qinghua Huang1*Jifeng Feng
Jifeng Feng3*
  • 1Department of Breast Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
  • 2Department of Clinical Laboratory, Guangxi Medical University Cancer Hospital, Nanning, China
  • 3Department of Anesthesiology, Maternal and Child Health Hospital of Guangxi Zhuang Autonomous Region, Nanning, China

Triple-negative breast cancer (TNBC) is one of the more aggressive subtypes of breast cancer. The prognosis of TNBC patients remains low. Therefore, there is still a need to continue identifying novel biomarkers to improve the prognosis and treatment of TNBC patients. Research in recent years has shown that the effective use and integration of information in genomic data and image data will contribute to the prediction and prognosis of diseases. Considering that imaging genetics can deeply study the influence of microscopic genetic variation on disease phenotype, this paper proposes a sample prior information-induced multidimensional combined non-negative matrix factorization (SPID-MDJNMF) algorithm to integrate the Whole-slide image (WSI), mRNAs expression data, and miRNAs expression data. The algorithm effectively fuses high-dimensional data of three modalities through various constraints. In addition, this paper constructs an undirected graph between samples, uses an adjacency matrix to constrain the similarity, and embeds the clinical stage information of patients in the algorithm so that the algorithm can identify the co-expression patterns of samples with different labels. We performed univariate and multivariate Cox regression analysis on the mRNAs and miRNAs in the screened co-expression modules to construct a TNBC-related prognostic model. Finally, we constructed prognostic models for 2-mRNAs (IL12RB2 and CNIH2) and 2-miRNAs (miR-203a-3p and miR-148b-3p), respectively. The prognostic model can predict the survival time of TNBC patients with high accuracy. In conclusion, our proposed SPID-MDJNMF algorithm can efficiently integrate image and genomic data. Furthermore, we evaluated the prognostic value of mRNAs and miRNAs screened by the SPID-MDJNMF algorithm in TNBC, which may provide promising targets for the prognosis of TNBC patients.

1 Introduction

Triple-negative breast cancer (TNBC) is a specific subtype of breast cancer (BC) (Sukumar et al., 2021). Compared with other BC subtypes, TNBC is generally more aggressive, with rapid disease progression and multiple metastatic diseases at an early stage (Lyons, 2019). Therefore, the prognosis of TNBC patients is poor. Currently, the treatment methods for TNBC patients are mostly chemotherapy and surgery, but the recurrence rate after TNBC treatment is high (Hwang et al., 2019). Therefore, it is necessary to mine novel biomarkers related to the prognosis of TNBC patients.

MicroRNAs (miRNAs) can regulate gene expression, cancer development, and metastasis by binding to target messenger RNAs (mRNAs) (Li et al., 2017; Hong et al., 2020). Previous studies have identified many miRNAs involved in BC pathological progressions, such as miR-205, miR-21, and miR-10b. Overexpression of miR-205 can inhibit the metastasis and invasion of tumor cells in BC (Wang et al., 2013). miR-21 and miR-10 b can promote tumor metastasis and cancer cell proliferation in BC by regulating genes such as NOTCH1, TGFBR2, and TGFB1 (Lee and Jiang, 2017). Therefore, identifying key miRNAs and mRNAs in TNBC may provide new therapeutic targets for TNBC patients. In addition to gene-related biomarkers, cancer tissue-related pathological images provide vital information for disease diagnosis and prognosis. Also, integrating mRNA and image data may contribute to more accurate cancer prognosis prediction (Cheng et al., 2017). Sun et al. (2008) Proposed a new method named GPMKL, which effectively predicted the prognosis of BC patients by entirely using the heterogeneous information in genomic data and image data. Wang et al. (2021) Proposed a unified framework named GPDBN to improve the performance of prognosis prediction in BC patients by integrating genomic data and pathological images. The above studies show that fully mining and integrating the information in mRNA expression data and image data can better predict the prognosis of patients.

Imaging genetics is a non-invasive method that correlates genomics and imaging data to discover significant disease-related modules and explain the pathogenesis of the disease. Most of the previous imaging genetics research has been done on Alzheimer’s disease. Due to the characteristics of small samples and high dimensions of imaging genetics, a variety of effective penalty terms are needed to enable the algorithm to perform effective feature selection in high-dimensional data. Lin et al. (2014) Took into account the prior knowledge of the structure within the data. They used structured sparse canonical correlation analysis (SCCA) to correlate SNPs with fMRI signals at the voxel level to identify more risk loci. In order to solve the association analysis research without prior information, Du et al. (2020) Developed a method based on SCCA to fuse the pairwise group LASSO and graph-guided pairwise group LASSO penalty terms. These two penalty terms are in the SCCA model, the structural information in gene and image data can be automatically recovered, respectively. However, SCCA-related algorithms have high algorithm complexity, and it is time-consuming to perform association analysis on high-dimensional data, and there are few studies on cancer imaging genetics. Some scholars extracted features from the tumor contours of CT images of lung cancer patients, compared and analyzed the imaging features with clinical information and gene expression, and found many imaging features with the predictive ability (Aerts et al., 2014).

In recent years, matrix decomposition technology has been widely used in biological multi-omics analysis and has made significant progress. However, few studies have used this type of technology to integrate cancer imaging genetics data and explore the impact of genetic data on imaging phenotypes. Deng et al. proposed a multi-constrained joint non-negative matrix factorization (MCJNMF) algorithm that integrated PET images and DNA methylation data of sarcomas and successfully discovered co-expression modules associated with lung metastasis. Furthermore, they extended the modality to three dimensions by extending the MCJNMF algorithm. They proposed a multidimensional non-negative matrix factorization (MDJNMF) algorithm that integrated pathological images, DNA methylation data, and copy number variation data from sarcoma data. The mechanism of interaction of the three data in sarcoma patients was successfully discovered (Deng et al., 2021).

This paper proposes a sample prior information-driven multidimensional joint non-negative matrix factorization (SPID-MCJNMF) algorithm, which adds a diagnostic information constraint to the basis matrix based on the MDJNMF algorithm. Specifically, we add the clinical stage information of triple-negative breast cancer patients into the algorithm through the Laplace constraint, which is used to make the samples of the same stage closer in the feature space, and the samples of different stages further in the feature space, allow the algorithm to identify expression patterns across samples of different stages. The results show that compared with several other competitive algorithms, the SPID-MCJNMF algorithm has better reconstruction performance and obtains a significant co-expression module with biological significance. Then, to further mine the biomarkers related to the prognosis of TNBC, we performed the prognosis analysis based on the modules screened by the SPID-MCJNMF algorithm and constructed the mRNA and miRNA-related prognosis models, respectively. Prognostic analysis of external datasets further validated the predictive accuracy of the prognostic model. Our study may provide new targets for the treatment and prognosis of TNBC.

2 Method

2.1 Joint non-negative matrix factorization (JNMF)

Non-negative matrix factorization (NMF) is a robust dimensionality reduction algorithm widely used in bioinformatics to ensure the non-negativity of the original data. Joint non-negative matrix factorization (JNMF) is evolved from NMF, which solves the disadvantage that NMF can only decompose single-modal data, and its objective function is shown in formula (1).

mini=1nXIWHIF2
s.t.W>0,Hi>0,i=1,2,3,..n(1)

Among them, n represents the total number of samples. XI represents the feature matrix of different modal data, each row represents a sample, and each column represents a feature of the sample; it is necessary to ensure that the number of rows of Xii=1,2,3, is consistent (sample one correspondence), the number of columns (number of features) can be different. W is the base matrix obtained after splicing multiple Xi and performing non-negative matrix decomposition, and Hi is the multiple coefficient matrices obtained by decomposing. In this paper, XI represents the WSI image, X2 represents miRNA, and X3 represents mRNA. The coefficient matrices obtained after decomposition are H1, H2, and H3, respectively.

2.2 Multi-dimensional joint non-negative matrix factorization (MDJNMF)

The MDJNMF algorithm is proposed by Deng et al. Based on the JNMF algorithm, they added orthogonal constraints to the coefficient matrix to prevent redundant features from affecting the results. In addition, they added the absolute value of the Pearson correlation coefficient between WSI image and genetics as prior information to the algorithm, and its objective function is shown in Eq. 2.

ΓW,H1,H2,H3=minl=13XlWHlF2+αHlHlTIF2|+γ2|Hl12λ1TrH1A1H2Tλ2TrH1A2H3T+γ1|WF2(2)

Where I is the identity matrix, α is used to control the orthogonality of Hl, and λ1λ2γ1 and γ2 are hyperparameters that control the strength of each regularization constraint, respectively.

2.3 Sample prior information driven multiple dimension joint non-negative matrix factorization (SPID-MDJNMF)

To improve the model’s generalization ability and identify markers associated with breast cancer, this subsection introduces a diagnosis-guided penalty term. By treating each sample as a node in an undirected graph, connections between nodes are used to embed clinical information about the patient. In this paper, we embed the clinical stage information of triple-negative breast cancer patients. If any two nodes are selected, and their diagnosis is the same, then there is a connection between them; if the diagnosis is different, there is no connection. Then the adjacency matrix A can be obtained, and its element Aij can be defined as:

Aij=1,ifxiandxjarefromthesamegroup0,Others(3)

Where the group represents the stage of the samples, and if the stage of the ith and jth samples is the same, the ith row and jth column of matrix A takes the value 1. Furthermore, we define the penalty term for diagnosis guidance as shown in Eq. 3.

PW=i=1kAijWiWj(4)

Where Wi and Wj represent the ith and jth columns of the basis matrix W, respectively. k is the dimensionality reduction. Aij is each element of the matrix A∈Rn×n, which is used to embed the disease course information of the sample. Furthermore, we introduce a degree matrix D, which is a pair whose diagonal elements are angle matrix Dii=j=1nAij. Next, this paper further rewrites PW into the following form:

PW=TrWTLW(5)

L represents the Laplace matrix of L. And L=DA, and then the objective function of the SPID-MCJNMF algorithm is obtained:

ΓW,H1,H2,H3=minl=13XlWHlF2+αHlHlTIF2|+γ2|Hl12λ1TrH1A1H2Tλ2TrH1A2H3T+βTrWTLW+γ1|WF2(6)

Where β is a hyperparameter that controls the strength of the constraints of the sample prior information. Let φij and ϕij be Wij0 and HIij0. The Lagrange multiplier L is expressed as:

LW,Hl=Γ+TrΨWT+l=13TrΦ,HlT,Ψ=Φij,Φ=ϕij(7)

Then L takes the partial derivative with respect to W and Hl, and Eq. 8 can be obtained.

LW=l=132XlHlT+2WHlHlT+2γ1W+2βWL+Ψ
LH1=2WTX1+2WTWH1+4αH1H1TH14αH1λ1H2A1Tλ2H3A2T+γ2E1+Φ1
LH2=2WTX2+2WTWH2+4αH2H2TH24αH2λ1H1A1+γ2E2+Φ2
LH3=2WTX3+2WTWH3+4αH3H3TH34αH3λ2H1A2+γ3E3+Φ3(8)

Among them, the elements of E1, E2 and E3 are all 1. Based on the KKT condition, the equations of Wij and Hlij can be obtained:

l=13XlHlTijwij+βWL+l=13WHlHlT+γ1Wijwij=0
2WTX14αH1λ1H2A1Tλ2H3A2Tijhij1+2WTWH1+4αH1H1TH1+γ2E1ijhij1=0
2WTX24αH2λ1H1A1ijhij2+2WTWH2+4αH2H2TH2+γ2E2ijhij2=0
2WTX34αH3λ2H1A2ijhij3+2WTWH3+4αH3H3TH3+γ2E3ijhij3=0(9)

Finally, the update rules for W and Hl can be expressed as Eq. 10.

wijwijX1H1T+X2H2T+X3H3T+βWLijWH1H1T+WH2H2T+WH3H3T+γ1Wij
hij1hij1WTX1+2aH1+λ12H2A1T+λ22H3A2TijWTWH1+2αH1H1TH1+γ22Elij
hij2hij2WTX2+2aH2+λ12H1A1ijWTWH2+2αH2H2TH2+γ22E2ij
hij3hij3WTX3+2aH3+λ22H1A2ijWTWH3+2αH3H3TH3+γ32E3ij(10)

According to the continuously updated W and Hl, make it satisfy the convergence rule, that is, the relative error of reaching the set value or reaching the set number of iterations. We initialize W and Hl through singular value decomposition, which effectively avoids the randomness of the initialization process of W and Hl. Furthermore, for membership confirmation of co-expression modules, this paper is consistent with previous studies (Li et al., 2017; Hong et al., 2020). In addition, we take the reconstruction error as the criterion for the performance of the algorithm, and its expression is as follows.

relative_error=XWHF2(11)

2.4 Functional enrichment analysis

Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to explore the biological processes in which key mRNAs are involved. In addition, in order to further explore the pathways in which key miRNAs are involved, this paper is based on miRDB, TargetScan, and miRTarBase three databases for key miRNA target gene prediction, and KEGG enrichment analysis for their target genes. Pathways with p-values less than 0.05 were considered significant (Figure 1).

FIGURE 1
www.frontiersin.org

FIGURE 1. The workflow of this study.

2.5 Screening and validation of prognosis-related genes

In the prognostic survival analysis, only TNBC patients with a survival time greater than 90 days were retained. Univariate Cox regression analysis was used to identify genes (mRNAs or miRNAs) associated with the prognosis of TNBC patients, and mRNAs or miRNAs with a p-value less than 0.05 were reserved as input for multivariate Cox regression analysis. Next, multivariate Cox regression analysis was used to construct mRNA and miRNA-related prognostic models. We then calculated a risk score, which can be used to classify patients in the training cohorts and validation cohorts into high and low-risk groups. The formula for calculating the risk score is as follows:

RiskScore=n=1mcoefn*xn(12)

where coefn represent the Cox regression coefficient; x(n) represent the expressive value of each genes, m represents the number of gene. Finally, overall survival (OS) times were compared for the two subgroups in the test and validation datasets by KM analysis to determine the predictive value of the risk model. Receiver operating characteristic (ROC) curves were used to assess the accuracy of risk models by the R package “timeROC.”

3 Results

3.1 Data source and preprocessing

The data used in this paper are from the TCGA database (https://www.cancer.gov). In this study, the mRNA expression data (mRNA-Seq, 104 cases), miRNA expression data (miRNA-Seq, 102 cases), WSI image data (69 cases), and clinical data (116 cases) of TNBC patients were obtained from the TCGA database. Finally, 69 TNBC samples with mRNA expression data, miRNA expression data, WSI image data, and clinical data were retained. The GEO cohort (https://www.ncbi.nlm.nih.gov/geo/, GSE58812) was used to validate the accuracy of the prognostic model constructed from mRNA expression data. The miRNA data of the TCGA-BRCA cohort (1000 cases) were utilized to validate the accuracy of the prognostic model constructed from the miRNA expression data.

In this study, mRNA expression data were differentially analyzed using the “limma” package, and genes with p-values less than 0.05 and |logFC|>2 were regarded as differentially expressed genes. Finally, 1438 differentially expressed genes were obtained (Figure 2A), and the expression data of 764 mRNAs were reserved for further analysis through gene annotation. Next, we use the “edgeR” package to normalize the miRNA data (count) by CPM and retain miRNAs with a mean CPM greater than 1. Finally, the expression data of 524 miRNAs were obtained for further analysis (Figure 2B).

FIGURE 2
www.frontiersin.org

FIGURE 2. Expressions of the 1438 genes and 524 miRNAs. (A) Heatmap of the 1438 genes between the normal (N, blue) and the tumor tissues (T, red). (B) Heatmap of the 524 miRNAs between the normal (N, blue) and the tumor tissues (T, red).

Genomic data used in this paper are all from the TCGA database (https://www.cancer.gov). WSI images are from 69 patients with sarcoma. Feature extraction for each WSI image consists of three steps, nuclear segmentation, cell-level feature extraction, and aggregating cell-level features into patient-level features (Phoulady et al., 2016). Based on the experience of previous papers (Cheng et al., 2020), we extracted ten different cell-level features from each segmented nucleus: nuclear area (denoted as area), length of nucleus long and short axes, and long and short axis lengths. The ratio (major, minor, and ratio) of the cell’s mean pixel value in the three channels of RGB (rMean, gMean, and bMean) and the mean, maximum and minimum distances to its neighboring nuclei (distMean, distMax, and distMin) (Cheng et al., 2017). The naming convention for each feature includes cell-level and patient-level features, such as area_bin1, area_mean, etc. In particular, area_bin1 represents the percentage of extremely small cores, and area_bin10 represents the percentage of extremely large cores. Finally, 150 WSI image features are selected as the image data input. We list 150 WSI imaging features in detail in the Supplementary Material. In addition, we provide the difference result files for mRNA and miRNA in Figure 2 (TCGA.diff_mRNA.xls and TCGA. diff_miRNA.xls are in the Supplementary Material).

3.2 Hyperparameter selection

The hyperparameter λ1, λ2, β, γ1, γ2 and the number of co-expression modules K involved in this paper. We conducted experiments on the real dataset, selecting the number of co-expression modules and the remaining four hyperparameters. For the selection of k, since the minimum number of samples/features in the training set is 69, according to the parameter selection experience in the literature (Deng et al., 2020), this paper sets the value of k to 7. For other hyperparameters, we use the grid search method to select parameters. Each parameter is selected from the range of [0.001, 0.01. 0.1. 1], and finally, 1024 parameter combinations are obtained. We take the reconstruction error as the selection basis for selecting all parameters. Furthermore, we use a strategy of early stopping, which stops the iteration when the error no longer decreases, to speed up parameter selection. The hyperparameter selection process is shown in Figure 3 below.

FIGURE 3
www.frontiersin.org

FIGURE 3. Relative errors corresponding to different parameter combinations. The horizontal coordinate in the figure represents 1024 parameter combinations, and the vertical coordinate represents the index to measure the performance of the algorithm proposed in Eq. 11 in Section 2.3.

Finally, we selected the 273rd group of parameters corresponding to the smallest relative error, and the relative error of this group of parameters was 0.7799. Among them λ1=0.01, λ2=0.001, β=0.001, γ1=0.001, γ2=0.01.

3.3 Selection of co-expression modules

The meaning of the co-expression module is a low-dimensional representation of all the features of the three types of data obtained for the projection of the three types of data into the low-dimensional space. The features of each module equivalent to three types of data with large coefficients in the same projection direction are deposited into the same co-expression module. In the experiments, we obtained a total of 69 co-expression modules. We use the absolute value of the Pearson correlation coefficient between the original matrix and the reconstructed matrix of the three elements in each module as the screening basis. The following figure shows the Pearson correlation coefficient and Pearson correlation of the three elements in all modules and the mean of the three elements (Figures 4A–D).

FIGURE 4
www.frontiersin.org

FIGURE 4. Comparison of Pearson correlation coefficients of three elements in different modules. (A–C) are the absolute values of the Pearson correlation coefficients between the original and reconstructed matrices of WSI, miRNA, and mRNA for different modules, respectively. (D) is the mean of the absolute value of the Pearson correlation coefficient.

As can be seen from Figure 3, module 4 has the smallest total relative error. Therefore, target gene prediction of miRNAs in module 4 was performed in this paper. Next, we performed target gene prediction for the 71 miRNAs in module 4, and the miRNA-mRNA pairs supported by the three miRNAs databases were reserved for further analysis. We predicted 76 target genes. Subsequently, we performed KEGG enrichment analysis on the target genes of mRNAs and miRNAs in module 4 to explore their enriched biological pathways. The results showed that 76 mRNAs in module 4 were enriched in Neuroactive ligand-receptor interaction (Figure 5A). Meanwhile, the target genes of miRNAs in module 4 were mainly enriched in MAPK signaling pathway, Breast cancer, PI3K-Akt signaling pathway, Axon guidance, mTOR signaling pathway, FoxO signaling pathway, and Neurotrophin signaling pathway (Figure 5B). In addition, we provide a list of the identified target genes in the Supplementary Material (miRNA_target_gene.xls).

FIGURE 5
www.frontiersin.org

FIGURE 5. The functional enrichment analysis of module 4 in TNBC. (A) The enriched item of the 76 mRNAs. (B) The enriched item of the miRNA-related target genes.

3.4 Comparison with other algorithms

To confirm that the proposed algorithm has good reconstruction performance, we compared the proposed SPID-MDNMF algorithm with the previous JNMF algorithm, the MDJNMF algorithm, under the same experimental conditions. Specifically, the value of k for all three algorithms is 7, and the JNMF algorithm does not have any additional hyperparameters that need to be adjusted. Parameter selection of MDJNMF algorithm is λ1=0.01, λ2=0.001, γ1=0.001, and γ2=0.01. Parameter selection of SPID—MDNMF algorithm for λ1=0.01, λ2=0.001, β=0.001, γ1=0.001, and γ2=0.01. The relative error between the original matrix and the reconstructed matrix and the comparison of the Pearson correlation coefficients is shown in Table 1 below.

TABLE 1
www.frontiersin.org

TABLE 1. Comparison of relative errors and correlation coefficients of algorithms.

As can be seen from the above table, the proposed SPID-MDJNMF algorithm obtains a minor relative error.

3.5 Prognostic biomarkers

To further screen for key biomarkers, we performed prognostic survival analysis on 76 mRNAs and 71 miRNAs in module 4 to obtain biomarkers that could predict the prognosis of TNBC patients. Univariate Cox regression analysis was performed on the expression data of 76 mRNAs and the expression data of 71 miRNAs, respectively, to screen mRNAs and miRNAs associated with survival time of TNBC patients, mRNAs, and miRNAs with p-value <0.05 were retained for further analysis. We obtained a total of 3 mRNAs (IL12RB2, CNIH2 and TIMP4; Figure 6A) and 2 miRNAs (hsa-miR-203a-3p and hsa-miR-148b-3p; Figure 6B) associated with the survival time of TNBC patients (Tables 2, 3). Next, we used multivariate Cox regression analysis (Tables 4, 5) to construct 2-mRNAs-related prognostic models (IL12RB2 and CNIH2) and 2-miRNAs-related prognostic models (hsa-miR -203a-3p and hsa-miR-148b-3p). The risk score of the 2-mRNAs-related prognostic model is expressed as: risk score = (IL12RB2 exp.* −0.60498) + (CNIH2 exp.* −0.43137). The risk score of the 2-miRNAs-related prognostic model is expressed as: risk score = (hsa-miR-203a-3p exp.* 0.403829) + (hsa-miR-148b-3p exp.* 0.997387). TNBC patients in the training dataset (mRNA expression data, TCGA-TNBC) and testing dataset (mRNA expression data, GSE58812) were classified as low-risk group and high-risk group based on the median risk score of the 2-mRNAs-related prognostic model of the TCGA-TNBC cohort. TNBC patients in the training dataset (miRNA expression data, TCGA-TNBC) and the testing dataset (miRNA expression data, TCGA-BRCA) were classified according to the median risk score of the 2-miRNAs-related prognostic model of the TCGA-TNBC cohort into the low-risk group and the high-risk group.

FIGURE 6
www.frontiersin.org

FIGURE 6. Forest map of univariate regression analyses. (A) Univariate Cox regression analysis for identification prognosis-associated mRNAs. (B) Univariate Cox regression analysis for identification prognosis-associated miRNAs.

TABLE 2
www.frontiersin.org

TABLE 2. mRNAs associated with TNBC overall survival time were obtained from univariate Cox regression analysis.

TABLE 3
www.frontiersin.org

TABLE 3. miRNAs associated with TNBC overall survival time were obtained from univariate Cox regression analysis.

TABLE 4
www.frontiersin.org

TABLE 4. mRNAs were obtained from multivariate Cox regression analysis.

TABLE 5
www.frontiersin.org

TABLE 5. miRNAs were obtained from multivariate Cox regression analysis.

Subsequently, we performed KM analysis on mRNA and miRNA-related training datasets. The mRNA-related KM curve showed that the OS rate of high-risk patients in the training dataset was significantly lower than that of low-risk patients over 5 years (Figure 7A, p < 0.001). The miRNA-related KM curve showed that the OS rate of high-risk patients in the training dataset was significantly lower than that of low-risk patients over 5 years (Figure 7B, p = 0.003). To verify the predictive accuracy of the prognostic model, we plotted the 1-, 3-, and 5-year ROC curves of TNBC patients in the mRNA as well as miRNA-related training datasets. The mRNA-related ROC curve showed that the 2-mRNAs prognostic model we constructed could predict the 1-year (AUC = 0.849), 3-year (AUC = 0.752), and 5-year (AUC = 0.802) survival rates of TNBC patients with high accuracy (Figure 7C). The miRNA-related ROC curve showed that the 2-miRNAs prognostic model we constructed could predict the 1-year (AUC = 0.746), 3-year (AUC = 0.863), and 5-year (AUC = 0.765) survival rates of TNBC patients with high accuracy (Figure 7D).

FIGURE 7
www.frontiersin.org

FIGURE 7. Construction of prognostic model in the training datasets. (A) KM curves for the OS of patients in the high- and low-risk groups in the 2 mRNAs-related prognostic model. (B) KM curves for the OS of patients in the high- and low-risk groups in the 2 miRNAs-related prognostic model. (C) Time-dependent ROC curve analysis of the 2 mRNAs-related prognostic for predicting 1-, 3-, 5-year OS. (D) Time-dependent ROC curve analysis of the 2 miRNAs-related prognostic for predicting 1-, 3-, 5-year OS. In addition, we used GSE42568 to validate survival in the high - and low-risk groups (see the Supplementary Figure S1 in Supplementary Material).

We performed KM analysis and ROC analysis on the mRNA and miRNA-related test datasets to further verify the predictive accuracy of the constructed prognostic model. The mRNA-related KM curve showed that the OS rate of high-risk patients in the testing dataset was lower than that of low-risk patients over 5 years (Figure 8A, p = 0.078). The miRNA-related KM curve showed that the OS rate of high-risk patients in the test dataset was significantly lower than that of low-risk patients over 5 years (Figure 8B, p = 0.025). The mRNA correlation ROC curve showed that the 2-mRNAs prognostic model we constructed could predict the 1-year (AUC = 0.788), 3-year (AUC = 0.591), and 5-year (AUC = 0.569) survival of TNBC patients in the test set with certain accuracy rate (Figure 8C). The miRNA-related ROC curve showed that the 2-miRNAs prognostic model we constructed could predict the 1-year (AUC = 0.588), 3-year (AUC = 0.591), and 5-year (AUC = 0.573) survival of TNBC patients in the test set with certain accuracy rate (Figure 8D).

FIGURE 8
www.frontiersin.org

FIGURE 8. Validation of the risk model in the testing datasets. (A) KM curves for the OS of patients in the high- and low-risk groups in the 2 mRNAs-related prognostic model (GSE58812). (B) KM curves for the OS of patients in the high- and low-risk groups in the 2 miRNAs-related prognostic model (TCGA-BRCA). (C) Time-dependent ROC curve analysis of the 2 mRNAs-related prognostic for predicting 1-, 3-, 5-year OS (GSE58812). (D) Time-dependent ROC curve analysis of the 2 miRNAs-related prognostic for predicting 1-, 3-, 5-year OS (TCGA-BRCA).

4 Discussion

Efficient integration of pathological images and genomic data has been reported to help predict disease prognosis and identify critical targets. To this end, we propose the SPID-MDJNMF algorithm to integrate the pathological image data, mRNAs expression data, and miRNAs expression data of TNBC to identify essential biomarkers in TNBC. Subsequently, we compared the original and reconstructed matrices’ relative errors and Pearson correlation coefficients between the proposed SPID-MDNMF algorithm and the previous JNMF algorithm, the MDJNMF algorithm. The results show that our proposed SPID-MDNMF algorithm has better reconstruction performance. We obtained module 4 through the proposed SPID-MDNMF algorithm. The genomic data and image data in module 4 were left for further analysis.

Next, enrichment analysis was utilized to explore the biological functions of the genes in module 4 in TNBC. The 76 mRNAs in module 4 were enriched in Neuroactive ligand-receptor interaction, Homologous recombination, and Cell cycle. Meanwhile, the target genes of miRNAs in module 4 were mainly enriched in the MAPK signaling pathway, Breast cancer, PI3K-Akt signaling pathway, Axon guidance, FoxO signaling pathway, and Neurotrophin signaling pathway. Some pathways have been shown to be associated with the pathological progression of triple-negative breast cancer, such as BECN1 knockout hinders tumor growth, migration, and invasion by inhibiting the cell cycle and partially inhibiting the epithelial-mesenchymal transition of human triple-negative breast cancer cells (Wu et al., 2018). Alterations in the homologous recombination (HR) system are typical of breast cancer mutant tumors (Belli et al., 2019). Previous studies have shown that the homologous recombination deficiency score may predict the chemotherapeutic range of response to platinum-based neoadjuvant therapy in triple-negative breast cancer (Telli et al., 2016). The RNA-binding protein QKI can inhibit the progression of breast cancer by regulating the RASA1/MAPK signaling pathway (Cao et al., 2021). PIK3CA mutations can confer resistance to chemotherapy in TNBC by activating the PI3K/AKT/mTOR signaling pathway (Hu et al., 2021). The above results suggest that the genes in module 4 may also play an essential role in the occurrence and progression of TNBC.

Subsequently, to screen for biomarkers associated with the prognosis of TNBC patients, we performed univariate and multivariate Cox regression analysis on the 76 mRNAs and 71 miRNAs in module 4. Finally, we constructed a prognostic gene model based on 2 mRNAs (IL12RB2 and CNIH2) and 2 miRNAs (hsa-miR-203a-3p and hsa-miR-148b-3p). Moreover, the mRNA-related and miRNA-related prognostic models we constructed can predict the overall survival of TNBC patients with high accuracy. Epigenetic changes in IL12RB2 play an essential role in the plastic behavior of T Helper 17 (Th17) Cells (Bending et al., 2011). Treg and Th17 cells can influence breast cancer progression through Treg cell-mediated suppression of effector T cell responses (Benevides et al., 2013). Therefore, IL12RB2 may affect breast cancer development by regulating Th17 cells. CNIH2 is an AMPA receptor-binding protein significantly slows AMPAR inactivation (Herring et al., 2013). A previous study found that AMPA antagonists inhibited the proliferation of breast and lung cancer cells in vitro (Rzeski et al., 2002). Therefore, CNIH2 may play a role in breast cancer progression through interaction with AMPA. A previous study showed that hsa-miR-203a-3p was upregulated in breast cancer tissues compared with adjacent breast tissues and promoted breast cancer development and carcinogenesis (Cai et al., 2018). Xu et al. found that hsa-miR-203a-3p could inhibit breast cancer progression and metastasis by interacting with circTADA2As (Xu et al., 2019). Breast cancer-related in vitro experiments demonstrated that hsa-miR-148b-3p could inhibit tumor cell proliferation and promote breast cancer cell apoptosis by downregulating TRIM59 (Yuan et al., 2019). In breast cancer, miR-148b-3p was found to be associated with disease recurrence and pathological progression by targeting a series of oncogenes (Cimino et al., 2013).

In conclusion, this paper proposes a SPID-MDNMF algorithm that can effectively integrate image data, mRNAs expression data, and miRNAs expression data. Compared with other similar algorithms, the SPID-MDNMF algorithm has better reconstruction performance. Based on module 4 screened by the SPID-MDNMF algorithm, we constructed 2-mRNAs (IL12RB2 and CNIH2) and 2-miRNAs (hsa-miR-203a-3p and hsa-miR, respectively) by performing a prognostic survival analysis on the TCGA-TNBC cohort -148b-3p) prognostic model. The prognostic model can better predict the prognosis of TNBC.

5 Conclusion

In summary, we proposed the SPID-MDJNMF algorithm to integrate the imaging genetics data of TNBC patients and obtain the co-expression patterns of TNBC patients in different stages. For the significant co-expressed modules, a variety of bioinformatics analyses were performed to construct a prognostic model for TNBC patients. Multiple genes with prognostic value obtained from screening may be potential biomarkers for TNBC.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/geo/, GSE58812; TCGA database (https://www.cancer.gov).

Author contributions

QH contributed to the conception of the study; SN and JX performed the experiment; JM and YP contributed significantly to analysis and manuscript preparation; RH performed the data analyses and wrote the manuscript; JF helped perform the analysis with constructive discussions. All authors have read and approved the manuscript.

Funding

The work was supported by National Natural Science Foundation of China grants (82002779, 81760478, 82060024); The Seedling Plan of Maternal and Child Health Hospital of Guangxi Zhuang Autonomous Region (GXWCH-YMJH-2018005); And Guangxi Provincial Natural Science Foundation of China grants (2019GXNSFAA245083).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2023.1090847/full#supplementary-material

References

Aerts, H. J. W. L., Velazquez, E. R., Leijenaar, R. T. H., Parmar, C., Grossmann, P., Carvalho, S., et al. (2014). Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006. doi:10.1038/ncomms5006

PubMed Abstract | CrossRef Full Text | Google Scholar

Belli, C., Duso, B. A., Ferraro, E., and Curigliano, G. (2019). Homologous recombination deficiency in triple negative breast cancer. Breast 45, 15–21. doi:10.1016/j.breast.2019.02.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Bending, D., Newland, S., Krejcí, A., Phillips, J. M., Bray, S., and Cooke, A. (2011). Epigenetic changes at Il12rb2 and Tbx21 in relation to plasticity behavior of Th17 cells. J. Immunol. 186 (6), 3373–3382. doi:10.4049/jimmunol.1003216

PubMed Abstract | CrossRef Full Text | Google Scholar

Benevides, L., Cardoso, C. R., Tiezzi, D. G., Marana, H. R., Andrade, J. M., and Silva, J. S. (2013). Enrichment of regulatory T cells in invasive breast tumor correlates with the upregulation of IL-17A expression and invasiveness of the tumor. Eur. J. Immunol. 43 (6), 1518–1528. doi:10.1002/eji.201242951

PubMed Abstract | CrossRef Full Text | Google Scholar

Cai, K. T., Feng, C. X., Zhao, J. C., He, R. Q., Ma, J., and Zhong, J. C. (2018). Upregulated miR-203a-3p and its potential molecular mechanism in breast cancer: A study based on bioinformatics analyses and a comprehensive meta-analysis. Mol. Med. Rep. 18 (6), 4994–5008. doi:10.3892/mmr.2018.9543

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, Y., Chu, C., Li, X., Gu, S., Zou, Q., and Jin, Y. (2021). RNA-binding protein QKI suppresses breast cancer via RASA1/MAPK signaling pathway. Ann. Transl. Med. 9 (2), 104. doi:10.21037/atm-20-4859

PubMed Abstract | CrossRef Full Text | Google Scholar

Cheng, J., Han, Z., Mehra, R., Shao, W., Cheng, M., Feng, Q., et al. (2020). Computational analysis of pathological images enables a better diagnosis of TFE3 Xp11.2 translocation renal cell carcinoma. Nat. Commun. 11, 1778. doi:10.1038/s41467-020-15671-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Cheng, J., Zhang, J., Han, Y., Wang, X., Ye, X., Meng, Y., et al. (2017). Integrative analysis of histopathological images and genomic data predicts clear cell renal cell carcinoma prognosis. Cancer Res. 77 (21), e91–e100. doi:10.1158/0008-5472.CAN-17-0313

PubMed Abstract | CrossRef Full Text | Google Scholar

Cimino, D., De Pittà, C., Orso, F., Zampini, M., Casara, S., Penna, E., et al. (2013). miR148b is a major coordinator of breast cancer progression in a relapse-associated microRNA signature by targeting ITGA5, ROCK1, PIK3CA, NRAS, and CSF1. FASEB J. 27 (3), 1223–1235. doi:10.1096/fj.12-214692

PubMed Abstract | CrossRef Full Text | Google Scholar

Deng, J., Zeng, W., Kong, W., Shi, Y., Mou, X., and Guo, J. (2020). Multi-constrained joint non-negative matrix factorization with application to imaging genomic study of lung metastasis in soft tissue sarcomas. IEEE Trans. bio-medical Eng. 67 (7), 2110–2118. doi:10.1109/TBME.2019.2954989

CrossRef Full Text | Google Scholar

Deng, J., Zeng, W., Luo, S., Kong, W., Shi, Y., Li, Y., et al. (2021). Integrating multiple genomic imaging data for the study of lung metastasis in sarcomas using multi-dimensional constrained joint non-negative matrix factorization. Inf. Sci. 576, 24–36. doi:10.1016/j.ins.2021.06.058

CrossRef Full Text | Google Scholar

Du, L., Liu, K., Yao, X., Risacher, S. L., Han, J., Saykin, A. J., et al. (2020). Detecting genetic associations with brain imaging phenotypes in Alzheimer's disease via a novel structured SCCA approach. Med. image Anal. 61, 101656. doi:10.1016/j.media.2020.101656

PubMed Abstract | CrossRef Full Text | Google Scholar

Herring, B. E., Shi, Y., Suh, Y. H., Zheng, C. Y., Blankenship, S. M., Roche, K. W., et al. (2013). Cornichon proteins determine the subunit composition of synaptic AMPA receptors. Neuron 77 (6), 1083–1096. doi:10.1016/j.neuron.2013.01.017

PubMed Abstract | CrossRef Full Text | Google Scholar

Hong, H. C., Chuang, C. H., Huang, W. C., Weng, S. L., Chen, C. H., Chang, K. H., et al. (2020). A panel of eight microRNAs is a good predictive parameter for triple-negative breast cancer relapse. Theranostics 10 (19), 8771–8789. doi:10.7150/thno.46142

PubMed Abstract | CrossRef Full Text | Google Scholar

Hu, H., Zhu, J., Zhong, Y., Geng, R., Ji, Y., Guan, Q., et al. (2021). PIK3CA mutation confers resistance to chemotherapy in triple-negative breast cancer by inhibiting apoptosis and activating the PI3K/AKT/mTOR signaling pathway. Ann. Transl. Med. 9 (5), 410. doi:10.21037/atm-21-698

PubMed Abstract | CrossRef Full Text | Google Scholar

Hwang, S. Y., Park, S., and Kwon, Y. (2019). Recent therapeutic trends and promising targets in triple negative breast cancer. Pharmacol. Ther. 199, 30–57. doi:10.1016/j.pharmthera.2019.02.006

PubMed Abstract | CrossRef Full Text | Google Scholar

Lee, S., and Jiang, X. (2017). Modeling miRNA-mRNA interactions that cause phenotypic abnormality in breast cancer patients. PLoS One 12 (8), e0182666. doi:10.1371/journal.pone.0182666

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Z., Peng, Z., Gu, S., Zheng, J., Feng, D., Qin, Q., et al. (2017). Global analysis of miRNA-mRNA interaction network in breast cancer with brain metastasis. Anticancer Res. 37 (8), 4455–4468. doi:10.21873/anticanres.11841

PubMed Abstract | CrossRef Full Text | Google Scholar

Lin, D., Calhoun, V. D., and Wang, Y. P. (2014). Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Med. image Anal. 18, 6891–6902. doi:10.1016/j.media.2013.10.010

CrossRef Full Text | Google Scholar

Lyons, T. G. (2019). Targeted therapies for triple-negative breast cancer. Curr. Treat. Options Oncol. 20 (11), 82. doi:10.1007/s11864-019-0682-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Phoulady, H. A., Goldgof, D. B., Hall, L. O., and Mouton, P. R. (2016). “Nucleus segmentation in histology images with hierarchical multilevel thresholding,” in International Society for Optics and Photonics.

Google Scholar

Rzeski, W., Ikonomidou, C., and Turski, L. (2002). Glutamate antagonists limit tumor growth. Biochem. Pharmacol. 64 (8), 1195–1200. PMID: 12234599. doi:10.1016/s0006-2952(02)01218-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Sukumar, J., Gast, K., Quiroga, D., Lustberg, M., and Williams, N. (2021). Triple-negative breast cancer: Promising prognostic biomarkers currently in development. Expert Rev. Anticancer Ther. 21 (2), 135–148. doi:10.1080/14737140.2021.1840984

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, D., Li, A., Tang, B., and Wang, M. (2008). Integrating genomic data and pathological images to effectivepredict breast cancer clinical outcome. Comput. Methods Programs Biomed. 45–53. doi:10.1016/j.cmpb.2018.04.008

CrossRef Full Text | Google Scholar

Telli, M. L., Timms, K. M., Reid, J., Hennessy, B., Mills, G. B., Jensen, K. C., et al. (2016). Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 22 (15), 3764–3773. doi:10.1158/1078-0432.CCR-15-2477

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, Z., Li, R., Wang, M., and Li, A. (2021). Gpdbn: Deep bilinear network integrating both genomic data and pathological images for breast cancer prognosis prediction. Bioinformatics 37 (18), 2963–2970. doi:10.1093/bioinformatics/btab185

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, Z., Liao, H., Deng, Z., Yang, P., Du, N., Zhanng, Y., et al. (2013). miRNA-205 affects infiltration and metastasis of breast cancer. Biochem. Biophys. Res. Commun. 441 (1), 139–143. doi:10.1016/j.bbrc.2013.10.025

PubMed Abstract | CrossRef Full Text | Google Scholar

Wu, C. L., Zhang, S. M., Lin, L., Gao, S. S., Fu, K. F., Liu, X. D., et al. (2018). BECN1-knockout impairs tumor growth, migration and invasion by suppressing the cell cycle and partially suppressing the epithelial-mesenchymal transition of human triple-negative breast cancer cells. Int. J. Oncol. 53 (3), 1301–1312. doi:10.3892/ijo.2018.4472

PubMed Abstract | CrossRef Full Text | Google Scholar

Xu, J. Z., Shao, C. C., Wang, X. J., Zhao, X., Chen, J. Q., Ouyang, Y. X., et al. (2019). circTADA2As suppress breast cancer progression and metastasis via targeting miR-203a-3p/SOCS3 axis. Cell Death Dis. 10 (3), 175. doi:10.1038/s41419-019-1382-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Yuan, L., Liu, Y., Qu, Y., Liu, L., and Li, H. (2019). Exosomes derived from MicroRNA-148b-3p-overexpressing human umbilical cord mesenchymal stem cells restrain breast cancer progression. Front. Oncol. 9, 1076. doi:10.3389/fonc.2019.01076

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: imaging genetics, triple-negative breast cancer, NMF, prior information, immunity, prognosis, biomarkers

Citation: Ning S, Xie J, Mo J, Pan Y, Huang R, Huang Q and Feng J (2023) Imaging genetic association analysis of triple-negative breast cancer based on the integration of prior sample information. Front. Genet. 14:1090847. doi: 10.3389/fgene.2023.1090847

Received: 06 November 2022; Accepted: 10 February 2023;
Published: 22 February 2023.

Edited by:

Andreas Pfenning, Carnegie Mellon University, United States

Reviewed by:

Wenbin Liu, Guangzhou University, China
Furong Huang, Duke University, United States

Copyright © 2023 Ning, Xie, Mo, Pan, Huang, Huang and Feng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jifeng Feng, fjf0771@163.com; Qinghua Huang, huangqinghua@stu.gxmu.edu.cn

These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.