CLINICAL TRIAL article

Front. Oncol., 31 July 2023
Sec. Breast Cancer
This article is part of the Research Topic Women in Breast Cancer vol III: 2023

Ultrasound-based radiomics model for predicting molecular biomarkers in breast cancer

Rong Xu1, Tao You1, Chen Liu2, Qing Lin1, Quehui Guo1, Guodong Zhong3, Leilei Liu1*, Qiufang Ouyang1*
  • 1Department of Ultrasound, The Second Affiliated Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
  • 2Department of Breast, The Second Affiliated Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China
  • 3Department of Pathology, The Second Affiliated Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China

Background: Breast cancer (BC) is the most common cancer in women and is highly heterogeneous. BC can be classified into four molecular subtypes based on the status of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) and proliferation marker protein Ki-67. However, the status of these biomarkers can only be obtained by biopsy or surgery, both of which are invasive. Radiomics can noninvasively predict molecular expression by extracting image features. Nevertheless, data on the prediction of molecular biomarker expression from ultrasound (US) images in BC are scarce.

Objectives: To investigate the prediction performance of US radiomics for the assessment of molecular profiling in BC.

Methods: A total of 342 patients with BC who underwent preoperative US examination between January 2013 and December 2021 were retrospectively included. All cases were confirmed by pathology and molecular subtype analysis of ER, PR, HER2 and Ki-67. Radiomics features were extracted and four molecular models were constructed using a support vector machine (SVM). Pearson correlation coefficient heatmaps were employed to analyze the relationship between selected features and their predictive power on molecular expression. The receiver operating characteristic curve was used to evaluate the prediction performance of US radiomics in the assessment of molecular profiling.

Results: A total of 359 lesions were included, with 129 ER- and 230 ER+, 163 PR- and 196 PR+, 265 HER2- and 94 HER2+, and 114 Ki-67- and 245 Ki-67+ expression. In total, 1314 features were extracted from each ultrasound image. Some specific radiomics features differed significantly between the biomarker-positive and biomarker-negative groups, and multiple features demonstrated significant association with molecular biomarkers. The areas under the curve (AUCs) for predicting ER, PR, HER2, and Ki-67 expression were 0.917, 0.835, 0.771, and 0.896 in the training set and 0.868, 0.811, 0.722, and 0.706 in the validation set, respectively.

Conclusion: Ultrasound-based radiomics provides a promising method for predicting molecular biomarker expression of ER, PR, HER2, and Ki-67 in BC.

Introduction

Breast cancer (BC) is currently the most prevalent form of cancer and is also the leading cause of cancer-related deaths among women, according to the International Agency for Research on Cancer (1). Four molecular biomarkers, namely estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and proliferation marker protein Ki-67, receive considerable attention in clinical practice (2). These four molecular biomarkers play a crucial role in diagnosing BC. Based on the expression levels of these four biomarkers (3), BC is classified into four distinct subtypes: luminal A, luminal B (including luminal B/HER2-negative and luminal B/HER2-positive), HER2-positive, and triple-negative BC (TNBC). In particular, the treatment protocols, prognosis, and metastatic potential of BC can vary significantly among these different molecular subtypes (4). Therefore, accurate prediction of the molecular profiles holds immense significance in guiding appropriate treatment strategies.

Currently, the assessment of molecular subtypes of BC before surgery typically relies on the results of immunohistochemistry (IHC) obtained through needle biopsy (5). However, this biopsy procedure is invasive and time-consuming. Additionally, a single local biopsy specimen may not always capture the complete molecular characteristics of the whole cancer, because of the high heterogeneity of BC (6). The tumor heterogeneity is an independent factor linked to the insufficient response to neoadjuvant chemotherapy (7). As a result, there is an urgent need for an alternative method that can accurately and non-invasively assess the expression of molecular biomarkers in BC.

With the rapid advancements in computer technology, the field of radiomics has emerged as a cutting-edge approach that harnesses high-throughput capabilities and mathematical algorithms to extract a wide range of quantitative features from medical images (8). This technique not only overcomes the subjective limitations inherent in traditional imaging diagnosis but also enables a more comprehensive assessment of the overall characteristics of lesions and the surrounding tissue. Numerous studies have shown the effectiveness of radiomics based on X-ray, magnetic resonance imaging (MRI), ultrasound and positron emission tomography-computed tomography (PET-CT) for the evaluation of malignancy, differentiation of molecular subtype, and response to neoadjuvant therapy in BC (9). Ultrasound has unique advantages for clinical applications due to its real-time capability, suitability for frequent examination, and large data volume. In particular, the US-radiomics model has demonstrated exceptional performance in distinguishing between benign and malignant breast lesions (10). However, despite these advantages, few studies have investigated the application of ultrasound radiomics for predicting molecular biomarker expression (11). Furthermore, the number of studies exploring the specific radiomics features that hold great importance in predicting the molecular subtype of BC has been relatively limited.

In the present study, we investigated whether ultrasound radiomics features could be adopted as predictive biomarkers for discriminating the molecular biomarker profile (ER, PR, HER2, and Ki-67). The purpose of this study was to explore the potential of radiomics features and to provide complementary information to aid in diagnosing molecular biomarker expression in BC.

Methods

Study design and cohort of the study

This study was approved by the Ethics Committee of the Second Affiliated Hospital of Fujian University of Traditional Chinese Medicine (SPHFJP-T2022007-01), and informed consent was waived due to the retrospective nature of this study. We retrieved 466 consecutive patients with BC who underwent breast US examination and subsequent treatment in our hospital from January 2013 to December 2021. Inclusion criteria were as follows: (1) Breast US was performed before the operation, and patients did not receive neoadjuvant chemotherapy (NAC) or biopsy prior to US examination; (2) Primary BC was confirmed by pathology; (3) Molecular subtype data (ER, PR, Ki-67, and HER2) were complete; (4) The US image quality met the diagnostic requirements. Exclusion criteria were as follows: (1) Patients without US examination; (2) Cases with incomplete pathological data; (3) Patients who had undergone local or systemic treatment such as puncture biopsy, chemotherapy, radiotherapy, ablation, or resection before breast US examination; (4) Cases with poor imaging quality. Finally, a total of 342 patients with invasive BC were included in this study. Among them, 341 were female and 1 was male. Their mean age was 54.5 years (range, 25 to 90 years). The workflow of this study, shown in Figure 1, comprises six main steps: patient enrollment, ultrasound image acquisition, feature extraction, feature selection, model construction and model evaluation.

Figure 1 The workflow of this study. HER2, human epidermal growth factor receptor 2; TNBC, triple-negative breast cancer; US, ultrasound; ER, estrogen receptor; PR, progesterone receptor; Ki-67, proliferation marker protein Ki-67; ROI, region of interest; GLCM, gray level co-occurrence matrix; LASSO, least absolute shrinkage and selection operator; SVM, support vector machine; ROC, receiver operating characteristic.

Breast ultrasonography

Breast ultrasound scanning was performed using Philips, GE, or Siemens color Doppler ultrasound equipment. The patients were positioned in a supine or lateral recumbent position with their hands raised to expose both breasts and axillae. The lesions were scanned from multiple angles, and the largest ultrasound section of each lesion was selected for analysis. The ultrasonic characteristics of the lesions were recorded, including their BI-RADS classification, location, size, shape, boundary, internal echo, calcification, posterior echo changes, blood flow, and axillary lymph nodes. The images were stored in DICOM format. Quality control of the images was carried out by two experienced radiologists, Qing Lin and Quehui Guo, who are proficient in image analysis and worked in consensus to ensure the accuracy and reliability of this work.

Pathology analysis

All primary breast lesions of the participants were pathologically confirmed by either biopsy or resection. The expression levels of ER, PR, HER2, and Ki-67 were determined by IHC or fluorescence in situ hybridization (FISH). ER and PR were defined as positive when more than 1% of cells stained positive. For HER2, a score of 3+ indicated positive; a score of 1+ or no expression indicated negative; a score of 2+ required FISH to determine the amplification status (12). The cutoff threshold for Ki-67 was 20%: a Ki-67 index greater than or equal to 20% indicated high proliferation and was defined as positive (13). Based on the expression of ER, PR, HER2, and Ki-67, BC was divided into four molecular subtypes, i.e. luminal A, luminal B (including luminal B/HER2-negative and luminal B/HER2-positive), HER2-positive, and triple-negative.
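The positivity rules described in this section can be written as a few small functions. This is only an illustrative sketch: the function and parameter names are hypothetical, and the thresholds are taken from the text (more than 1% for ER and PR, IHC 3+ or FISH-amplified 2+ for HER2, at least 20% for Ki-67).

```python
from typing import Optional


def er_pr_positive(percent_stained: float) -> bool:
    """ER or PR counts as positive when more than 1% of cells stain."""
    return percent_stained > 1.0


def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> bool:
    """HER2: IHC 3+ is positive, 0 or 1+ is negative, 2+ is decided by FISH."""
    if ihc_score == 3:
        return True
    if ihc_score in (0, 1):
        return False
    if ihc_score == 2:
        if fish_amplified is None:
            raise ValueError("an IHC score of 2+ requires a FISH result")
        return fish_amplified
    raise ValueError(f"unexpected IHC score: {ihc_score}")


def ki67_positive(proliferation_index: float) -> bool:
    """Ki-67 counts as positive at or above the 20% cutoff."""
    return proliferation_index >= 20.0
```

For example, a lesion with Ki-67 of 5%, ER of 90%, PR of 60% and an IHC score of 0 would be labeled Ki-67-negative, ER-positive, PR-positive and HER2-negative under these rules.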

Segmentation of tumor and extraction of radiomics features

The breast lesion region of interest (ROI) was manually delineated on a grayscale ultrasound image by two sonographers who had no prior knowledge of the histopathological results. An open-source imaging platform, ITK-SNAP (http://www.itksnap.org), was utilized. To illustrate the ROI selection method, Figure 2 displays the original ultrasound images and the ROIs for four patients with breast carcinoma, each exhibiting a different molecular marker expression profile.

Figure 2 Cases of the original US image and the ROI. (A) The original US image of case 1 with invasive BC, Ki-67 (5%+), ER (90%+), PR (60%+), and HER2 (-). (B) The ROI of case 1. (C) The original US image of case 2 with invasive BC, Ki-67 (30%+), ER (95%+), PR (95%+), and HER2 (-). (D) The ROI of case 2. (E) The original US image of case 3 shows a patient with invasive BC with myeloid characteristics, Ki-67 (50%+), ER (-), PR (-), and HER2 (+). (F) The ROI of case 3. (G) The original US image of case 4 with invasive BC, Ki-67 (85%+), ER (-), PR (-), and HER2 (-). (H) The ROI of case 4.

The extraction of lesion features was performed using Pyradiomics version 3.0 software. A total of 1314 radiomics features were extracted from each ultrasound image, falling into 7 categories: first order features (n = 252), shape features (n = 12), Gray Level Co-occurrence Matrix (GLCM, n = 336), Gray Level Run Length Matrix (GLRLM, n = 224), Gray Level Size Zone Matrix (GLSZM, n = 224), Gray Level Dependence Matrix (GLDM, n = 196), and Neighboring Gray Tone Difference Matrix (NGTDM, n = 70).

Features selection

The consistency of the extracted radiomics features was assessed with the inter- and intra-observer intraclass correlation coefficient (ICC). Forty cases of ultrasound images, comprising 20 positive and 20 negative cases for each of the molecular biomarkers (ER, PR, HER2, and Ki-67), were randomly selected for analysis. To assess the inter-observer reproducibility of the radiomics features, two experienced sonographers independently performed the ROI segmentation. Additionally, to evaluate intra-observer reproducibility, sonographer 1 repeated the segmentation process one month after the initial ROI segmentation. Radiomics features with inter- and intra-observer ICCs greater than 0.75 were considered to demonstrate good reproducibility and were selected for model construction. Pearson's coefficient matrix heatmaps were calculated to analyze the relationships among the radiomics features, and the optimal features were selected.
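The reproducibility filter can be illustrated with a short NumPy sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here; the function names and array layout are hypothetical.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for a (n_cases, n_raters) array of one feature's values."""
    n, k = ratings.shape
    grand = ratings.mean()
    # Sums of squares for cases (rows), raters (columns) and residual error.
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return float((ms_rows - ms_err) / denom)


def reproducible_features(first: np.ndarray, second: np.ndarray,
                          threshold: float = 0.75) -> list:
    """Indices of features whose ICC across two segmentations exceeds threshold.

    first and second are (n_cases, n_features) arrays extracted from two
    segmentations of the same cases.
    """
    return [
        j for j in range(first.shape[1])
        if icc_2_1(np.column_stack([first[:, j], second[:, j]])) > threshold
    ]
```

Under this sketch, the filter would be run once on the two sonographers' segmentations and once on the repeated segmentations of sonographer 1, keeping only the features that pass both.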

Construction of the radiomics model

Before proceeding with the modeling process, several data pre-processing steps were undertaken. These steps involved manual elimination of duplicate information, unpacking the multidimensional array into one-dimensional data by column, and filtering out features with zero variance using ANOVA. After standardizing the data, the least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was used to select molecular-related features with non-zero coefficients, and the penalty parameters were tuned by 10-fold cross-validation. The mean and standard deviation of the selected features were calculated for both the negative and positive groups. The t-values and P-values were calculated to determine whether the features differed significantly between the two groups. The selected features were saved as radiomics labels for subsequent model construction.
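The standardize-then-LASSO step can be sketched with scikit-learn. This is a minimal illustration on synthetic data, assuming `LassoCV` (a linear LASSO fitted to the 0/1 or continuous target) with 10-fold cross-validation; the helper name `lasso_select` is hypothetical and the authors' exact configuration is not given in the text.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def lasso_select(features: np.ndarray, target: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return the column indices of features with non-zero LASSO coefficients."""
    # Standardize each feature to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(features)
    # The penalty strength is tuned by 10-fold cross-validation.
    model = LassoCV(cv=10, random_state=seed).fit(scaled, target)
    return np.flatnonzero(model.coef_)


# Synthetic stand-in data: 200 lesions, 30 features, only feature 0 is informative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 30))
y_demo = (X_demo[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
selected = lasso_select(X_demo, y_demo)
```

The indices in `selected` play the role of the radiomics labels that are passed on to model construction.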

The data were divided into a training set (70%) and a validation set (30%), with 251 and 108 lesions in the training and validation sets, respectively. Four support vector machine (SVM) models were created using the radiomics labels and the binary targets for ER, PR, HER2, and Ki-67. To optimize the performance of those models, the tree-structured Parzen Estimator (TPE), a hyperparameter optimization algorithm, was used.
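The 70/30 split and SVM fit can be sketched as follows. The feature matrix and labels are synthetic placeholders, the feature count of 30 is arbitrary, and the TPE hyperparameter search is omitted; fixed SVC settings are used instead, so this is not the authors' tuned model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_lesions, n_features = 359, 30                    # 359 lesions as in the study
X = rng.normal(size=(n_lesions, n_features))       # synthetic radiomics labels
y = (X[:, 0] + 0.5 * rng.normal(size=n_lesions) > 0).astype(int)  # synthetic target

# A 30% validation fraction of 359 lesions gives 251 training and 108 validation lesions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hyperparameters are fixed here; the study tuned them with TPE.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# The AUC is computed from the SVM decision scores on the validation set.
val_auc = roc_auc_score(y_val, model.decision_function(X_val))
print(len(X_train), len(X_val))  # prints: 251 108
```

One such model would be fitted per binary target (ER, PR, HER2, and Ki-67).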

Evaluation of the model

To evaluate the diagnostic performance of the model on the training and validation sets, the receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) was calculated. Additionally, a confusion matrix was created to calculate the sensitivity, specificity, accuracy, and F1 score of the model.
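The four summary metrics follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy and F1 score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these illustrative counts, accuracy is 0.85 and the F1 score is 80/95, about 0.842.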

Statistical analysis

Python (version 3.8.2) was used for statistical analysis. The normality and homogeneity of variance of the numeric data were assessed using the Kolmogorov-Smirnov test and F-test, respectively. The baseline characteristics for numeric variables were evaluated with the t-test, Fisher's exact test, and Mann-Whitney U test. The Chi-square test was applied for categorical variables. A two-sided p < 0.05 was considered a significant difference. The statistical analysis packages include Levene, test, StandardScaler, MinMaxScaler, VarianceThreshold, train_test_split, cross_validate, cross_val_score, RepeatedKFold, confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, roc_curve, LassoCV, SVC, and TPE. The Pearson's coefficient was calculated using Origin software.

Results

Clinicopathological characteristics

A total of 359 lesions were confirmed by pathology, with 326 cases (95.3%) having a single lesion, 15 cases (4.4%) two lesions, and 1 case (0.3%) three lesions. In terms of histologic type, the most common was invasive ductal carcinoma, accounting for approximately 70.5% (253 lesions), followed by carcinoma in situ, accounting for 14.2% (51 lesions), and special types of invasive carcinoma, accounting for 13.1% (47 lesions). The clinicopathological characteristics of the patients are presented in Table 1 and Supplementary Table 2. The distribution of molecular subtypes was as follows: 86 were luminal A (24.0%), 146 were luminal B (40.7%), 63 were HER2+ (17.5%) and 64 were TNBC (17.8%). The baseline characteristics and clinicopathological information of both the training set and test set are summarized in Table 2. There were no significant differences in tumor size, age, gender, menopausal status, clinical staging, tumor types, or molecular subtypes between the two groups. As demonstrated in Figure 3, the expression of ER, PR, HER2, and Ki-67 was as follows: 129 lesions were ER-negative and 230 were ER-positive. Similarly, 163 lesions were PR-negative while 196 were PR-positive. HER2 expression was negative in 265 lesions and positive in 94 lesions. Moreover, Ki-67 expression was negative in 114 lesions and positive in 245 lesions.
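The counts reported in this paragraph are mutually consistent, which a few lines of arithmetic can confirm. The sketch below only re-uses numbers stated in the text; the variable names are hypothetical.

```python
# Patients grouped by number of lesions, as reported in the text.
patients_by_lesion_count = {1: 326, 2: 15, 3: 1}
n_patients = sum(patients_by_lesion_count.values())
n_lesions = sum(k * v for k, v in patients_by_lesion_count.items())

# Molecular subtype counts and their share of all lesions, in percent.
subtypes = {"luminal A": 86, "luminal B": 146, "HER2+": 63, "TNBC": 64}
shares = {name: round(100 * count / n_lesions, 1) for name, count in subtypes.items()}

# (negative, positive) lesion counts for each biomarker.
biomarkers = {"ER": (129, 230), "PR": (163, 196), "HER2": (265, 94), "Ki-67": (114, 245)}
totals = {name: neg + pos for name, (neg, pos) in biomarkers.items()}

print(n_patients, n_lesions)  # prints: 342 359
```

Each biomarker's negative and positive counts sum to the same 359 lesions, and the subtype shares reproduce the percentages quoted above.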

Table 1 Characteristics of the molecular biomarkers of patients.

Table 2 Baseline characteristics comparison between the training set and test set.

Figure 3 Patients included in this study (*comparison of the number of lesions in the negative and positive groups). 129 lesions were ER-negative and 230 were ER-positive. Similarly, 163 lesions were PR-negative while 196 were PR-positive. HER2 expression was negative in 265 lesions, while 94 lesions showed HER2-positive expression.

Radiomics signature building

The study extracted 1314 features from each ultrasound image, and 1205 features were retained after processing and removal of irrelevant features. Supplementary Table 1 shows the number of retained features after each step of feature selection. To select the relevant features, a LASSO logistic regression model was employed in the primary cohort. After standardization, 39 and 20 signatures with non-zero coefficients were selected for the ER (Figures 4A, E) and PR (Figures 4B, F) targets, respectively. For the HER2 target, normalization was applied before LASSO, and 14 signatures were selected (Figures 4C, G). Interestingly, no high-performance features were selected when Ki-67 was treated as binary data with the 20% cutoff, regardless of whether standardization or normalization was used before LASSO. Therefore, standardization was implemented and LASSO was conducted on the Ki-67 target as a continuous variable, specifically the exact value of the proliferation index, and 16 signatures were chosen (Figures 4D, H). The selected signatures were saved as radiomics labels for subsequent modeling.

Figure 4 Radiomics feature selection using LASSO logistic regression in the primary cohort. Selection of the tuning parameter (λ) in the LASSO model of the ER (A), PR (B), HER2 (C), and Ki-67 (D) via 10-fold cross-validation based on the mean squared error (MSE) of the minimum criteria. The value of λ that gave the minimum average binomial deviance was used to select features. LASSO coefficient profiles of the selected radiomics features of the ER model (E), PR model (F), HER2 model (G), and Ki-67 model (H). Dotted vertical lines were drawn at the optimal values using the minimum criteria and the MSE criteria.

Correlation between the radiomics signature and molecular biomarkers

The radiomics heatmap shows a matrix of correlation coefficients among the features (Figure 5). The Pearson correlation coefficient was computed to evaluate the relationships among these features. The resulting heatmaps represent these associations, with red denoting positive correlations and blue indicating negative correlations.

Figure 5 Pearson correlation coefficient heatmaps of selected features on predicting molecular expression of ER (A), PR (B), HER2 (C), and Ki-67 (D). Red represents positive correlations and blue indicates negative correlations. ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; Ki-67, proliferation marker protein Ki-67.

To ensure the accuracy of the radiomics analysis, features with high correlation coefficients (r≥0.9) were removed from the initial pool of 1205 radiomics features. Only the features that exhibited a significant inter-group distribution difference were retained for further analysis. As a result, a total of 39 features were identified as essential for predicting ER expression, along with 20 features for PR, 14 features for HER2, and 16 features for Ki-67. Notably, significant correlations were observed between the four molecular biomarkers and various radiomics features, including morphological features, grayscale features, texture features, and laws features.
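Removing highly correlated features can be sketched with a greedy pass over the Pearson correlation matrix. The text does not describe the exact procedure, so this sketch assumes that the absolute correlation is compared against 0.9 and that the first feature of each correlated group is the one kept; the function name is hypothetical.

```python
import numpy as np


def drop_correlated(features: np.ndarray, threshold: float = 0.9) -> list:
    """Return indices of features kept after removing near-duplicates.

    features is a (n_lesions, n_features) array. A feature is dropped when its
    absolute Pearson correlation with an already kept feature reaches threshold.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return kept


# Synthetic example: column 1 is almost a rescaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
demo = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=100), b])
kept_columns = drop_correlated(demo)
```

In this synthetic example the near-duplicate column 1 is dropped, while columns 0 and 2 are kept.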

Radiomic features to predict molecular profiles

Table 3 summarizes the top five most significant features selected by the LASSO model, along with their corresponding t-values and P-values for the t-test. These values demonstrated a significant difference between the positive and negative groups (P<0.05). Compared with ER-negative cancers, ER-positive tumors had higher values of ShortRunEmphasis (SRE), Complexity, and ShortRunHighGrayLevelEmphasis (SRHGLE), and lower values of Imc1 and SizeZoneNonUniformityNormalized (SZNUN). PR-positive lesions showed higher values of SmallDependenceHighGrayLevelEmphasis (SDHGLE), and lower values of Maximum, SZNUN, BoundingBox5 and Imc1. HER2-positive cancers displayed significantly higher GrayLevelNonUniformityNormalized (GLNUN), SizeZoneNonUniformityNormalized (SZNUN), InverseVariance, ZonePercentage and Imc1 than HER2-negative cancers. Ki-67-positive lesions showed higher BoundingBox5 and SmallAreaEmphasis (SAE), and lower Coarseness and ShortRunLowGrayLevelEmphasis (SRLGLE) than Ki-67-negative cancers. Notably, SRE, Imc1, SZNUN, Complexity, Maximum, SDHGLE, BoundingBox5, GLNU, SRLGLE, and SAE were the most frequently selected signatures with significantly high weights (all p<0.005), indicating their importance in distinguishing between the positive and negative groups. They mainly belong to the GLCM, GLRLM, GLSZM, and NGTDM categories.

Table 3 The top five signatures selected by LASSO and their t-test values.

SVM model construction and validation of the model

Four models for predicting the molecular biomarkers ER, PR, HER2, and Ki-67 were created using the features selected by LASSO and the parameters optimized by TPE. Subsequently, four ROC curves were plotted to evaluate the diagnostic efficacy of the models. The AUCs for the training and validation cohorts are presented in Figure 6. In terms of diagnostic efficacy, the ER model was the most effective, followed by the PR model, the HER2 model, and lastly the Ki-67 model.

Figure 6 Comparison of the area under the ROC curves on the training set and validation cohort of ER model (A), PR model (B), HER2 model (C), and Ki-67 model (D).

The performance of the four models, namely the ER model, PR model, HER2 model, and Ki-67 model, was evaluated. The assessment parameters, including sensitivity, specificity, accuracy, and F1 score, are presented in Table 4. The ultrasound-based radiomics model displayed the highest discriminatory power for ER, achieving an AUC of 0.917 in the training set and 0.868 in the validation cohort (Figure 6A). For PR, the radiomics model achieved an AUC of 0.835 in the training set and 0.811 in the validation cohort (Figure 6B). The radiomics model generated AUCs of 0.722 (Figure 6C) and 0.706 (Figure 6D) for HER2 and Ki-67 in the validation cohort, respectively, which were slightly lower than those for ER and PR. These results suggest that all four models are effective in predicting the molecular expression of BC. Notably, the ER, PR, and HER2 models fitted well, with no significant signs of overfitting. Conversely, overfitting was evident for Ki-67.

Table 4 Diagnostic performance of the SVM models.

Discussion

Molecular subtyping plays a vital role in tailoring treatment approaches to individual patients. However, it requires biopsy or surgery, which is invasive, time-consuming, and sometimes prone to inaccuracy due to tumor heterogeneity. In recent studies, radiomics has shown good performance for predicting molecular subtypes of BC (14). In our study, we extracted ultrasound radiomics features to build prediction models for the expression of ER, PR, HER2, and Ki-67 in BC. Our results indicate that the ultrasound-based radiomics models show excellent performance in predicting molecular biomarkers in BC. Additionally, our research identified several critical radiomics features that play a substantial role in distinguishing between positive and negative expression of molecular biomarkers. These features, namely SRE, Imc1, SZNUN, Complexity, Maximum, SDHGLE, BoundingBox5, GLNU, SRLGLE, and SAE, are highly associated with the expression of ER, PR, HER2, and Ki-67. It is noteworthy that, to the best of our knowledge, our study is the first to establish a relationship between ultrasound-based radiomics features and molecular profiles. Our study offers a non-invasive, cost-effective, and time-efficient alternative for BC molecular classification, and the identification of these specific features provides valuable insights for further research and potential development of diagnostic tools.

It is well known that the aggressiveness of BC is closely related to its heterogeneity (15, 16), which is sometimes challenging to assess fully when using histopathological tissue samples obtained from needle biopsies (17, 18). The accuracy of molecular profiling can be affected by the size and number of samples obtained (19). Radiomics is a powerful tool that enables the non-invasive assessment of whole-tumor heterogeneity by extracting quantitative features based on texture, shape, and intensity (20). These features provide valuable insights into the underlying biological processes of the imaged tissue, including tumor heterogeneity and microenvironmental characteristics. A growing body of literature has reported the prediction of molecular profiling in BC, but mostly based on MRI and X-ray analysis (21, 22). Only a limited number of studies so far have used ultrasound imaging as the primary modality (23).

Radiomics features are quantitative descriptors that encompass various aspects of a medical image, including intensity, shape, volume, and texture. They are usually difficult to interpret and analyze intuitively. In our study, the 1314 extracted radiomics features fell into 7 categories. We developed four molecular prediction models based on ultrasound radiomics features. In the ER-positive model, higher values were observed for SRE, Complexity, and SRHGLE, while lower values were found for Imc1 and SZNUN. Similarly, in the PR-positive model, higher values were observed for SDHGLE, while lower values were found for Maximum, SZNUN, BoundingBox5, and Imc1. The HER2-positive model displayed significantly higher values for GLNUN, SZNUN, InverseVariance, ZonePercentage, and Imc1 compared to HER2-negative BC. In the Ki-67-positive model, higher values were observed for BoundingBox5 and SAE, while lower values were found for Coarseness and SRLGLE, compared to Ki-67-negative lesions. The features SRE, Imc1, SZNUN, Complexity, Maximum, SDHGLE, BoundingBox5, GLNU, SRLGLE, and SAE are heavily weighted (all P<0.005), indicating their pivotal role in discerning the negative or positive expression of ER, PR, HER2, and Ki-67. SRE assesses the distribution of short runs of similar intensity values within an image, which can characterize the texture of BC; higher values mean a greater proportion of short runs of similar intensity values in the image. Imc1 characterizes the similarity of gray-level intensity values between adjacent pixels, taking into account their relative positions. SZNUN measures the degree of heterogeneity in the sizes of homogeneous regions within an image; higher values indicate greater variability in the sizes of homogeneous regions across the image. Complexity characterizes the heterogeneity and irregularity in the image intensity values, with higher values indicating greater complexity and heterogeneity in the image. Maximum is the maximum intensity value in the interpolated image. SDHGLE measures the joint probability of occurrence of small dependence gray level values with high gray-level values, and can characterize the heterogeneity of a tumor. BoundingBox5 characterizes the compactness of the ROI in an image, with higher values indicating that the ROI is more compact. GLNU quantifies the degree of variation in gray-level intensity; a higher value of GLNU indicates that the intensity values within the ROI are more widely distributed, suggesting a higher degree of heterogeneity. SRLGLE quantifies the short runs of low gray-level values within an image. SAE measures the proportion of small homogeneous areas in the image, with a higher value indicating a greater proportion of small, homogeneous areas. To the best of our knowledge, this is the first study to investigate the correlation between the aforementioned radiomics features and molecular biomarkers. Their heavy weights emphasize their importance as crucial markers in the assessment of molecular expression.

These features, including GLCM, GLRLM, GLSZM, and NGTDM, mainly belong to the categories of second-order or higher-order statistics. They provide valuable insights into the irregular or heterogeneous texture of tumors that is not discernible to the naked eye. As far as we know, there are very few studies on the correlation between the aforementioned radiomics features and molecular biomarkers. A previous report indicated that higher Ki-67 expression was associated with posterior acoustic enhancement, and P53-positive cancer was associated with an absence of anecho halo, which differed from our findings (24). This inconsistency may be due to the different feature extraction methods. The presence of irregular or heterogeneous tumor textures, as indicated by these features, holds significant clinical implications. It suggests the presence of diverse tissue components within the tumor, potentially reflecting variations in cellularity, vascularity, and spatial organization.

The SVM models created based on the LASSO-selected features and TPE-optimized parameters can identify molecular indicators effectively. Our results indicate that US-based radiomics models perform well in predicting molecular profiling, with the best performance for ER, followed by PR. Both had an AUC greater than 0.80 in the validation cohort, whereas the models showed lower diagnostic efficacy for HER2 and Ki-67, with AUCs slightly higher than 0.70 in the validation cohort. The ER model performed well in the validation cohort, with a high specificity of 87.1% and an F1 score of 0.835. Before modeling, the choice of normalization and the setting of LASSO parameters are crucial, as both affect the quantity and quality of the features LASSO selects. Moreover, the effectiveness of the features greatly influences the model's validity. The AUCs we achieved for predicting molecular subtype are similar to the AUCs of 0.74–0.97 reported in other literature (25, 26).

In recent years, deep learning techniques have been widely employed to investigate the molecular expression of BC (27, 28). Deep learning models have demonstrated superior diagnostic performance compared to traditional machine learning models. However, deep learning models require a relatively larger sample size than traditional machine learning approaches. Additionally, the training process of deep learning models can be likened to a “black box,” making it challenging to discern which features are utilized in the modeling process and how they are interconnected. In contrast, traditional machine learning models offer interpretability by enabling the analysis of specific features and their corresponding weights throughout the modeling process (29, 30).

Our study has certain limitations that should be acknowledged. First, it is based on a retrospective, single-center design, and the sample size is relatively small. Therefore, caution should be exercised in generalizing the findings to larger populations. To validate and strengthen our results, further investigations using a larger, multi-center cohort are warranted. Second, only two-dimensional grayscale data were used. The inclusion of additional imaging modalities or three-dimensional data could provide a more comprehensive assessment of the molecular profiling in BC. Additionally, a series of further studies covering the prediction of molecular subtypes, clinical decision making, or therapy response based on radiomics would enhance the reliability and value of the radiomics analysis. Despite these limitations, our findings contribute to the understanding of the potential of ultrasound radiomics in assessing the molecular characteristics of BC.

Conclusions

Our study provides evidence that specific radiomics features extracted from ultrasound images can predict the molecular expression of ER, PR, HER2, and Ki-67 in BC. The radiomics models based on the selected features show good performance in non-invasively assessing the molecular subtypes. Our findings provide a promising method for assessing the molecular profile of breast cancer.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving human participants were reviewed and approved by the Ethics Committee of the Second Affiliated Hospital of Fujian University of Traditional Chinese Medicine (SPHFJP-T2022007-01). The ethics committee waived the requirement of written informed consent for participation. Ethical review and approval were not required for animal research because no animal experiments were performed in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author contributions

RX contributed to concept development, literature searching, and writing the original draft. TY contributed to software use and language editing. CL contributed to patient enrollment. QL and QG contributed to image processing and methodology. GZ contributed to the analysis of pathology. LL contributed to study management, statistical analysis, and methodology. QO contributed to funding acquisition and study management. All authors contributed to the article and approved the submitted version.

Funding

This work was funded by the National Natural Science Foundation of China (82174469, 81973916), Natural Science Foundation of Fujian Province (2023J01815) and the Fujian Science Association Science and Technology Innovation Think Tank Research Project (2022XKB037).

Acknowledgments

We thank Engineer Qiang Chen from Fujian Juvenile & Children’s Library for his professional programming expertise. We express our gratitude to our colleague, Na Yang, for providing valuable insights and expertise. We also extend special appreciation to Liling Wang for her insightful comments and invaluable suggestions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2023.1216446/full#supplementary-material

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin (2021) 71(3):209–49. doi: 10.3322/caac.21660

2. Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature (2000) 406(6797):747–52. doi: 10.1038/35021093

3. Curigliano G, Burstein HJ, Winer EP, Gnant M, Dubsky P, Loibl S, et al. De-escalating and escalating treatments for early-stage BC: the St. Gallen International Expert Consensus Conference on the Primary Therapy of Early BC 2017. Ann Oncol (2017) 28(8):1700–12. doi: 10.1093/annonc/mdx308

4. Wu J, Ge L, Jin Y, Wang Y, Hu L, Xu D, et al. Development and validation of an ultrasound-based radiomics nomogram for predicting the luminal from non-luminal type in patients with breast carcinoma. Front Oncol (2022) 12:993466. doi: 10.3389/fonc.2022.993466

5. Wu L, Zhao Y, Lin P, Qin H, Liu Y, Wan D, et al. Preoperative ultrasound radiomics analysis for expression of multiple molecular biomarkers in mass type of breast ductal carcinoma in situ. BMC Med Imaging (2021) 21(1):84. doi: 10.1186/s12880-021-00610-7

6. Roulot A, Héquet D, Guinebretière JM, Vincent-Salomon A, Lerebours F, Dubot C, et al. Tumoral heterogeneity of BC. Hétérogénéité tumorale des cancers du sein. Ann Biol Clin (Paris) (2016) 74(6):653–60. doi: 10.1684/abc.2016.1192

7. Quan MY, Huang YX, Wang CY, Zhang Q, Chang C, Zhou SC. Deep learning radiomics model based on breast ultrasound video to predict HER2 expression status. Front Endocrinol (Lausanne) (2023) 14:1144812. doi: 10.3389/fendo.2023.1144812

8. Shi W, Chen Z, Liu H, Miao C, Feng R, Wang G, et al. COL11A1 as an novel biomarker for BC with machine learning and immunohistochemistry validation. Front Immunol (2022) 13:937125. doi: 10.3389/fimmu.2022.937125

9. Choudhery S, Gomez-Cardona D, Favazza CP, Hoskin TL, Haddad TC, Goetz MP, et al. MRI radiomics for assessment of molecular subtype, pathological complete response, and residual cancer burden in BC patients treated with neoadjuvant chemotherapy. Acad Radiol (2022) 29(Suppl 1):S145–54. doi: 10.1016/j.acra.2020.10.020

10. Lin F, Wang Z, Zhang K, Yang P, Ma H, Shi Y, et al. Contrast-enhanced spectral mammography-based radiomics nomogram for identifying benign and malignant breast lesions of sub-1 cm. Front Oncol (2020) 10:573630. doi: 10.3389/fonc.2020.573630

11. Conti A, Duggento A, Indovina I, Guerrisi M, Toschi N. Radiomics in BC classification and prediction. Semin Cancer Biol (2021) 72:238–50. doi: 10.1016/j.semcancer.2020.04.002

12. Gradishar WJ, Anderson BO, Abraham J, Aft R, Agnese D, Allison KH, et al. BC, version 3.2020, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw (2020) 18(4):452–78. doi: 10.6004/jnccn.2020.0016

13. Jiang M, Zhang D, Tang SC, Luo XM, Chuan ZR, Lv WZ, et al. Deep learning with convolutional neural network in the assessment of BC molecular subtypes based on US images: a multicenter retrospective study. Eur Radiol (2021) 31(6):3673–82. doi: 10.1007/s00330-020-07544-8

14. Moyya PD, Asaithambi M. Radiomics - quantitative biomarker analysis for BC diagnosis and prediction: a review. Curr Med Imaging (2022) 18(1):3–17. doi: 10.2174/1573405617666210303102526

15. Pertschuk LP, Axiotis CA, Feldman JG, Kim YD, Karavattayhayyil SJ, Braithwaite L, et al. Marked intratumoral heterogeneity of the proto-oncogene her-2/neu determined by three different detection systems. Breast J (1999) 5(6):369–74. doi: 10.1046/j.1524-4741.1999.97088.x

16. Allison KH, Dintzis SM, Schmidt RA. Frequency of HER2 heterogeneity by fluorescence in situ hybridization according to CAP expert panel recommendations: time for a new look at how to report heterogeneity. Am J Clin Pathol (2011) 136(6):864–71. doi: 10.1309/AJCPXTZSKBRIP07W

17. Davis BW, Zava DT, Locher GW, Goldhirsch A, Hartmann WH. Receptor heterogeneity of human BC as measured by multiple intratumoral assays of estrogen and progesterone receptor. Eur J Cancer Clin Oncol (1984) 20(3):375–82. doi: 10.1016/0277-5379(84)90084-1

18. Nassar A, Radhakrishnan A, Cabrero IA, Cotsonis GA, Cohen C. Intratumoral heterogeneity of immunohistochemical marker expression in breast carcinoma: a tissue microarray-based study. Appl Immunohistochem Mol Morphol (2010) 18(5):433–41. doi: 10.1097/PAI.0b013e3181dddb20

19. Seol H, Lee HJ, Choi Y, Lee HE, Kim YJ, Kim JH, et al. Intratumoral heterogeneity of HER2 gene amplification in BC: its clinicopathological significance. Mod Pathol (2012) 25(7):938–48. doi: 10.1038/modpathol.2012.36

20. Wang X, Xie T, Luo J, Zhou Z, Yu X, Guo X. Radiomics predicts the prognosis of patients with locally advanced BC by reflecting the heterogeneity of tumor cells and the tumor microenvironment. Breast Cancer Res (2022) 24(1):20. doi: 10.1186/s13058-022-01516-0

21. Niu S, Jiang W, Zhao N, Jiang T, Dong Y, Luo Y, et al. Intra- and peritumoral radiomics on assessment of BC molecular subtypes based on mammography and MRI. J Cancer Res Clin Oncol (2022) 148(1):97–106. doi: 10.1007/s00432-021-03822-0

22. Huang Y, Wei L, Hu Y, Shao N, Lin Y, He S, et al. Multi-parametric MRI-based radiomics models for predicting molecular subtype and androgen receptor expression in BC. Front Oncol (2021) 11:706733. doi: 10.3389/fonc.2021.706733

23. Gu J, Jiang T. Ultrasound radiomics in personalized breast management: Current status and future prospects. Front Oncol (2022) 12:963612. doi: 10.3389/fonc.2022.963612

24. Cui H, Zhang D, Peng F, Kong H, Guo Q, Wu T, et al. Identifying ultrasound features of positive expression of Ki67 and P53 in BC using radiomics. Asia Pac J Clin Oncol (2021) 17(5):e176–84. doi: 10.1111/ajco.13397

25. Li JW, Cao YC, Zhao ZJ, Shi ZT, Duan XQ, Chang C, et al. Prediction for pathological and immunohistochemical characteristics of triple-negative invasive breast carcinomas: the performance comparison between quantitative and qualitative sonographic feature analysis. Eur Radiol (2022) 32(3):1590–600. doi: 10.1007/s00330-021-08224-x

26. Ferre R, Elst J, Senthilnathan S, Lagree A, Tabbarah S, Lu FI, et al. Machine learning analysis of breast ultrasound to classify triple negative and HER2+ BC subtypes. Breast Dis (2023) 42(1):59–66. doi: 10.3233/BD-220018

27. Boulenger A, Luo Y, Zhang C, Zhao C, Gao Y, Xiao M, et al. Deep learning-based system for automatic prediction of triple-negative BC from ultrasound images. Med Biol Eng Comput (2023) 61(2):567–78. doi: 10.1007/s11517-022-02728-4

28. Zhang T, Tan T, Han L, Appelman L, Veltman J, Wessels R, et al. Predicting BC types on and beyond molecular level in a multi-modal fashion. NPJ Breast Cancer (2023) 9(1):16. doi: 10.1038/s41523-023-00517-2

29. Zhang X, Li H, Wang C, Cheng W, Zhu Y, Li D, et al. Evaluating the accuracy of BC and molecular subtype diagnosis by ultrasound image deep learning model. Front Oncol (2021) 11:623506. doi: 10.3389/fonc.2021.623506

30. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol (2020) 9(2):14. doi: 10.1167/tvst.9.2.14

Keywords: radiomics, biomarker, breast cancer, ultrasonography, support vector machine

Citation: Xu R, You T, Liu C, Lin Q, Guo Q, Zhong G, Liu L and Ouyang Q (2023) Ultrasound-based radiomics model for predicting molecular biomarkers in breast cancer. Front. Oncol. 13:1216446. doi: 10.3389/fonc.2023.1216446

Received: 03 May 2023; Accepted: 11 July 2023;
Published: 31 July 2023.

Edited by:

Francesca Bianchi, University of Milan, Italy

Reviewed by:

Chen Li, Free University of Berlin, Germany
Bilgin Kadri Aribas, Bülent Ecevit University, Türkiye

Copyright © 2023 Xu, You, Liu, Lin, Guo, Zhong, Liu and Ouyang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Leilei Liu, bill_boss@sina.com; Qiufang Ouyang, torrent_100@163.com
