Development of an interpretable machine learning model for Ki-67 prediction in breast cancer using intratumoral and peritumoral ultrasound radiomics features

Wang, Jing; Gao, Weiwei; Lu, Min; Yao, Xiaohua; Yang, Debin

doi:10.3389/fonc.2023.1290313

ORIGINAL RESEARCH article

Front. Oncol., 17 November 2023

Sec. Breast Cancer

Volume 13 - 2023 | https://doi.org/10.3389/fonc.2023.1290313

Departments of Ultrasound, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences, Shanghai, China

Background: Traditional immunohistochemistry assessment of Ki-67 in breast cancer (BC) via core needle biopsy is invasive, inaccurate, and nonrepeatable. While machine learning (ML) provides a promising alternative, its effectiveness depends on extensive data. Although the current mainstream MRI-centered radiomics offers sufficient data, its unsuitability for repeated examinations, limited accessibility, and intratumoral focus constrain the application of predictive models in evaluating Ki-67 levels.

Objective: This study aims to explore ultrasound (US) image-based radiomics, incorporating both intra- and peritumoral features, to develop an interpretable ML model for predicting Ki-67 expression in BC patients.

Methods: A retrospective analysis was conducted on 263 BC patients, divided into training and external validation cohorts. From intratumoral and peritumoral regions of interest (ROIs) in US images, 849 radiomics features per ROI were derived. These features underwent systematic selection to identify those related to Ki-67 expression. Four ML models (logistic regression, random forests, support vector machine (SVM), and extreme gradient boosting) were built and internally validated to identify the optimal predictive model. External validation was then performed to assess the robustness of the optimal model, and Shapley Additive Explanations (SHAP) were used to reveal the model’s most significant features.

Results: Among 231 selected BC patients, 67.5% exhibited high Ki-67 expression, a proportion that was consistent between the training and validation cohorts, as were the other clinical characteristics. Of the 1698 radiomics features identified, 15 were significantly correlated with Ki-67 expression. The SVM model using the combined ROI demonstrated the highest discrimination [area under the receiver operating characteristic curve (AUROC): 0.88], making it the most suitable for predicting Ki-67 expression. External validation yielded an AUROC of 0.82, with good calibration and net benefit at probabilities above 40%. SHAP analysis identified five influential features from intra- and peritumoral ROIs, offering insight into individual predictions.

Conclusion: This study highlights the potential of an SVM model using radiomics features from both intra- and peritumoral US images for predicting elevated Ki-67 levels in BC patients. The model performed well in both internal and external validation, indicating its promise as a noninvasive tool to enable personalized decision-making in BC care.

Introduction

The Ki-67 antigen is a well-established marker of cell proliferation, essential in categorizing luminal subtypes of tumors and predicting therapeutic outcomes in breast cancer (BC) (1, 2). Higher expression levels signify increased aggressiveness, risk of recurrence, and poor prognosis (3). The traditional approach to preoperative assessment of Ki-67 involves immunohistochemistry, requiring tissue samples usually extracted by core needle biopsy (CNB), and subsequent visual analysis by a pathologist (4). However, this primary method is invasive, time-consuming, and nonrepeatable. Because of the inherent heterogeneity of BC, the concordance rate between CNB and excision specimens varies substantially, ranging from 59% to 88% (5, 6). Additionally, the inability of the traditional approach to dynamically evaluate Ki-67 changes during neoadjuvant therapy highlights its limitations (7). Therefore, a method that is noninvasive and capable of continuous monitoring is urgently needed for the clinical evaluation of Ki-67 status.

With the accelerated advancement of artificial intelligence (AI) techniques, machine learning (ML) has made remarkable progress in image processing and feature mining, particularly in classification tasks for benchmark images (8). This technology has been transformative; however, its efficacy relies heavily on large sample sizes, making it challenging in medical applications where extensive data extraction from individual patient images is needed (9). Addressing this challenge, radiomics emerged as a solution. This field involves the high-throughput extraction and analysis of vast quantities of quantitative features from digital images, transcending the limitations of human visual perception (10, 11). By identifying correlations between these imaging features and underlying tissue information, radiomics can enhance performance in evaluating the biological characteristics and prognosis of tumors, thus contributing to the optimization of complex clinical decision-making processes (12).

Building upon the significant advancements in the field of radiomics, research has predominantly centered on utilizing magnetic resonance imaging (MRI) to predict Ki-67 levels within BC tissues (13–15). In contrast, ultrasound (US)-based radiomics for the prediction of Ki-67 remains relatively unexplored. US imaging offers notable advantages over MRI, including wider accessibility, cost-effectiveness, suitability for repeated examinations, superior spatial resolution, real-time availability, and the absence of contraindications for specific patient conditions. In the last few years, a limited number of studies have explored the application of US radiomics in predicting Ki-67 levels. While these investigations represent an encouraging development, they have typically demonstrated a modest predictive efficacy, with the area under the receiver operating characteristic curve (AUROC) usually ranging from 0.7 to 0.8 (16–18). The underlying reason may be the prevalent focus on intratumoral features, neglecting critical biological insights available in the peritumoral area and thus potentially constraining the predictive accuracy of radiomics models. Interactions within the peritumoral area can influence tumor evolution and progression (19), such as inducing cytokine release that fosters an immunosuppressive microenvironment (20). Additionally, peritumoral factors like edema and angiogenesis have been correlated with tumor malignancy (21, 22), indicating that integrating intra- and peritumoral regions in radiomics analysis may enhance predictive capabilities.

In light of these considerations, the present study aims to investigate the potential of US-based radiomics, utilizing both intratumoral and peritumoral regions, in establishing an interpretable ML model for predicting Ki-67 expression in BC patients, thereby contributing to individualized treatment strategies and prognosis assessments.

Materials and methods

This study was conducted in accordance with the ethical guidelines of the Declaration of Helsinki and received approval from the Institutional Review Board of Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences (2023K29). Due to the retrospective nature, the requirement for informed consent was waived, and all patient data were carefully anonymized.

Patient selection

The study incorporated a comprehensive review of medical records from January 2018 to July 2023, resulting in the identification of 263 female BC patients who met specific criteria. Inclusion in the study required candidates to satisfy the following: 1) a surgical resection-confirmed diagnosis of invasive ductal carcinoma; 2) the presence of a singular and mass-formed breast tumor; 3) Ki-67 status verification through both CNB and excision specimen; 4) US evaluations performed within two weeks before surgery. Additionally, exclusion criteria encompassed: 1) inadequate US imagery or incomplete lesion display; 2) a history of preoperative treatments such as radiotherapy, chemotherapy, or neoadjuvant therapy; 3) an absence of comprehensive clinical details. After rigorous screening, the selected BC patients were divided into training and external validation cohorts in a 7:3 ratio, ensuring the credibility of the predictive model. Clinical and histopathological data, including key aspects such as Ki-67 expression, tumor diameter, and breast imaging-reporting and data system (BI-RADS), were retrieved from medical records. The Ki-67 expression was quantified using the St. Gallen International Expert Consensus guidelines (23). A threshold was set, classifying samples with Ki-67 values of ≥14% as high expression level and those below this value as low expression level.

Image acquisition and segmentation

Bilateral breast US examinations were conducted following the standard scanning protocol with a Samsung RS80A ultrasound system (Samsung Medison, Co. Ltd., South Korea), employing an L3-12A linear array probe. Both longitudinal and transverse sections were captured and saved in the Digital Imaging and Communications in Medicine (DICOM) format for subsequent evaluation. For the region of interest (ROI) segmentation in the radiomics analysis, two senior sonographers, each with over 15 years of expertise in BC ultrasonography, were assigned to use the system’s built-in S-Detect mammary gland mode to automatically recognize the tumor boundaries, ensuring segmentation reliability. The image displaying the lesion at its maximum diameter was selected as the input. Upon centering the lesion, the system autonomously delineated the lesion’s boundary, defining it as the ROI. If the automatically drawn border failed to align with the solid edge of the mass, the operator made manual adjustments to achieve the correct contour. After finalizing the most accurate boundary, the radiomics intratumoral ROI was segmented through the use of the open-source imaging platform 3D Slicer software (v5.0.2). In assessing the peritumoral areas, the intratumoral ROI was expanded radially by 3 mm from the tumor boundary to form a dilated ROI, with segments extending beyond the skin excised. This methodology was adopted in accordance with the insights from Ding et al. (24) regarding the optimal sizing of the peritumoral regions in radiomics analysis. The intratumoral ROI was then subtracted from the dilated ROI to derive the peritumoral ROI. Consequently, three distinct ROIs (intratumoral, peritumoral, and combined ROI) were identified for each patient, as illustrated in Figure 1.
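
The ROI construction described above (dilate the intratumoral mask by 3 mm, then subtract the original mask) can be sketched in a few lines. This is only an illustration under stated assumptions, not the 3D Slicer workflow used in the study: the function names, the toy 9x9 mask, and the 1 mm isotropic pixel spacing are all hypothetical, and skin clipping is approximated by simply discarding pixels that fall outside the image.

```python
from math import ceil, hypot

def dilate_mask(mask, radius_mm, spacing_mm):
    """Binary dilation of a 2D mask with a disk of radius_mm (isotropic spacing assumed)."""
    rows, cols = len(mask), len(mask[0])
    r_px = radius_mm / spacing_mm
    reach = ceil(r_px)
    # Pixel offsets that lie inside the disk-shaped structuring element.
    offsets = [(dr, dc)
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)
               if hypot(dr, dc) <= r_px]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    # Pixels outside the image are dropped (stand-in for skin clipping).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out

def peritumoral_ring(intra, radius_mm=3.0, spacing_mm=1.0):
    """Return (combined ROI, peritumoral ROI) derived from an intratumoral mask."""
    dilated = dilate_mask(intra, radius_mm, spacing_mm)
    ring = [[int(d and not i) for d, i in zip(drow, irow)]
            for drow, irow in zip(dilated, intra)]
    return dilated, ring

# Hypothetical toy image: a single "tumor" pixel at the center of a 9x9 grid.
intra = [[0] * 9 for _ in range(9)]
intra[4][4] = 1
combined, peri = peritumoral_ring(intra, radius_mm=3.0, spacing_mm=1.0)
```

With real images, the pixel spacing would come from the DICOM header, so the 3 mm margin would translate into a different number of pixels for each acquisition.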

Figure 1 Illustration of three ROIs in BC ultrasound imaging: intratumoral, defined via S-Detect mode; peritumoral, derived from the intratumoral ROI by 3 mm radial expansion and subtraction; and combined, an integration of both intratumoral and peritumoral ROIs.

Radiomics feature extraction and selection

Subsequent to the precise segmentation of intra- and peritumoral ROIs, the radiomics features were systematically extracted utilizing the 3D Slicer radiomics extension. For each of the intra- and peritumoral ROIs, a total of 849 features were extracted. The original features were categorized into three main groups: 12 shape-based attributes, 18 first-order statistics, and 75 texture features. The texture features were further divided among five matrices, comprising 24 gray-level co-occurrence matrix (GLCM) features, 14 gray-level dependence matrix (GLDM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, and 5 neighbouring grey tone difference matrix (NGTDM) features. Additionally, 744 filtered features were derived using wavelet transformations applied to the original first-order and texture attributes, enhancing the depth and complexity of the feature set.
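
The feature counts above can be checked with simple arithmetic. In the sketch below, the class counts are taken from the text; the number of wavelet sub-bands is not stated in the article and is inferred here as eight, because 744 = 8 × (18 + 75). Variable names are illustrative only.

```python
# Feature counts as reported in the text.
shape = 12
first_order = 18
texture = {"GLCM": 24, "GLDM": 14, "GLRLM": 16, "GLSZM": 16, "NGTDM": 5}

n_texture = sum(texture.values())             # texture features
n_original = shape + first_order + n_texture  # unfiltered features

# Assumption: eight wavelet sub-bands, each applied to the first-order
# and texture features (shape features are not filtered).
n_subbands = 8
n_wavelet = n_subbands * (first_order + n_texture)

per_roi = n_original + n_wavelet  # features per ROI
both_rois = 2 * per_roi           # intratumoral + peritumoral
```

Under this assumption the totals reproduce the 849 features per ROI and the 1698 features across both ROIs reported in the article.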

Prior to feature selection, a crucial data preprocessing step was conducted to standardize features using Z-score normalization, aligning them to a mean of zero and a standard deviation of one. The predictive feature selection was carried out through a systematic three-step approach. Initially, interobserver reproducibility for each feature was evaluated by employing intraclass correlation coefficient (ICC) analysis, and a threshold of 0.85 was established for acceptable agreement, minimizing delineation discrepancies between the sonographers. Subsequently, the Student’s t-test was utilized to retain features manifesting false discovery rate-corrected P values below 0.05, identifying them as potential predictors. Lastly, the feature selection was further refined through the application of the least absolute shrinkage and selection operator (LASSO) logistic regression, focusing on the variables that were most representative of Ki-67 expression relationships.
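
Two of the steps above, Z-score normalization and false discovery rate correction, can be illustrated as follows. This is a minimal sketch, not the study's code: the article does not say which FDR procedure was used, so the Benjamini-Hochberg procedure is assumed, the population standard deviation is assumed for the Z-score, and the feature values and raw P values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical values of one radiomics feature across eight patients.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = zscore(feature)

# Hypothetical raw t-test P values for five features.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
kept = [i for i, p in enumerate(adj_p) if p < 0.05]
```

In this toy case, two features with raw P values just below 0.05 no longer pass after correction, which is the behavior the correction is meant to provide when many features are tested at once.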

Development and internal verification of ML models

To predict high Ki-67 expression, separate ML models were built on the intratumoral ROI, the peritumoral ROI, and the combination of both. Four classifiers were used: logistic regression (Logit), random forests (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). During model training, five-fold cross-validation repeated three times was applied to each data subset. This method ensured a rigorous allocation of the data into designated training and testing segments, facilitating optimal model construction. Following model construction, an internal verification process was conducted to assess the models’ discrimination, calibration, and clinical applicability. The optimal predictive model was selected for its superior discriminative performance, robust calibration, and clinical utility.
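
The resampling scheme can be sketched as a generator of train/test index pairs. Several points here are assumptions rather than details from the article: stratification by Ki-67 class is assumed (the text does not say whether folds were stratified), the function name and seed are hypothetical, and the label counts are inferred from the reported 67.9% high expression among 162 training patients.

```python
import random
from collections import defaultdict

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs, n_splits folds per repeat."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so each fold keeps the class ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train, test

# Assumed labels: 110 high (1) and 52 low (0) Ki-67 cases in the training cohort.
labels = [1] * 110 + [0] * 52
splits = list(repeated_stratified_kfold(labels))
```

Each of the 15 resulting splits would be used to fit a classifier on the training indices and score it on the held-out fold, with performance averaged over all splits.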

External verification and interpretability of the optimal model

The complete evaluation of the selected model commenced with external verification, focusing on the discriminative function, calibration, and applicability in an independent sample. This was followed by an interpretative analysis using the Shapley Additive Explanations (SHAP) methodology to dissect the contributions of individual variables to the prediction (25). Stemming from cooperative game theory, SHAP quantifies each feature’s individual impact on the model’s prediction by computing its average marginal contribution, thereby addressing the inherent ‘black box’ nature of ML models (26). By ranking the features in descending order of their SHAP values, the study identified key predictors, thereby enhancing the comprehension of the intricate relationships that influence Ki-67 expression within the examined patient cohort.
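
The "average marginal contribution" behind SHAP can be made concrete with an exact Shapley computation on a tiny example. The sketch below is illustrative only: the scoring function is a hypothetical stand-in rather than the study's SVM, absent features are replaced by a fixed baseline (one common simplification), and practical SHAP implementations approximate this sum because it grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's values, the rest the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical scoring function with an interaction between features 0 and 2.
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [1.0, 2.0, 4.0]         # hypothetical feature values for one patient
baseline = [0.0, 0.0, 0.0]  # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
```

The contributions add up to the difference between the prediction for this patient and the baseline prediction, and the interaction term is split equally between the two features involved. Averaging the absolute values of such contributions over patients gives the kind of feature ranking described above.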

Statistical analysis

A comprehensive statistical approach was adopted in line with the data characteristics. Comparisons between the training and external validation cohorts were made using chi-square tests, Mann-Whitney U tests, and independent-sample t-tests. Univariate and multivariate Logit analyses were used to identify clinical predictors associated with increased Ki-67 expression, and their joint predictive accuracy was assessed using the AUROC. In evaluating the ML model, the AUROC was used for discrimination, calibration curve analysis for model fit, and decision curve analysis (DCA) for net benefits. All statistical analyses were performed using IBM SPSS Statistics (v 22.0, SPSS Inc.) and Python (v 3.7.1).
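
For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is conventionally computed as TP/n − (FP/n) × p_t/(1 − p_t), and compared with a "treat all" strategy. The sketch below applies this standard definition to hypothetical outcomes and predicted probabilities; it is not the study's analysis code, and the function names are illustrative.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = high Ki-67) and predicted probabilities for ten patients.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.65, 0.45, 0.2, 0.1]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
```

A decision curve plots these quantities over a range of thresholds; the model is considered useful at thresholds where its net benefit exceeds both the "treat all" and the "treat none" (net benefit of zero) strategies.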

Results

Patient information

The selection process yielded 231 BC patients meeting the inclusion and exclusion criteria, with high Ki-67 expression levels identified in 67.5% of the cases. These patients were divided into a training cohort (n=162) and an external validation cohort (n=69). High Ki-67 expression was identified in 67.9% and 66.7% of the patients, respectively, with no statistically significant difference (χ² = 0.034, P = 0.854). Figure 2 illustrates the selection process, model establishment, and model evaluation, while Table 1 confirms an even distribution between cohorts without significant disparities in clinical characteristics (all P-values > 0.05).
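
The cohort comparison can be reproduced as a worked example. The counts below are not given in the article; they are inferred from the reported cohort sizes and percentages (110 of 162 and 46 of 69 patients with high Ki-67 expression), and an uncorrected Pearson chi-square test is assumed.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Inferred counts (assumption): high/low Ki-67 in training, then in validation.
chi2, p = chi_square_2x2(110, 52, 46, 23)
```

Rounded to three decimals, these inferred counts give the same χ² and P values as reported above.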

Figure 2 Flow diagram of patient selection process and model establishment & evaluation in BC patients.

Table 1 Clinical characteristics comparison between training and validation cohorts.

Identification of independent predictors for high Ki-67 expression

Table 2 describes the association between clinical characteristics and high Ki-67 expression through both univariate and multivariate Logit analyses. Age, tumor size, and US-reported positive lymph node (US-reported positive LN) were identified as the independent predictors (all P-values < 0.05). By applying these predictors in a Logit model, the ability to predict Ki-67 expression was found to be moderate, evidenced by an AUROC of 0.709 (Figure 3).

Table 2 Univariate and multivariate Logit analysis of clinical characteristics associated with high Ki-67 expression.

Figure 3 ROC analysis for the prediction of high Ki-67 expression based on age, tumor diameter, and US-reported positive LN, demonstrating moderate prediction accuracy with an AUC of 0.709.

Radiomics feature analysis

Through the process of segmenting intra- and peritumoral ROI in grayscale US from each patient in the training cohort, 1698 radiomics features were identified. Following normalization, 1158 features (68.2%) with interobserver ICC of 0.85 or higher were retained for stability in subsequent analysis. The application of Student’s t-test identified 107 features potentially correlated with elevated Ki-67 levels. Final selection, utilizing LASSO regression, isolated 15 significant features associated with Ki-67 expression: 7 intratumoral and 8 peritumoral. These feature distributions are delineated in Figure 4.

Figure 4 Distribution of selected features associated with Ki-67 expression: (A) Intratumoral segmentation and (B) Peritumoral segmentation using LASSO regression.

ML model establishment and selection

To identify the optimal predictive model for elevated Ki-67 expression in BC patients, four ML classifiers (SVM, Logit, RF, and XGBoost) were examined. These were systematically applied to intratumoral ROI, peritumoral ROI, and their combination. The respective ROC, calibration, and DCA curves are delineated in Figure 5. The results indicated that classifiers employing peritumoral ROI demonstrated superior discrimination ability in contrast to those utilizing intratumoral ROI (AUC: 0.76-0.82 vs. 0.61-0.75, DeLong test P < 0.05). Moreover, ML models utilizing combined ROI exhibited the highest discrimination (AUC: 0.75-0.88), with Logit and SVM achieving AUCs of 0.83 and 0.88, respectively. However, the SVM model exhibited better calibration, whereas the Logit model tended to overestimate probabilities in proximity to the 50% threshold. With comparable performance on DCA curves, the SVM classifier was thus recommended as the most efficacious model for predicting the likelihood of high Ki-67 expression.

Figure 5 Evaluation of ML classifiers using various ROIs. (A–C) depict the performance characteristics (ROC, calibration, and DCA curves) of four ML classifiers (SVM, Logit, RF, and XGBoost) in the context of intratumoral ROI, achieving respective AUCs of 0.75, 0.63, 0.62, and 0.61. (D–F) present the same classifiers’ efficacy as applied to peritumoral ROI, with corresponding AUCs of 0.82, 0.80, 0.76, and 0.77. (G–I) illustrate the performance when utilizing combined ROI, where AUCs ascend to 0.88, 0.83, 0.75, and 0.81. Among the classifiers, SVM is highlighted for its superior discrimination, calibration, and comparable DCA curve, endorsing it as the predominant model.

External verification

In evaluating the predictive capability of the SVM model, the external validation cohort was utilized. By integrating the selected intra- and peritumoral radiomics features, the model enabled the automatic calculation of high Ki-67 likelihood for individual patients. Subsequent analysis of these computed probabilities against the actual Ki-67 expression status was performed using ROC, calibration, and DCA curves, as shown in Figure 6. Although exhibiting a slight reduction in performance compared to the training cohort, the SVM model continued to demonstrate significant discriminative abilities, attaining an AUC of 0.82 (Figure 6A). The calibration curve indicated alignment between predicted and actual occurrences when the probability was above 40% (Figure 6B). The DCA further confirmed the model’s robustness, exhibiting significant net benefits when the threshold probability was greater than 40% (Figure 6C). These findings supported the potential of the SVM model for high Ki-67 expression prediction.
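
As a reminder of what the AUC summarizes, it equals the probability that a randomly chosen patient with high Ki-67 expression receives a higher predicted probability than a randomly chosen patient with low expression. The sketch below computes it by direct pairwise comparison on hypothetical data; the outcomes and probabilities are invented for illustration and are not from the validation cohort.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = high Ki-67) and predicted probabilities for ten patients.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.65, 0.45, 0.2, 0.1]
auc = auroc(y_true, y_prob)
```

In this toy example, 20 of the 24 positive-negative pairs are ordered correctly. The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of this size.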

Figure 6 Performance analysis of the optimal ML model using external validation cohort. (A) displays the ROC curve, achieving an AUC of 0.82, which marks the substantial discriminatory power of the model. (B) highlights the calibration curve, revealing alignment between predicted likelihoods and observed events for predictions over 40%. (C) outlines the DCA, highlighting the clinical advantage when the threshold probability is above 40%.

Model interpretation

The interpretation of the SVM model was conducted using SHAP analysis, quantifying the individual contributions of features within the model. The calculation of absolute mean SHAP values led to the ranking of features, highlighting four radiomics features from peritumoral ROI and one from the intratumoral ROI as the five most influential determinants. A summary plot, integrating these SHAP values, was devised for visual representation (Figure 7), thereby providing a comprehensive insight into the role each feature assumed in predicting patient outcomes. Furthermore, to elucidate the implications of each feature, detailed descriptions of each feature in SHAP analysis are provided in Supplementary Table S1. Concurrently, the collinearity among these influential radiomics features, and their associations with selected independent clinical predictors, was analyzed and illustrated in a heatmap, as presented in Figure 8. This analysis revealed minimal mutual correlation, with the highest correlation coefficient not exceeding 0.4, indicating a low degree of collinearity among the selected radiomics features and clinical predictors.

Figure 7 Illustration of the radiomics features associated with Ki-67 expression in the SVM model using SHAP analysis. This summary plot combines SHAP values to show how individual features affect the model’s predictions. Each dot represents a patient, with color indicating the feature value: red for higher and blue for lower values. The horizontal position of each dot gives the SHAP value, that is, the impact of the respective feature on that patient’s prediction: a positive value suggests a higher chance of increased Ki-67 expression, while a negative value suggests the opposite. The plot thereby highlights the correlation between higher values of the top five key features and a greater likelihood of elevated Ki-67 expression.

Figure 8 Heatmap illustrating the collinearity among the influential radiomics features in SVM model, and their associations with selected independent clinical predictors of Ki-67 expression. It reveals minimal mutual correlation, with the highest correlation coefficient not exceeding 0.4.

Discussion

The precise and dynamic determination of Ki-67 status is crucial for optimizing treatment strategies in BC patients. Elevated Ki-67 expression often correlates with adverse prognosis but may enhance responsiveness to chemotherapy (27, 28). While many studies have used radiomics to predict Ki-67 levels, they have largely focused on the tumor extent, overlooking essential information in the immediate peritumoral environment (29–31). Moreover, existing research, primarily focused on MRI of intra- and peritumoral regions, has failed to incorporate US-based radiomics, thereby limiting the wider applicability and repeatability of these studies (32–34). The current study addressed this gap by developing an ML prediction model using US radiomics features. An SVM model utilizing combined intra- and peritumoral radiomics features finally proved superior for predicting Ki-67 levels. This methodology may pave the way for a widely applicable, repeatable, and non-invasive evaluation process in personalized BC diagnosis and treatment.

To the best of our knowledge, this study is the first to employ ML models to predict Ki-67 levels using US-based intra- and peritumoral radiomics features. The incorporation of the S-Detect auxiliary diagnostic system, based on a deep convolutional neural network, supported accurate delineation of tumor boundaries with minimal manual input, and overcame challenges such as ultrasonic artifacts and speckles, thereby enhancing the precision and efficiency in clinical applications of radiomics (35, 36). Through careful evaluation and comparison, the optimal ML model selected in this study may reduce the need for frequent CNBs for Ki-67 assessment, serving as a valuable supplemental tool. Importantly, the model accounts for the variations in Ki-67 expression across different tumor areas in BC, and is thus not affected by marked heterogeneity in cellular proliferation (37). When integrated with core needle tissue sampling, this approach may provide clinicians with a more precise instrument for individualized decision-making, underscoring the potential of this methodology in clinical practice.

Utilizing the SHAP interpretation methodology, the critical elements of our chosen SVM model were identified, comprising four peritumoral and one intratumoral radiomics feature. Briefly, the SHAP methodology allocates a value to each feature, signifying the influence of that feature on the model’s prediction relative to a baseline, thus enhancing model interpretability (38). The insights obtained from the SHAP analysis revealed a robust association between elevated Ki-67 levels and the heterogeneity surrounding the tumor, a finding in line with earlier research (33, 39). The prominence of peritumoral features in our model supports the notion that regions adjacent to the tumor may offer enhanced predictive insight into Ki-67 expression (33, 39). Specifically, these peritumoral regions often exhibit complex cellular interactions and microenvironment changes that may reflect the aggressiveness of the tumor, thereby serving as significant indicators for predicting Ki-67 levels (40). This suggests that peritumoral features are not merely supplementary but hold intrinsic predictive value, offering a broader perspective on Ki-67 expression. Together with intratumoral features, they form a complementary framework that may lead to more accurate and individualized predictions (41, 42). However, it is notable that although we can identify some radiomics features with statistical relevance to clinical outcomes from a variety of categories, many of these belong to texture and higher-order statistical features. These features are fundamentally abstract, being mathematical descriptions derived from imaging data. The data-driven nature of radiomics may not directly reflect the underlying biological processes, making the elucidation of the biological mechanisms linking these features to clinical outcomes challenging at present (43).

Our study confirmed correlations between elevated Ki-67 levels in BC cases and associated clinical factors such as advanced age, larger lesion size, and susceptibility to axillary lymph node metastases, aligning with previous findings (18, 44, 45). Despite these insights, the predictive accuracy of these clinical features remained limited, evidenced by an AUC of merely 0.709. This was notably inferior to the AUC of 0.82 achieved through US-based radiomics features, highlighting the challenge of relying solely on traditional clinical parameters for precise Ki-67 level prediction. In studies focusing on intra- and peritumoral radiomics features, Li et al. (32) and Jiang et al. (33) demonstrated predictive accuracies of 0.749 and 0.838 based on MRI images, respectively. This underlines that our model, centered on intra- and peritumoral US radiomics features, also possesses strong predictive power, warranting further clinical exploration.

Despite the promising findings, our study has several limitations that require attention. The single-center, retrospective design, along with a restricted patient cohort, could limit the wider application of our conclusions. Additionally, inconsistencies in US settings among different institutions might negatively affect the performance of the models. This challenge is further compounded by the decision to restrict the inclusion criteria to lesions with a single visible mass on US, precluding the extension of our findings to non-mass and multi-focal lesions. Another limitation is the lack of evaluation of other prevalent US modalities, such as elastography or contrast-enhanced US, which remains a field for future investigation. A significant aspect of our methodology that calls for further investigation is our choice of a 3 mm radial extension from the tumor margin to expand the initial ROI. This decision aimed to balance the optimal accuracy of a 2-4 mm peritumoral region, as recommended by Ding et al. (24), against the need to minimize segments extending beyond the skin. Future studies should examine the predictive value of peritumoral regions with varying dilation distances in relation to Ki-67 levels to better understand the implications of this parameter. Lastly, our ML model lacked external validation from additional centers, indicating a need for further validation. Despite these limitations, the study highlights the potential utility of radiomics-based ML models in predicting Ki-67 levels of BC patients. This insight emphasizes the need for future research, specifically multi-center, prospective studies, to enhance the reliability and practicality of the model.

Conclusion

The present study highlighted the capability of ML models, notably the SVM model utilizing radiomics features from both intra- and peritumoral US images, to predict elevated Ki-67 levels in BC patients. The model demonstrated consistent and reliable performance in both internal and external verifications, indicating its promise as a noninvasive preoperative prediction method. Serving as a valuable supplement to CNB, this approach is anticipated to guide treatment strategies and contribute to personalized clinical decision-making for BC patients.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the Institutional Review Board of Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences (2023K29). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin because of the retrospective nature of this research.

Author contributions

JW: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft. WG: Formal Analysis, Investigation, Validation, Writing – original draft. ML: Formal Analysis, Investigation, Software, Supervision, Writing – review & editing. XY: Formal Analysis, Investigation, Methodology, Writing – review & editing, Funding acquisition. DY: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Project administration, Resources, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported by Shanghai Jiading District Health and Family Planning Commission Health Planning Commission Scientific Research Project (No. 2021-KY-20); Key Medical Discipline of Jiading District, Shanghai (No. 2020-jdyxzdzk-02).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2023.1290313/full#supplementary-material

Abbreviations

AUC, Area Under the Curve; BC, Breast Cancer; CNB, Core Needle Biopsy; DCA, Decision Curve Analysis; GLCM, Gray-Level Co-Occurrence Matrix; GLDM, Gray-Level Dependence Matrix; GLRLM, Gray-Level Run Length Matrix; GLSZM, Gray-Level Size Zone Matrix; LASSO, Least Absolute Shrinkage and Selection Operator; LN, Lymph Node; Logit, Logistic Regression; ML, Machine Learning; NGTDM, Neighbouring Grey Tone Difference Matrix; RF, Random Forests; ROC, Receiver Operating Characteristic; ROI, Region of Interest; SHAP, Shapley Additive Explanations; SVM, Support Vector Machine; US, Ultrasound; XGBoost, Extreme Gradient Boosting.

References

1. Petrelli F, Viale G, Cabiddu M, Barni S. Prognostic value of different cut-off levels of Ki-67 in breast cancer: a systematic review and meta-analysis of 64,196 patients. Breast Cancer Res Treat (2015) 153(3):477–91. doi: 10.1007/s10549-015-3559-0

2. Menon SS, Guruvayoorappan C, Sakthivel KM, Rasmi RR. Ki-67 protein as a tumour proliferation marker. Clin Chim Acta (2019) 491:39–45. doi: 10.1016/j.cca.2019.01.011

3. Davey MG, Hynes SO, Kerin MJ, Miller N, Lowery AJ. Ki-67 as a prognostic biomarker in invasive breast cancer. Cancers (2021) 13(17):4455. doi: 10.3390/cancers13174455

4. Gerdes J, Li L, Schlueter C, Duchrow M, Wohlenberg C, Gerlach C, et al. Immunobiochemical and molecular biologic characterization of the cell proliferation-associated nuclear antigen that is defined by monoclonal antibody Ki-67. Am J Pathol (1991) 138(4):867–73.

5. Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med (2012) 366(10):883–92. doi: 10.1056/NEJMoa1113205

6. Rossi C, Fraticelli S, Fanizza M, Ferrari A, Ferraris E, Messina A, et al. Concordance of immunohistochemistry for predictive and prognostic factors in breast cancer between biopsy and surgical excision: a single-centre experience and review of the literature. Breast Cancer Res Treat (2023) 198(3):573–82. doi: 10.1007/s10549-023-06872-9

7. Ellis MJ, Suman VJ, Hoog J, Goncalves R, Sanati S, Creighton CJ, et al. Ki67 proliferation index as a tool for chemotherapy decisions during and after neoadjuvant aromatase inhibitor treatment of breast cancer: results from the American college of surgeons oncology group Z1031 trial (Alliance). J Clin Oncol (2017) 35(10):1061–9. doi: 10.1200/jco.2016.69.4406

8. Deo RC. Machine learning in medicine. Circulation (2015) 132(20):1920–30. doi: 10.1161/circulationaha.115.001593

9. Willemink MJ, Koszek WA, Hardell C, Wu J, Fleischmann D, Harvey H, et al. Preparing medical imaging data for machine learning. Radiology (2020) 295(1):4–15. doi: 10.1148/radiol.2020192224

10. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology (2016) 278(2):563–77. doi: 10.1148/radiol.2015151169

11. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer (2012) 48(4):441–6. doi: 10.1016/j.ejca.2011.11.036

12. Liu Z, Wang S, Dong D, Wei J, Fang C, Zhou X, et al. The applications of radiomics in precision diagnosis and treatment of oncology: opportunities and challenges. Theranostics (2019) 9(5):1303–22. doi: 10.7150/thno.30309

13. Liu W, Cheng Y, Liu Z, Liu C, Cattell R, Xie X, et al. Preoperative prediction of Ki-67 status in breast cancer with multiparametric MRI using transfer learning. Acad Radiol (2021) 28(2):e44–53. doi: 10.1016/j.acra.2020.02.006

14. Liang C, Cheng Z, Huang Y, He L, Chen X, Ma Z, et al. An MRI-based radiomics classifier for preoperative prediction of Ki-67 status in breast cancer. Acad Radiol (2018) 25(9):1111–7. doi: 10.1016/j.acra.2018.01.006

15. Zhang Y, Zhu Y, Zhang K, Liu Y, Cui J, Tao J, et al. Invasive ductal breast cancer: preoperative predict Ki-67 index based on radiomics of ADC maps. La radiologia medica (2020) 125(2):109–16. doi: 10.1007/s11547-019-01100-1

16. Zhu Y, Dou Y, Qin L, Wang H, Wen Z. Prediction of Ki-67 of invasive ductal breast cancer based on ultrasound radiomics nomogram. J Ultrasound Med (2023) 42(3):649–64. doi: 10.1002/jum.16061

17. Wu J, Fang Q, Yao J, Ge L, Hu L, Wang Z, et al. Integration of ultrasound radiomics features and clinical factors: A nomogram model for identifying the Ki-67 status in patients with breast carcinoma. Front Oncol (2022) 12:979358. doi: 10.3389/fonc.2022.979358

18. Liu J, Wang X, Hu M, Zheng Y, Zhu L, Wang W, et al. Development of an ultrasound-based radiomics nomogram to preoperatively predict Ki-67 expression level in patients with breast cancer. Front Oncol (2022) 12:963925. doi: 10.3389/fonc.2022.963925

19. Kessenbrock K, Plaks V, Werb Z. Matrix metalloproteinases: regulators of the tumor microenvironment. Cell (2010) 141(1):52–67. doi: 10.1016/j.cell.2010.03.015

20. Liu Z, Mi M, Li X, Zheng X, Wu G, Zhang L. A lncRNA prognostic signature associated with immune infiltration and tumour mutation burden in breast cancer. J Cell Mol Med (2020) 24(21):12444–56. doi: 10.1111/jcmm.15762

21. Uematsu T. Focal breast edema associated with malignancy on T2-weighted images of breast MRI: peritumoral edema, prepectoral edema, and subcutaneous edema. Breast Cancer (2015) 22(1):66–70. doi: 10.1007/s12282-014-0572-9

22. Christiansen A, Detmar M. Lymphangiogenesis and cancer. Genes Cancer (2011) 2(12):1146–58. doi: 10.1177/1947601911423028

23. Goldhirsch A, Wood WC, Coates AS, Gelber RD, Thürlimann B, Senn HJ. Strategies for subtypes–dealing with the diversity of breast cancer: highlights of the St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol (2011) 22(8):1736–47. doi: 10.1093/annonc/mdr304

24. Ding J, Chen S, Serrano Sosa M, Cattell R, Lei L, Sun J, et al. Optimizing the peritumoral region size in radiomics analysis for sentinel lymph node status prediction in breast cancer. Acad Radiol (2022) 29(Suppl 1):S223–8. doi: 10.1016/j.acra.2020.10.015

25. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed (2022) 214:106584. doi: 10.1016/j.cmpb.2021.106584

26. Martini ML, Neifert SN, Oermann EK, Gilligan JT, Rothrock RJ, Yuk FJ, et al. Application of cooperative game theory principles to interpret machine learning models of nonhome discharge following spine surgery. Spine (Phila Pa 1976) (2021) 46(12):803–12. doi: 10.1097/brs.0000000000003910

27. Gown AM. The biomarker Ki-67: promise, potential, and problems in breast cancer. Appl Immunohistochem Mol Morphol (2023) 31(7):478–84. doi: 10.1097/pai.0000000000001087

28. de Azambuja E, Cardoso F, de Castro G Jr., Colozza M, Mano MS, Durbecq V, et al. Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12,155 patients. Br J Cancer (2007) 96(10):1504–13. doi: 10.1038/sj.bjc.6603756

29. Fan Y, Pan X, Yang F, Liu S, Wang Z, Sun J, et al. Preoperative computed tomography radiomics analysis for predicting receptors status and Ki-67 levels in breast cancer. Am J Clin Oncol (2022) 45(12):526–33. doi: 10.1097/coc.0000000000000951

30. Feng S, Yin J. Radiomics of dynamic contrast-enhanced magnetic resonance imaging parametric maps and apparent diffusion coefficient maps to predict Ki-67 status in breast cancer. Front Oncol (2022) 12:847880. doi: 10.3389/fonc.2022.847880

31. Kayadibi Y, Kocak B, Ucar N, Akan YN, Akbas P, Bektas S. Radioproteomics in breast cancer: prediction of Ki-67 expression with MRI-based radiomic models. Acad Radiol (2022) 29(Suppl 1):S116–25. doi: 10.1016/j.acra.2021.02.001

32. Li C, Song L, Yin J. Intratumoral and peritumoral radiomics based on functional parametric maps from breast DCE-MRI for prediction of HER-2 and Ki-67 status. J Magn Reson Imaging (2021) 54(3):703–14. doi: 10.1002/jmri.27651

33. Jiang T, Song J, Wang X, Niu S, Zhao N, Dong Y, et al. Intratumoral and peritumoral analysis of mammography, tomosynthesis, and multiparametric MRI for predicting ki-67 level in breast cancer: a radiomics-based study. Mol Imaging Biol (2022) 24(4):550–9. doi: 10.1007/s11307-021-01695-w

34. Zhang S, Wang X, Yang Z, Zhu Y, Zhao N, Li Y, et al. Intra- and peritumoral radiomics model based on early DCE-MRI for preoperative prediction of molecular subtypes in invasive ductal breast carcinoma: A multitask machine learning study. Front Oncol (2022) 12:905551. doi: 10.3389/fonc.2022.905551

35. Zhang D, Jiang F, Yin R, Wu GG, Wei Q, Cui XW, et al. A review of the role of the S-detect computer-aided diagnostic ultrasound system in the evaluation of benign and malignant breast and thyroid masses. Med Sci Monit (2021) 27:e931957. doi: 10.12659/msm.931957

36. Bian J, Wang R, Lin M. Ultrasonic S-Detect mode for the evaluation of thyroid nodules: A meta-analysis. Med (Baltimore) (2022) 101(34):e29991. doi: 10.1097/md.0000000000029991

37. Kutasovic JR, McCart Reed AE, Sokolova A, Lakhani SR, Simpson PT. Morphologic and genomic heterogeneity in the evolution and progression of breast cancer. Cancers (Basel) (2020) 12(4):848. doi: 10.3390/cancers12040848

38. Marcílio WE, Eler DM. From explanations to feature selection: assessing SHAP values as feature selection mechanism. 2020 33rd SIBGRAPI Conf Graphics Patterns Images (SIBGRAPI) (2020) 2020:7–10.

39. Wu Y, Ma Q, Fan L, Wu S, Wang J. An automated breast volume scanner-based intra- and peritumoral radiomics nomogram for the preoperative prediction of expression of Ki-67 in breast malignancy. Acad Radiol (2023) S1076-6332(23)00352-5. doi: 10.1016/j.acra.2023.07.004

40. Annaratone L, Cascardi E, Vissio E, Sarotto I, Chmielik E, Sapino A, et al. The multifaceted nature of tumor microenvironment in breast carcinomas. Pathobiology (2020) 87(2):125–42. doi: 10.1159/000507055

41. Braman N, Prasanna P, Whitney J, Singh S, Beig N, Etesami M, et al. Association of peritumoral radiomics with tumor biology and pathologic response to preoperative targeted therapy for HER2 (ERBB2)-positive breast cancer. JAMA Netw Open (2019) 2(4):e192561. doi: 10.1001/jamanetworkopen.2019.2561

42. Mao N, Shi Y, Lian C, Wang Z, Zhang K, Xie H, et al. Intratumoral and peritumoral radiomics for preoperative prediction of neoadjuvant chemotherapy effect in breast cancer based on contrast-enhanced spectral mammography. Eur Radiol (2022) 32(5):3207–19. doi: 10.1007/s00330-021-08414-7

43. Tomaszewski MR, Gillies RJ. The biological meaning of radiomic features. Radiology (2021) 298(3):505–16. doi: 10.1148/radiol.2021202553

44. Cheng C, Zhao H, Tian W, Hu C, Zhao H. Predicting the expression level of Ki-67 in breast cancer using multi-modal ultrasound parameters. BMC Med Imaging (2021) 21(1):150. doi: 10.1186/s12880-021-00684-3

45. Criscitiello C, Disalvatore D, De Laurentiis M, Gelao L, Fumagalli L, Locatelli M, et al. High Ki-67 score is indicative of a greater benefit from adjuvant chemotherapy when added to endocrine therapy in luminal B HER2 negative and node-positive breast cancer. Breast (2014) 23(1):69–75. doi: 10.1016/j.breast.2013.11.007

Keywords: breast cancer, Ki-67 levels, radiomics, peritumoral ultrasound segmentation, machine learning, support vector machine

Citation: Wang J, Gao W, Lu M, Yao X and Yang D (2023) Development of an interpretable machine learning model for Ki-67 prediction in breast cancer using intratumoral and peritumoral ultrasound radiomics features. Front. Oncol. 13:1290313. doi: 10.3389/fonc.2023.1290313

Received: 07 September 2023; Accepted: 02 November 2023;
Published: 17 November 2023.

Edited by:

Nosheen Masood, Fatima Jinnah Women University, Pakistan

Reviewed by:

Qin Genggeng, Southern Medical University, China
Rami Vanguri, New York University, United States
Zilong He, Southern Medical University, China

Copyright © 2023 Wang, Gao, Lu, Yao and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Debin Yang

^†These authors have contributed equally to this work

^‡ORCID: Debin Yang, orcid.org/0000-0001-9969-5700
