METHODS article

Front. Oncol., 11 October 2022

Sec. Molecular and Cellular Oncology

Volume 12 - 2022 | https://doi.org/10.3389/fonc.2022.931035

Impact of automated methods for quantitative evaluation of immunostaining: Towards digital pathology

  • 1. Normandie Univ, UNICAEN, Federative Structure 4207 ‘Normandie Oncologie’, PLATON Services Unit, Virtual’His platform, Caen, France

  • 2. Normandie Univ, UNICAEN, Federative Structure 4207 ‘Normandie Oncologie’, PLATON Services Unit, Caen, France

  • 3. Normandie Univ, UNICAEN, Inserm U1086 ANTICIPE, Interdisciplinary Research Unit for Cancer Prevention and Treatment, Federative Structure 4207 ‘Normandie Oncologie’, F. Baclesse Comprehensive Cancer Centre, Caen, France

  • 4. UNICANCER, F. Baclesse Comprehensive Cancer Centre, Caen, France

  • 5. UNICANCER, F. Baclesse Comprehensive Cancer Centre, Biopathology Department, Caen, France

  • 6. Department of Pathology, Forensic Medicine and Pharmacology, Institute of Biomedical Sciences of the Faculty of Medicine, Vilnius University, Vilnius, Lithuania

Abstract

Introduction:

We sought to develop a novel method for a fully automated, robust quantification of protein biomarker expression within the epithelial component of high-grade serous ovarian tumors (HGSOC). Rather than defining thresholds for a given biomarker, the objective of this study in a small cohort of patients was to develop a method applicable to the many clinical situations in which immunomarkers need to be quantified. We aimed to quantify biomarker expression by correlating it with the heterogeneity of staining, using a non-subjective choice of scoring thresholds based on classical mathematical approaches. This could lead to a universal method for quantifying other immunohistochemical markers to guide pathologists in therapeutic decision-making.

Methods:

We studied a cohort of 25 cases of HGSOC for which three biomarkers predictive of the response observed ex vivo to the BH3 mimetic molecule ABT-737 had been previously validated by a pathologist. We calibrated our algorithms using Stereology analyses performed by two experts to detect immunohistochemical staining and epithelial/stromal compartments. Immunostaining quantification within Stereology grids of hexagons was then performed for each histological slice. To define thresholds from the staining distribution histograms and to classify staining within each hexagon as low, medium, or high, we used the Gaussian Mixture Model (GMM).

Results:

Stereology analysis of this calibration process produced a good correlation between the experts for both epithelium and immunostaining detection. There was also a good correlation between the experts and image processing. Image processing clearly revealed the respective proportions of low, medium, and high areas in a single tumor and showed that this parameter of heterogeneity could be included in a composite score, thus decreasing the level of discrepancy. Therefore, agreement with the pathologist was increased by taking heterogeneity into account.

Conclusion and discussion:

This simple, robust, calibrated method using basic tools and known parameters can be used to quantify and characterize the expression of protein biomarkers within the different tumor compartments. It is based on known mathematical thresholds and takes the intratumoral heterogeneity of staining into account. Although some discrepancies need to be diminished, correlation with the pathologist’s classification was satisfactory. The method is replicable and can be used to analyze other biological and medical issues. This non-subjective technique for assessing protein biomarker expression uses a fully automated choice of thresholds (GMM) and defined composite scores that take the intra-tumor heterogeneity of immunostaining into account. It could help to avoid the misclassification of patients and its subsequent negative impact on therapeutic care.

Introduction

Protein expression and localization and some post-translational modifications are crucial to many biological processes. As a result, dysregulation is frequently associated with pathological disorders. A current focus of biologists and pathologists is to assess protein expression as accurately as possible. Such studies frequently include the evaluation of the intensity of protein expression level, its subcellular localization, and heterogeneity within whole tissue sections. Appropriate staining methods are required to evaluate these parameters on histological sections. One of these methods is immunohistochemistry, which is widely used in experimental research and in routine clinical practice in pathology laboratories.

The assessment of the intensity of expression level has become a hot topic (1, 2). This intensity can be influenced by many factors as a result of immunohistochemical labeling (3). However, if the same conditions are applied according to a well-defined protocol (fixation time, same staining conditions), it is possible to compare expression levels (4). Such quantitative evaluations are regularly used in clinical practice for biomarkers such as Her2/neu, estrogen receptor (ER), and progesterone receptor (PR) for which testing guidelines have been established (5, 6). However, an evaluation of the percentage of positive cells and/or of global protein expression level is sometimes unable to provide sufficient relevant information. Moreover, the subjective perception of pathologists may create a bias. This is exemplified by numerous works showing both intra- and inter-observer variability in results (7–9). These discrepancies arise partly from the subjectivity of the measurement but also from the distribution and heterogeneity of the intensity of markers in whole tissue sections. Indeed, it is sometimes difficult to evaluate the presence of different foci with different staining intensities on the same section. Nevertheless, in some cases such as Ki-67 proliferative marker quantification, assigning to the whole section the value of the most represented or strongest focus has been considered pertinent for some predictive purposes (10).

Nowadays, thanks to technologies such as digital slide images and automated image analysis, it is possible first to quantify the expression level and second to account for these heterogeneous components by integrating quantitative parameters of heterogeneity (11). In the search for an innovative automated quantitative method, we investigated the possibility of automatically quantifying the expression of biomarkers previously reported as able to predict the response to a BH3-mimetic molecule (ABT-737) in ovarian tumor slices cultivated and exposed ex vivo to this drug (12). This approach sought to automatically evaluate staining intensity in whole-slide images of tumor tissues, based on systematic subsampling in a hexagonal tiling array. The technique is based on the Stereology theory (13), which allows non-biased results to be estimated accurately by means of grids (crosses, squares, hexagons, etc.).

Three different immunomarkers were evaluated comparatively by a pathologist and image analysis. We used the proteins Mcl-1, Bim, and P-ERK, which were identified as predictive biomarkers in a previous study and exhibited various expression patterns and histological heterogeneity. First, we applied Stereology laws, using a grid of crosses, to obtain reference values in order to adjust the image processing (IP) and to estimate the protein expression level revealed by 3,3′-diaminobenzidine (DAB)-labeled intensity in the hexagonal tiling. Second, we applied two methods to assess positive and negative cases relative to the protein expressions: the Gaussian Mixture Model (GMM) (14), considering only labeling intensity, and principal component analysis (PCA), which also takes heterogeneity into account. This allowed us to propose a scoring method using the PCA from the expression levels of Bim, Mcl-1, and P-ERK.

Materials and methods

Eligibility

For this study, we used the data previously obtained in patients diagnosed with advanced high-grade serous ovarian cancer (HGSOC) and no prior chemotherapy exposure. Tumor nodules from the peritoneal carcinomatosis were obtained during initial surgery and used for the “ABT/CARBO ex vivo” study. The protocol received all necessary institutional approval, and all patients provided written informed consent (NCT01440504).

Immunohistochemical analysis

Automated immunohistochemistry using a DakoCytomation Autostainer was performed on 4-μm-thick paraffin sections. The mouse monoclonal antibody anti-Mcl-1 (Y37) was obtained from Abcam (Paris, France). The rabbit monoclonal antibodies anti-Bim (C34C5) and Phospho-p44/42 MAPK (Thr202/Tyr204) (D13.14.4E) corresponding to Phospho-Erk1/2 and noted P-ERK were obtained from Cell Signaling (Ozyme, Saint Quentin Yvelines, France).

Immunohistochemistry procedures were as previously described (12). Briefly, to unmask epitopes, deparaffinized slides were treated for 15 min by a high-temperature-heating antigen retrieval technique in EDTA buffer 0.5 M pH 8 for Bim (EL, L, and S isoforms) and Mcl-1 antibodies and in citrate buffer 0.07 M pH 6 for P-ERK antibody. Sections were incubated for 1 h at room temperature with the primary antibodies. After washing, slides were incubated with the Perox Detect System (Novocastra, Leica Microsystems, Nanterre, France), according to the manufacturer’s instructions. Staining was performed with DAB chromogen, and sections were counterstained with hematoxylin QS (Vector Laboratories, Clinisciences, Nanterre, France). Stained slides were then digitized with a ScanScope CS slide scanner (Leica Biosystems, Nussloch, Germany).

Immunostaining evaluation by the pathologist

Marker quantification or classification was assessed as follows by an independent certified pathologist. For Mcl-1 and Bim, the intensity of the staining was scored as high or low according to the degree of homogeneity of the staining pattern observed. Regarding P-ERK, the expression of the phosphorylated forms of ERK was strongly heterogeneous in the tumor nodules, contrary to the other proteins. We then assessed these parameters using both the percentage of marked cells and the staining intensity. Staining was considered as high only if the percentage of stained cells with high intensity was at least equal to 50% (12).

Digital acquisition

Whole-slide images (WSIs) of histological sections were digitized with a 20× objective (0.5 µm/pixel) using the ScanScope CS slide scanner. Images were recorded as tiled pyramidal tiff images.

Image processing: ROI definition. For each image, a region of interest (ROI) was drawn using the ImageScope software (Leica Biosystems, Nussloch, Germany) to remove the artifacts and keep only tumoral areas on histological sections. For subsequent analysis, all processing operations were applied only in the regions of interest.

Immunostaining detection

Immunostaining detection relied first on a color space change and second on a pixel classification (13): (i) the Ohta color space is particularly suited to histochemistry staining because the second layer can adjust the blue and red colors (15), and (ii) a maximum-likelihood algorithm separated the color histogram into several normal distributions, an approach called the “Gaussian Mixture Model” (14). The Gaussian function parameters of the algorithm were adjusted using a Stereology calibration procedure.
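As an illustration of the color space change, the sketch below applies the commonly cited Ohta transform (I1, I2, I3) to single RGB pixels. The scaling factors follow the usual textbook definition and are an assumption here, since the text does not give the exact coefficients used; the two sample colors are hypothetical and not measured from the slides.

```python
def rgb_to_ohta(r, g, b):
    """Convert one RGB pixel to Ohta's (I1, I2, I3) features (textbook definition)."""
    i1 = (r + g + b) / 3.0        # overall intensity
    i2 = (r - b) / 2.0            # red-versus-blue axis (the "second layer")
    i3 = (2.0 * g - r - b) / 4.0  # green-versus-magenta axis
    return i1, i2, i3


# Hypothetical 8-bit colors chosen only to illustrate the sign of I2.
dab_brown = (150, 90, 50)
hematoxylin_blue = (70, 80, 160)

for name, rgb in [("DAB", dab_brown), ("hematoxylin", hematoxylin_blue)]:
    i1, i2, i3 = rgb_to_ohta(*rgb)
    print(f"{name}: I1={i1:.1f} I2={i2:.1f} I3={i3:.1f}")
```

With these sample colors, the brownish pixel gets a positive I2 and the bluish pixel a negative I2, which is why a histogram of the second layer lends itself to a mixture-model split between stain and counterstain.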

Epithelium detection

Epithelium segmentation was performed relative to the “time–frequency” wavelet algorithm proposed by Dennis Gabor (16). One of its well-known applications is Gaussian beam sampling in optics (17). Moreover, the wavelet theory and especially Gabor’s tight frame improved the mathematical concept (18). These two functionalities allowed us to create a processing algorithm of digital images modeled by the Fourier transform with a two-dimensional sliding window: (i) Fourier transform with a Gaussian window on the intensity image, (ii) low-pass filtering weighted by a cosine, and then (iii) inverse Fourier transform with the previous Gaussian window. The IP was completed (i) by implementing a segmentation by moments to obtain a binary image and (ii) by performing a morphological opening operation to eliminate the residual noise and the small objects in the binary image (19).

Quality control

Quality control was performed using a Stereology test grid of crosses. This method consisted in superimposing a stereological grid of crosses (a regular network of points) at random on the region of interest (20). The image readers had to place one of two types of mark under each cross, positive or negative, depending on the color or the staining intensity. In routine practice, this two-mark process allows an estimation of the surface ratio with an uncertainty computation (21). Here, the method was applied to adjust the IP parameters by best matching the positive and negative crosses positioned by the readers with the positive and negative surfaces detected by IP. Then, a quality factor was evaluated from the uncertainty. Two types of quality factors were computed for the different algorithm types. The first was computed by comparing the 95% confidence intervals of the crosses marked by the experts with those of the virtual marked crosses drawn inside the surfaces detected by the IP; the overlap of the confidence intervals gave the first quality factor. The second was computed by counting the true- and false-positive crosses and the true- and false-negative crosses of the virtual cross set established by the IP relative to the expert cross set; the average of the sensitivity and the specificity gave the second quality factor (22).
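The two quality factors can be sketched as follows. The second factor (mean of sensitivity and specificity) follows the description above directly. For the first, the text does not specify how the 95% confidence interval or the overlap is computed, so the normal-approximation binomial interval and the "overlap relative to the shorter interval" used here are assumptions, as are all function names and the example counts.

```python
import math


def sensitivity_specificity_factor(tp, fp, tn, fn):
    """Second quality factor: average of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2.0


def proportion_ci(positive, total, z=1.96):
    """Assumed 95% interval for the fraction of positive crosses."""
    p = positive / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def interval_overlap(a, b):
    """Assumed overlap measure: shared length divided by the shorter interval."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    shorter = min(a[1] - a[0], b[1] - b[0])
    if shorter <= 0:
        return 0.0
    return max(0.0, shared) / shorter


# Hypothetical cross counts: expert marks versus virtual marks from IP.
sens, spec, factor = sensitivity_specificity_factor(tp=90, fp=5, tn=95, fn=10)
expert_ci = proportion_ci(positive=100, total=200)
ip_ci = proportion_ci(positive=95, total=200)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} factor={factor:.3f}")
print(f"CI overlap={interval_overlap(expert_ci, ip_ci):.3f}")
```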

Overlay epithelium/immunostaining

To evaluate the immunostaining rate only within the epithelium, the detection image “OHTA” was combined with the detection image of the epithelium “GABOR.” The resulting image contained only the colored pixels in the epithelial territories.

Transmittance was measured using a Stereology test grid of hexagons. ROIs were regularly subdivided into hexagons to take the heterogeneity of immunostaining within the epithelium into account, a technique described previously (11). For each hexagon, the transmittance was calculated as follows:

Therefore, the transmittance was between 0 and 1.
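The formula itself is not reproduced in this text. Purely as an illustration, the sketch below assumes the usual optical definition (pixel intensity divided by the white reference, here 255 for 8-bit images) and averages it over the pixels falling in each hexagon. The pointy-top axial hexagon indexing, the `white` reference, and all function names are assumptions rather than the authors' implementation; whether the authors invert or rescale this value is not stated.

```python
import math
from collections import defaultdict


def pixel_to_hex(x, y, size):
    """Map a pixel to the axial (q, r) index of its pointy-top hexagon.

    `size` is the hexagon circumradius in pixels.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Cube rounding: recompute the coordinate with the largest rounding error.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hexagon_transmittance(pixels, size, white=255.0):
    """Mean assumed transmittance (intensity / white) per hexagon.

    `pixels` is an iterable of (x, y, intensity) for detected epithelial pixels.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, intensity in pixels:
        cell = sums[pixel_to_hex(x, y, size)]
        cell[0] += intensity / white
        cell[1] += 1
    return {hexagon: total / count for hexagon, (total, count) in sums.items()}


# Hypothetical pixels: two fall in the central hexagon, one far to the right.
sample = [(0, 0, 255), (1, 0, 0), (100, 0, 128)]
per_hexagon = hexagon_transmittance(sample, size=10)
print(per_hexagon)
```

Under this assumed definition every per-hexagon value lies between 0 and 1, which is consistent with the range stated above.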

Gaussian thresholding

Automatic thresholds were established by applying the algorithm of the GMM (14). Three Gaussian functions were fitted directly to the histogram by the GMM algorithm. The intersection points of the fitted curves defined possible thresholds. Two thresholds delimited three areas that we termed low, medium, and high, respectively.
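A minimal sketch of this thresholding step is given below: a three-component one-dimensional Gaussian mixture is fitted by expectation-maximization, and each threshold is taken where two neighboring weighted Gaussians cross. The quantile-based initialization, the fixed iteration count, the bisection search, and the synthetic data are all assumptions made for the example; the authors' implementation and settings are not described at this level of detail.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm_1d(data, k=3, iterations=100):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (mu, sigma, weight) sorted by mu."""
    data = sorted(data)
    n = len(data)
    mus = [data[int((i + 0.5) * n / k)] for i in range(k)]  # quantile start
    spread = (data[-1] - data[0]) / (2 * k) or 1e-3
    sigmas = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: update means, standard deviations, and weights.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-4)
            weights[j] = nj / n
    return sorted(zip(mus, sigmas, weights))


def intersection(left, right, steps=60):
    """Bisection between two means for the point where the weighted curves cross."""
    (m1, s1, w1), (m2, s2, w2) = left, right
    lo, hi = m1, m2
    for _ in range(steps):
        mid = (lo + hi) / 2
        if w1 * normal_pdf(mid, m1, s1) > w2 * normal_pdf(mid, m2, s2):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Synthetic per-hexagon values with three well-separated modes (illustrative only).
random.seed(0)
values = ([random.gauss(0.2, 0.03) for _ in range(200)]
          + [random.gauss(0.5, 0.03) for _ in range(200)]
          + [random.gauss(0.8, 0.03) for _ in range(200)])
components = fit_gmm_1d(values)
thresholds = [intersection(components[i], components[i + 1]) for i in range(2)]
print([round(t, 3) for t in thresholds])
```

The bisection assumes that each component dominates at its own mean, which holds for well-separated modes; with strongly overlapping components a curve crossing may not exist between the means, and the thresholds then need to be inspected manually.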

Principal Component Analysis score

Principal Component Analysis (PCA) (23) was used to establish a biological score: (i) the first three statistical moments were computed from the hexagonal tiling, case per case, (ii) the PCA algorithm was applied to the previous data set, and (iii) the principal component with the maximum variance was considered as the score. This score was normalized as a percentage before analysis.
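The three steps can be sketched as follows. The sketch standardizes the three moments before the PCA, extracts the first component by power iteration, and rescales the scores by min-max to a 0 to 100 range; these three choices are assumptions, because the text does not state the standardization or the normalization used. The hexagon values are hypothetical, and the sign of a principal component is arbitrary.

```python
import math


def moments(values):
    """Mean, standard deviation, and skewness of one case's hexagon values."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3) if sd > 0 else 0.0
    return [mean, sd, skew]


def standardize(rows):
    """Center and scale each column (assumed preprocessing)."""
    columns = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        columns.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*columns)]


def first_component(rows, iterations=500):
    """Leading eigenvector of the covariance matrix by power iteration."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_scores(rows):
    """Project cases on the first component and rescale to 0-100 (assumed min-max)."""
    z = standardize(rows)
    loadings = first_component(z)
    raw = [sum(a * b for a, b in zip(r, loadings)) for r in z]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0
    return [100.0 * (s - lo) / span for s in raw], loadings


# Hypothetical per-hexagon values for four cases.
cases = {
    "A": [0.30, 0.32, 0.31, 0.35, 0.33, 0.60],
    "B": [0.50, 0.52, 0.55, 0.48, 0.51, 0.53],
    "C": [0.20, 0.22, 0.21, 0.25, 0.70, 0.75],
    "D": [0.60, 0.62, 0.58, 0.61, 0.30, 0.59],
}
scores, loadings = pca_scores([moments(v) for v in cases.values()])
for name, score in zip(cases, scores):
    print(name, round(score, 1))
```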

Results

Calibration and validation of quantitative evaluation of immunostaining tool

Three images were used for each marker: 2441 points for Bim, 6467 for Mcl-1, and 4735 for P-ERK.

Labeled and unlabeled marks were independently identified by the two experts on Stereology grids (Figure 1A). Two steps were needed to calibrate the labeled areas: (i) parameter adjustment of image processing (IP) and (ii) quality factor computation after calibration. IP calibration was based on the surface occupied by the labeled and unlabeled pixels. Grids with crosses were then used on three DAB-stained whole-slide images to detect Bim, Mcl-1, and P-ERK, respectively.

Figure 1

First, two experts examined the different grids and superimposed marks (labeled, unlabeled, and others). We kept only the marks common to the two experts in order to improve the IP setting. Then, calibration was checked simultaneously on three Bim cases, three Mcl-1 cases, and three P-ERK cases (Table 1). Second, the quality factor was established: first between the experts and then between the reference expert and IP performed after the calibration presented above (Table 2). Figure 1B shows that before the expert-based calibration, the staining detection performed by IP was unsatisfactory, whereas the calibration process allowed complete staining detection.

Table 1

| Bim | Expert 1 vs. 2 | Experts vs. IP |
|---|---|---|
| P15, labeled mark | 73.04% | 99.33% |
| P15, unlabeled mark | 96.17% | 83.58% |
| P15, total mark | 413 | 316 |
| P18, labeled mark | 63.23% | 96.94% |
| P18, unlabeled mark | 87.31% | 94.48% |
| P18, total mark | 549 | 420 |
| P26, labeled mark | 73.31% | 99.51% |
| P26, unlabeled mark | 82.39% | 89.77% |
| P26, total mark | 1479 | 1091 |

| Mcl-1 | Expert 1 vs. 2 | Experts vs. IP |
|---|---|---|
| P25, labeled mark | 86.53% | 92.77% |
| P25, unlabeled mark | 93.73% | 91.87% |
| P25, total mark | 1772 | 1483 |
| P29, labeled mark | 71.96% | 93.04% |
| P29, unlabeled mark | 96.03% | 50.46% |
| P29, total mark | 1933 | 1066 |
| P37, labeled mark | 83.78% | 95.75% |
| P37, unlabeled mark | 94.92% | 84.05% |
| P37, total mark | 2762 | 2207 |

| P-ERK | Expert 1 vs. 2 | Experts vs. IP |
|---|---|---|
| P19, labeled mark | 90.00% | 89.08% |
| P19, unlabeled mark | 98.32% | 90.73% |
| P19, total mark | 507 | 450 |
| P26, labeled mark | 83.93% | 100.00% |
| P26, unlabeled mark | 99.06% | 81.92% |
| P26, total mark | 1541 | 1252 |
| P36, labeled mark | 74.84% | 89.08% |
| P36, unlabeled mark | 97.89% | 85.21% |
| P36, total mark | 2687 | 2188 |

Table of concordance of marks made on Stereology grids (one table/marker) by two experts (second column).

Marks common to both experts were used for calibration and then compared to IP (third column).

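Table 2

| Bim (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P15, Expert 2 | 70.90% | 94.59% | 82.75% |
| P15, IP_OHTA | 54.44% | 90.48% | 72.46% |
| P18, Expert 2 | 85.79% | 66.22% | 76.01% |
| P18, IP_OHTA | 82.62% | 62.28% | 72.45% |
| P26, Expert 2 | 92.94% | 49.40% | 71.17% |
| P26, IP_OHTA | 89.82% | 55.77% | 72.80% |

| Mcl-1 (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P25, Expert 2 | 70.90% | 94.59% | 82.75% |
| P25, IP_OHTA | 54.44% | 90.48% | 72.46% |
| P29, Expert 2 | 85.79% | 66.22% | 76.01% |
| P29, IP_OHTA | 82.62% | 62.28% | 72.45% |
| P37, Expert 2 | 92.94% | 49.40% | 71.17% |
| P37, IP_OHTA | 89.82% | 55.77% | 72.80% |

| P-ERK (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P19, Expert 2 | 97.85% | 92.05% | 94.95% |
| P19, IP_OHTA | 89.43% | 75.86% | 82.65% |
| P26, Expert 2 | 99.39% | 77.05% | 88.22% |
| P26, IP_OHTA | 98.64% | 45.71% | 72.18% |
| P36, Expert 2 | 96.67% | 82.64% | 89.66% |
| P36, IP_OHTA | 91.52% | 80.85% | 86.19% |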

Quality factors: factors between two experts and factor between reference expert and image processing (one table/marker).

Calibration and validation of quantitative evaluation of epithelium detection tools

As previously performed for immunostaining detection, three images were used for each marker to evaluate and calibrate the detection of epithelium. Then, the quality factor was established using inter-expert agreement. IP calibration was applied to the epithelium extraction to establish the best filter and most efficient size of the morphological opening. First, the intersection between the marked crosses of the experts and the IP masks allowed these two parameters to be adjusted. Then, the true-positive, true-negative, false-positive, and false-negative marks were found between the two experts. Finally, masks built by IP were compared to the marked crosses of the reference expert (Table 3).

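Table 3

| Bim (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P14, Expert 2 | 100.00% | 99.25% | 99.63% |
| P14, IP_GABOR | 76.67% | 97.78% | 87.23% |
| P24, Expert 2 | 91.03% | 98.47% | 94.75% |
| P24, IP_GABOR | 93.62% | 89.47% | 91.55% |
| P28, Expert 2 | 93.89% | 97.67% | 95.78% |
| P28, IP_GABOR | 95.20% | 90.32% | 92.76% |
| P32, Expert 2 | 84.38% | 94.64% | 89.51% |
| P32, IP_GABOR | 82.18% | 74.62% | 78.40% |

| Mcl-1 (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P14, Expert 2 | 97.92% | 99.06% | 98.49% |
| P14, IP_GABOR | 89.58% | 92.45% | 91.02% |
| P24, Expert 2 | 94.24% | 92.04% | 93.14% |
| P24, IP_GABOR | 86.84% | 91.53% | 89.19% |
| P28, Expert 2 | 99.08% | 93.00% | 97.54% |
| P28, IP_GABOR | 96.43% | 80.20% | 88.32% |
| P32, Expert 2 | 93.42% | 94.43% | 93.93% |
| P32, IP_GABOR | 88.46% | 78.04% | 83.25% |

| P-ERK (reference: Expert 1) | Sensitivity | Specificity | Quality Factor |
|---|---|---|---|
| P14, Expert 2 | 97.44% | 97.64% | 97.54% |
| P14, IP_GABOR | 61.54% | 89.50% | 75.52% |
| P24, Expert 2 | 92.08% | 95.62% | 93.85% |
| P24, IP_GABOR | 73.90% | 90.14% | 82.02% |
| P28, Expert 2 | 98.23% | 95.83% | 97.03% |
| P28, IP_GABOR | 86.84% | 88.68% | 87.76% |
| P32, Expert 2 | 70.97% | 98.43% | 84.70% |
| P32, IP_GABOR | 54.17% | 74.62% | 64.40% |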

Quality factors between two experts (expert 1 defined as reference) and factor between reference expert and image processing were calculated from results obtained on four WSIs stained with DAB for each marker (one table/marker).

Figure 1C shows epithelium detection by IP on several cases.

Application to automated immunostaining quantification

First approach: use of immunostaining classes

Next, we used the proposed method on all the WSIs for each marker, as shown in Figure 2. Histograms were analyzed with the GMM. They were built by accumulating all hexagon values collected from the 25 WSIs for Bim, Mcl-1, and P-ERK independently. Two thresholds for each were found by intersecting the Gaussian curves (Figure 3). Thus, three classes were built for each IHC staining, termed low, medium, and high (Figure 4).

Figure 2

Figure 3

Figure 4

The main class was defined as the one that contained the most hexagon patterns. The comparison with the estimation of the expert required grouping together two classes in order to keep only two classes called “low” and “high.” Thus, a given case could be considered “low” or “high” depending on the largest number of hexagons in classes 0, 1, or 2 obtained by IP (Table 4).

Table 4

Patient | Number of hexagons by class (class 0, class 1, class 2) | Score IA Bim | Score Expert Bim | Patient | Number of hexagons by class (class 0, class 1, class 2) | Score IA Mcl-1 | Score Expert Mcl-1 | Patient | Number of hexagons by class (class 0, class 1, class 2) | R 50% | Score IA P-ERK | Score Expert P-ERK
P101987700highhighP10919362lowlowP109492010.10%lowlow
P1212041620lowlowP1265324637lowhighP12589514201.78%lowlow
P135124150lowlowP1311146694highhighP1327832243842.20%lowlow
P140366143highhighP149193227highhighP14210420233.52%lowlow
P15092338highhighP153637477highhighP155217829856.44%highlow
P17082highhighP17715354lowhighP172746546.10%lowhigh
P1856499highhighP183137715highhighP1820726433241.34%lowhigh
P19042297highhighP19292261highhighP19445234378.13%highhigh
P2110141038highhighP211571005209highlowP21128440594.27%lowlow
P2204977highhighP22061146highhighP2203000.00%lowhigh
P2407951367highhighP24784807181highlowP2489630274538.34%lowlow
P252707942highhighP2582531664highhighP25122538942020.65%lowlow
P260179158highlowP26310071202highlowP261112529100.61%lowlow
P2727849175highhighP2710513391highhighP27593664886.54%lowlow
P28060768highhighP2809801highhighP2810310772977.64%highlow
P2942129114highlowP29123163462lowlowP295771250603.18%lowlow
P30030315highhighP3021291136highhighP305713825356.47%highlow
P31222107240highhighP314101735199highlowP3127571688947.29%lowlow
P327116364lowhighP3265430122lowlowP32263910282.33%lowlow
P330976703highhighP33295595239highlowP3334838096156.90%highlow
P3413563206highhighP34177279207highhighP3419619954557.98%highhigh
P35363691highlowP35333892highlowP3591938093.14%highhigh
P360381603highhighP3619481221highhighP3694265492436.67%lowlow
P377208664highhighP37321579578highhighP37201228241784.93%highhigh
P382049275highhighP3827416162highhighP384881019271.76%lowlow

Distribution in three classes by IP but only two by expert analysis.

Bim and Mcl-1 are grouped in the last two classes and P-ERK in the first two classes for comparison with expert estimation (last two columns in the table). Gray labels indicate discordant cases between expert and IP analysis.

The P-ERK table contains a supplementary column entitled “R 50%” because the label “high” was validated only if the main class contained at least 50% of the hexagon patterns. Four, eight, and seven cases were found to be discordant for the WSIs stained by DAB relative to Bim, Mcl-1, and P-ERK, respectively (Table 4).
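The class-based decision can be sketched as below. This is one reading of the rule described above, not the authors' code: for Bim and Mcl-1 the main class is the one with the most hexagons, and classes 1 and 2 both map to “high”; for P-ERK, classes 0 and 1 map to “low”, and “high” requires class 2 to hold at least 50% of the hexagons (the “R 50%” ratio). The function name and the example counts are illustrative.

```python
def classify_case(counts, marker):
    """Return (label, ratio) for one case from its hexagon counts per class.

    `counts` is (class 0, class 1, class 2). For P-ERK the ratio is the share
    of class 2; otherwise it is the share of the main class.
    """
    total = sum(counts)
    if marker == "P-ERK":
        share_high = counts[2] / total
        return ("high" if share_high >= 0.5 else "low"), share_high
    main_class = max(range(3), key=lambda k: counts[k])
    label = "low" if main_class == 0 else "high"
    return label, counts[main_class] / total


# Illustrative counts.
for marker, counts in [("P-ERK", (348, 380, 961)),
                       ("P-ERK", (949, 20, 1)),
                       ("Bim", (711, 63, 64)),
                       ("Mcl-1", (3, 637, 477))]:
    label, ratio = classify_case(counts, marker)
    print(f"{marker} {counts}: {label} ({ratio:.2%})")
```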

Second approach: use of immunostaining scores

The second approach used the PCA algorithm (Figure 5). First, the first three statistical moments were computed (mean, standard deviation, and skewness) from each histogram of the hexagon patterns, case per case. Second, the PCA algorithm was applied to the three statistical moments to find their best linear combination by exploiting the first principal factor. This procedure gave the biological score (Table 5).

Figure 5

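Table 5

| Patient | Bim I_Moy | Bim I_Sig | Bim I_Skw | Bim CP1 | Mcl-1 I_Moy | Mcl-1 I_Sig | Mcl-1 I_Skw | Mcl-1 CP1 | P-ERK I_Moy | P-ERK I_Sig | P-ERK I_Skw | P-ERK CP1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P10 | 0.37 | 0.035 | 0.178 | -1.208 | 0.316 | 0.026 | 1.307 | -2.74 | 0.009 | 0.055 | 6.372 | -4.08 |
| P12 | 0.312 | 0.025 | 0.679 | -2.795 | 0.36 | 0.035 | 1.193 | -1.47 | 0.185 | 0.188 | 0.083 | -0.951 |
| P13 | 0.335 | 0.04 | 0.208 | -1.187 | 0.402 | 0.037 | 0.738 | -0.25 | 0.356 | 0.227 | -0.711 | 0.228 |
| P14 | 0.476 | 0.044 | 0.129 | 0.263 | 0.449 | 0.041 | 0.483 | 0.928 | 0.269 | 0.172 | -0.798 | -0.23 |
| P15 | 0.531 | 0.039 | 0.097 | 0.479 | 0.411 | 0.034 | 1.235 | -0.989 | 0.423 | 0.146 | -2.325 | 1.064 |
| P17 | 0.435 | 0.058 | 0.642 | 0.296 | 0.404 | 0.064 | 0.434 | 1.92 | 0.449 | 0.069 | -3.731 | 1.623 |
| P18 | 0.414 | 0.033 | 0.488 | -1.237 | 0.47 | 0.04 | 1.467 | -0.216 | 0.344 | 0.205 | -1.021 | 0.255 |
| P19 | 0.561 | 0.055 | -0.447 | 2.22 | 0.471 | 0.048 | 0.481 | 1.648 | 0.477 | 0.168 | -2.162 | 1.301 |
| P21 | 0.42 | 0.037 | 0.666 | -1.148 | 0.406 | 0.034 | 0.81 | -0.498 | 0.033 | 0.121 | 3.374 | -2.894 |
| P22 | 0.515 | 0.04 | 0.103 | 0.384 | 0.465 | 0.04 | 0.923 | 0.449 | 0.39 | 0.03 | -0.278 | 0.123 |
| P24 | 0.525 | 0.048 | 0.501 | 0.54 | 0.384 | 0.039 | 0.626 | -0.217 | 0.267 | 0.25 | -0.083 | -0.429 |
| P25 | 0.508 | 0.054 | -0.155 | 1.433 | 0.494 | 0.047 | -0.13 | 2.683 | 0.188 | 0.234 | 0.505 | -1.051 |
| P26 | 0.425 | 0.036 | 0.594 | -1.065 | 0.45 | 0.036 | 0.616 | 0.456 | 0.131 | 0.172 | 0.606 | -1.419 |
| P27 | 0.449 | 0.053 | -0.204 | 0.924 | 0.436 | 0.03 | 0.97 | -0.553 | 0.235 | 0.21 | -0.184 | -0.588 |
| P28 | 0.556 | 0.039 | -1.087 | 1.844 | 0.512 | 0.038 | 0.425 | 1.559 | 0.46 | 0.169 | -2.072 | 1.184 |
| P29 | 0.425 | 0.042 | 1.374 | -1.483 | 0.363 | 0.036 | 1.405 | -1.653 | 0.268 | 0.178 | -0.763 | -0.241 |
| P30 | 0.588 | 0.064 | 0.18 | 2.375 | 0.425 | 0.035 | -0.033 | 0.927 | 0.432 | 0.176 | -1.596 | 0.886 |
| P31 | 0.452 | 0.042 | -0.461 | 0.535 | 0.397 | 0.033 | 1.297 | -1.271 | 0.398 | 0.169 | -1.781 | 0.769 |
| P32 | 0.344 | 0.036 | 0.988 | -2.157 | 0.361 | 0.033 | 0.865 | -1.18 | 0.298 | 0.161 | -1.217 | 0.052 |
| P33 | 0.494 | 0.045 | 0.391 | 0.249 | 0.405 | 0.051 | 1.153 | 0.128 | 0.39 | 0.205 | -1.225 | 0.56 |
| P34 | 0.464 | 0.056 | -0.157 | 1.167 | 0.41 | 0.058 | 0.185 | 1.948 | 0.407 | 0.219 | -1.106 | 0.617 |
| P35 | 0.379 | 0.029 | 0.41 | -1.68 | 0.426 | 0.032 | 1.251 | -0.936 | 0.531 | 0.093 | -3.931 | 2.13 |
| P36 | 0.588 | 0.046 | 0.236 | 1.191 | 0.452 | 0.042 | 1.242 | 0.049 | 0.296 | 0.231 | -0.444 | -0.173 |
| P37 | 0.426 | 0.036 | 0.303 | -0.78 | 0.427 | 0.034 | 1.364 | -0.96 | 0.469 | 0.133 | -2.982 | 1.513 |
| P38 | 0.445 | 0.05 | -0.319 | 0.841 | 0.422 | 0.031 | 0.294 | 0.238 | 0.269 | 0.184 | -0.729 | -0.248 |
| Loadings, CP 1 | 63.68% | 57.65% | -51.20% | | -54.58% | -56.24% | 62.11% | | 70.51% | 3.84% | -70.81% | |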

First three statistical moments in columns on left, values computed by main principal component on right and loadings (weight of statistical moments) for principal component in the last line.

The PCA values were normalized as a percentage to use the GMM. Three Gaussian functions and two thresholds were sought, as in the first approach (Figure 6).

Figure 6

Two automatic thresholds were adjusted in the neighborhood of the Gaussian curve intersection for the purpose of comparison with the estimation of the expert (identical to the first approach). Thus, each threshold delimited two areas: “low” and “high” tumor response for Bim (45%), Mcl-1 (25%), and P-ERK (85%), respectively. These thresholds were considered as the new origins to better assess the level relative to the tumor response, i.e., negative score for “low” and positive score for “high” (Figure 6). Five, six, and three cases were found discordant for the WSIs stained by DAB relative to Bim, Mcl-1, and P-ERK, respectively (Figure 6).
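The change of origin can be illustrated in a few lines. The thresholds are the ones quoted above; the function name, the example scores, and the convention that a score exactly on the threshold counts as “low” are assumptions made for the sketch.

```python
# Thresholds (in percent of the normalized PCA score) quoted in the text.
THRESHOLDS = {"Bim": 45.0, "Mcl-1": 25.0, "P-ERK": 85.0}


def centered_score(marker, pca_percent):
    """Shift the normalized PCA score so that the marker threshold becomes zero."""
    shifted = pca_percent - THRESHOLDS[marker]
    # Assumed convention: a score exactly at the threshold is reported as "low".
    return shifted, ("high" if shifted > 0 else "low")


# Hypothetical normalized scores for one case.
for marker, percent in [("Bim", 60.0), ("Mcl-1", 10.0), ("P-ERK", 60.0)]:
    shifted, label = centered_score(marker, percent)
    print(f"{marker}: {shifted:+.1f} -> {label}")
```

The same normalized score can therefore fall on different sides of the origin depending on the marker, as the P-ERK threshold is much higher than the Bim and Mcl-1 thresholds.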

Discussion

We sought to develop a novel fully automated robust method for quantifying protein biomarker expression, here applied to the epithelial component of high-grade serous ovarian carcinomas (HGSOC). Our approach was based only on a cohort of 25 cases of HGSOC for which three biomarkers predictive of the response observed ex vivo to the BH3 mimetic molecule ABT-737 had been validated by a pathologist (12). This is a strength, since we tried to remain as close as possible to the results obtained by the pathologist. It is also a limitation, since variations are known to occur between two interpretations of a unique histological slice by the same pathologist and by two or more pathologists (7–9).

We calibrated our algorithms using stereology performed by two experts to detect both immunohistochemical staining and epithelial/stromal compartments, as recommended as a quality check (24). Analyses of this calibration process showed a good correlation between the experts for both epithelium and immunostaining detection. Furthermore, there was a good correlation between the experts and IP, after excluding discordant results between them. This calibration process was essential, since it allowed the efficient detection of both epithelium and immunostained areas. Immunostaining was then quantified by using transmittance computing within hexagon grids. The GMM was then used to reproducibly establish low, medium, or high thresholds within each hexagon.

By comparing the classification obtained with these thresholds to the classification established by the pathologist, we observed four, eight, and seven discordant results in 25 cases for Bim, Mcl-1, and P-ERK, respectively.

With the GMM, we used histograms pooling all the hexagons from all the cases. Interestingly, the values of most of the hexagons in discordant cases were close to the thresholds, which explains why the class assigned by IP could easily differ from the one defined by the pathologist in these cases. IP could thus help to avoid variations in interpretation due to subjectivity.

Other parameters could also explain these discrepancies. For instance, Mcl-1 expression is difficult to appreciate, particularly because its localization can vary from one case to another. Mcl-1 can be expressed in the cytosol, in the nucleus, or both, as observed in our study and by other groups (25). Whereas IP considers both types of staining, pathologists mainly evaluate cytosolic staining, so this could generate discrepancies.

For P-ERK immunostaining, the situation is more complex. The activation of P-ERK is often strongly correlated to survival signals transmitted by contact between cancer cells and the extracellular matrix and stromal components. While ERK phosphorylation is most frequently observed only in cancer cells, it may also be observed in stromal cells or in both. Moreover, its expression intensity can strongly vary from one area of the tumor to another. Pathologists must thus consider these features and the proportion of cancer cells highly expressing P-ERK when proposing a composite score (a “high” case will be a tumor sample in which more than 50% of cancer cells show a high level of P-ERK). This relatively subjective appreciation could of course be another source of discrepancy between IP and pathologists’ observations.

Another potential source of discordance is that the classification of immunostainings as low, medium, or high does not capture possible intra-tumor heterogeneity, yet this could be decisive in the therapeutic management of patients (26, 27). For example, a patient presenting a low expression of Mcl-1 in 100% of their cancer cells could be globally more sensitive to ABT-737 than another in whom 30% of cancer cells strongly express Mcl-1. Both cases would be classified as “low” by the pathologist. In this setting, IP could easily decipher the respective proportions of low and high areas and include this parameter in a composite score. Therefore, we included three statistical moments (case per case) in the process: the mean, standard deviation, and skewness of the distribution of transmittance values across hexagons. By doing so, we were able to use PCA to find the best combination of the three parameters. The first factor component was therefore relevant to establish a score linked to each staining. To fit with the pathologist scores, the GMM was also applied to establish thresholds. This approach allowed us to decrease the levels of discrepancy for both Mcl-1 and P-ERK, i.e., taking heterogeneity into account increased the agreement with the pathologist.

In conclusion, this objective, simple, robust, and calibrated method using simple tools and known parameters can be used to quantify and characterize the expression of protein biomarkers within different tumor compartments, using a mathematical definition of thresholds and taking into account the intra-tumoral heterogeneity of staining. It is replicable and could be used in other biological or medical settings after further validation. It is non-subjective, uses a quality control of proven interest, has a fully automated choice of thresholds, and has defined composite scores, thus allowing the intra-tumor heterogeneity of immunostaining to be taken into account. This fully automated approach could help to avoid the misclassification of patients and the subsequent negative impact on their therapeutic care. A development for the future would be to analyze the PCA score by deep learning in connection with medical data to help pathologists establish the right diagnosis.

Funding

P-MM was supported by the Cancer Institut Thématique Multi-Organisme of the French National Alliance for Life and Health Sciences (AVIESAN) Plan Cancer 2014-2019 (doctoral grant). This work is part of the “ONCOTHERA” European project, co-funded by the Normandy County Council and the European Union within the framework of the Operational Program ERDF/ESF 2014-2020. It is also funded by Ligue Contre le Cancer, Calvados Committee.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics statement

The studies involving human participants were reviewed and approved by Consent number: NCT01440504. The patients/participants provided their written informed consent to participate in this study.

Author contributions

NE: Image processing and writing of the manuscript. FG: Immunostaining of virtual slides (WSI) and writing of the manuscript. CB-F: Senior pathologist. P-MM: Writing of the manuscript. P-EB: Stereology and staining evaluation. SD: Immunostaining of virtual slides (WSI). BP: Algorithm development and writing of the manuscript. LP: Stereology, staining evaluation, and writing of the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor AL declared a past co-authorship with the author BP.

References

  • 1

    Fernández-Carrobles MM, Bueno G, García-Rojo M, González-López L, López C, Déniz O. Automatic quantification of IHC stain in breast TMA using colour analysis. Comput Med Imaging Graph (2017) 61:14–27. doi: 10.1016/j.compmedimag.2017.06.002

  • 2

    Guirado R, Carceller H, Castillo-Gómez E, Castrén E, Nacher J. Automated analysis of images for molecular quantification in immunohistochemistry. Heliyon (2018) 4:e00669. doi: 10.1016/j.heliyon.2018.e00669

  • 3

    Li Z, Dabbs DJ. Avoiding “False positive” and “False negative” immunohistochemical results in breast pathology. PAT (2022) 89:311–25. doi: 10.1159/000521682

  • 4

    Elliott K, McQuaid S, Salto-Tellez M, Maxwell P. Immunohistochemistry should undergo robust validation equivalent to that of molecular diagnostics. J Clin Pathol (2015) 68:766–70. doi: 10.1136/jclinpath-2015-203178

  • 5

    Allison KH, Hammond MEH, Dowsett M, McKernin SE, Carey LA, Fitzgibbons PL, et al. Estrogen and progesterone receptor testing in breast cancer: ASCO/CAP guideline update. JCO (2020) 38:1346–66. doi: 10.1200/JCO.19.02309

  • 6

    Hammond MEH, Hayes DF, Dowsett M, Allred DC, Hagerty KL, Badve S, et al. Pathologists’ guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. Breast Care (Basel) (2010) 5:185–7. doi: 10.1159/000315039

  • 7

    Paradiso A, Marubini E, Verderio P, Cortese ME, De Paola F, Silvestrini R, et al. Interobserver reproducibility of immunohistochemical HER-2/neu evaluation in human breast cancer: the real-world experience. Int J Biol Markers (2004) 19:147–54. doi: 10.1177/172460080401900210

  • 8

    Vörös A, Csörgő E, Nyári T, Cserni G. An intra- and interobserver reproducibility analysis of the ki-67 proliferation marker assessment on core biopsies of breast cancer patients and its potential clinical implications. Pathobiology (2013) 80:111–8. doi: 10.1159/000343795

  • 9

    Borlot VF, Biasoli I, Schaffel R, Azambuja D, Milito C, Luiz RR, et al. Evaluation of intra- and interobserver agreement and its clinical significance for scoring bcl-2 immunohistochemical expression in diffuse large b-cell lymphoma. Pathol Int (2008) 58:596–600. doi: 10.1111/j.1440-1827.2008.02276.x

  • 10

    Leung SCY, Nielsen TO, Zabaglo LA, Arun I, Badve SS, Bane AL, et al. Analytical validation of a standardised scoring protocol for Ki67 immunohistochemistry on breast cancer excision whole sections: an international multicentre collaboration. Histopathology (2019) 75:225–35. doi: 10.1111/his.13880

  • 11

    Plancoulaine B, Laurinaviciene A, Herlin P, Besusparis J, Meskauskas R, Baltrusaityte I, et al. A methodology for comprehensive breast cancer Ki67 labeling index with intra-tumor heterogeneity appraisal based on hexagonal tiling of digital image analysis data. Virchows Arch (2015) 467:711–22. doi: 10.1007/s00428-015-1865-x

  • 12

    Lheureux S, N’Diaye M, Blanc-Fournier C, Dugué AE, Clarisse B, Dutoit S, et al. Identification of predictive factors of response to the BH3-mimetic molecule ABT-737: an ex vivo experiment in human serous ovarian carcinoma. Int J Cancer (2015) 136:E340–350. doi: 10.1002/ijc.29104

  • 13

    Plancoulaine B, Poulain L, Laurinavicius A, Elie N. Computer-implemented process on an image of a biological sample. European Patent No 3821396. (2019).

  • 14

    Xuan G, Zhang W, Chai P. EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205), Thessaloniki, Greece: IEEE (2001). p. 145–8. doi: 10.1109/ICIP.2001.958974

  • 15

    Ohta Y-I, Kanade T, Sakai T. Color information for region segmentation. Comput Graphics Image Process (1980) 13:222–41. doi: 10.1016/0146-664X(80)90047-7

  • 16

    Gabor D. Theory of communication. part 1: The analysis of information. J Institution Electrical Engineers - Part III: Radio Communication Eng (1946) 93:429–57. doi: 10.1049/ji-3-2.1946.0074

  • 17

    Bastiaans MJ. The expansion of an optical signal into a discrete set of Gaussian beams. Erzeugung und Analyse von Bildern und Strukturen Informatik-Fachberichte Band 29 81 Jahrestagung der Deutsche Gesellschaft für angewandte Optik Essen Germany (1980) 29:23–32. doi: 10.1007/978-3-642-67687-1_4

  • 18

    Daubechies I. Ten lectures on wavelets. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics (1992). 369 p. doi: 10.1137/1.9781611970104

  • 19

    Coster M, Chermant J-L. Précis d’analyse d’images. Paris: Presses du CNRS (1989).

  • 20

    Baddeley A, Jensen EBV. Stereology for statisticians. Boca Raton, FL: Chapman & Hall/CRC (2005).

  • 21

    Janacek J. Variance of periodic measure of bounded set with random position. Commentationes Mathematicae Universitatis Carolinae (2006) 47:443–55.

  • 22

    Plancoulaine B, Laurinaviciene A, Meskauskas R, Baltrusaityte I, Besusparis J, Herlin P, et al. Digital immunohistochemistry wizard: image analysis-assisted stereology tool to produce reference data set for calibration and quality control. Diagn Pathol (2014) 9:1–9. doi: 10.1186/1746-1596-9-S1-S8

  • 23

    Jolliffe IT. Principal component analysis and factor analysis. In: Jolliffe IT, editor. Principal component analysis. New York, NY: Springer (1986). p. 115–28. doi: 10.1007/978-1-4757-1904-8_7

  • 24

    Laurinavicius A, Laurinaviciene A, Dasevicius D, Elie N, Plancoulaine B, Bor C, et al. Digital image analysis in pathology: benefits and obligation. Anal Cell Pathol (Amst) (2012) 35:75–8. doi: 10.3233/ACP-2011-0033

  • 25

    Pawlikowska P, Leray I, de Laval B, Guihard S, Kumar R, Rosselli F, et al. ATM-Dependent expression of IEX-1 controls nuclear accumulation of mcl-1 and the DNA damage response. Cell Death Differ (2010) 17:1739–50. doi: 10.1038/cdd.2010.56

  • 26

    Fumagalli C, Barberis M. Breast cancer heterogeneity. Diagnostics (Basel) (2021) 11:1555. doi: 10.3390/diagnostics11091555

  • 27

    Molinari C, Marisi G, Passardi A, Matteucci L, De Maio G, Ulivi P. Heterogeneity in colorectal cancer: A challenge for personalized medicine? Int J Mol Sci (2018) 19:E3733. doi: 10.3390/ijms19123733

Summary

Keywords

image processing, whole slide image, stereology, quality control, immunostaining evaluation

Citation

Elie N, Giffard F, Blanc-Fournier C, Morice P-M, Brachet P-E, Dutoit S, Plancoulaine B and Poulain L (2022) Impact of automated methods for quantitative evaluation of immunostaining: Towards digital pathology. Front. Oncol. 12:931035. doi: 10.3389/fonc.2022.931035

Received

28 April 2022

Accepted

20 September 2022

Published

11 October 2022

Volume

12 - 2022

Edited by

Arvydas Laurinavicius, Vilnius University, Lithuania

Reviewed by

Anca Maria Cimpean, Victor Babes University of Medicine and Pharmacy, Romania; Derek Allison, University of Kentucky, United States

*Correspondence: Benoît Plancoulaine; Laurent Poulain

†ORCID: Laurent Poulain, orcid.org/0000-0003-2241-3466

This article was submitted to Molecular and Cellular Oncology, a section of the journal Frontiers in Oncology
