EBHI-Seg: A novel enteroscope biopsy histopathological hematoxylin and eosin image dataset for image segmentation tasks

Shi, Liyu; Li, Xiaoyan; Hu, Weiming; Chen, Haoyuan; Chen, Jing; Fan, Zizhen; Gao, Minghe; Jing, Yujie; Lu, Guotao; Ma, Deguo; Ma, Zhiyu; Meng, Qingtao; Tang, Dechao; Sun, Hongzan; Grzegorzek, Marcin; Qi, Shouliang; Teng, Yueyang; Li, Chen

doi:10.3389/fmed.2023.1114673

ORIGINAL RESEARCH article

Front. Med., 24 January 2023

Sec. Pathology

Volume 10 - 2023 | https://doi.org/10.3389/fmed.2023.1114673

This article is part of the Research Topic Computational Pathology for Precision Diagnosis, Treatment, and Prognosis of Cancer

Liyu Shi¹

Xiaoyan Li²*

Weiming Hu¹

Haoyuan Chen¹

Jing Chen¹

Zizhen Fan¹

Minghe Gao¹

Yujie Jing¹

Guotao Lu¹

Deguo Ma¹

Zhiyu Ma¹

Qingtao Meng¹

Dechao Tang¹

Hongzan Sun³

Marcin Grzegorzek⁴,⁵

Shouliang Qi¹

Yueyang Teng¹

Chen Li¹*

¹Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
²Department of Pathology, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang, China
³Shengjing Hospital, China Medical University, Shenyang, China
⁴Institute of Medical Informatics, University of Lübeck, Lübeck, Germany
⁵Department of Knowledge Engineering, University of Economics in Katowice, Katowice, Poland

Background and purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of colorectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis.

Methods: The present study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, segmentation results on the dataset are evaluated using classical machine learning methods and deep learning methods.

Results: The experimental results showed that deep learning methods had a better image segmentation performance when utilizing EBHI-Seg. The maximum Dice value for the classical machine learning methods is 0.948, while the maximum Dice value for the deep learning methods is 0.965.

Conclusion: This publicly available dataset contains 4,456 images of six tumor differentiation stages, including the corresponding ground truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients. EBHI-Seg is publicly available at: https://figshare.com/articles/dataset/EBHI-SEG/21540159/1.

1. Introduction

Colon cancer is a common deadly malignant tumor, the fourth most common cancer in men, and the third most common cancer in women worldwide. Colon cancer is responsible for 10% of all cancer cases (1). According to prior research, colon and rectal tumors share many of the same or similar characteristics. Hence, they are often classified collectively (2). The present study categorized rectal and colon cancers into one colorectal cancer category (3). Histopathological examination of the intestinal tract is both the gold standard for the diagnosis of colorectal cancer and a prerequisite for disease treatment (4).

The advantage of the intestinal biopsy method, which removes a part of the intestinal tissue for histopathological analysis to determine the true status of the patient, is that it considerably reduces damage to the body and allows rapid wound healing (5). The histopathology sample is then sectioned and stained with Hematoxylin and Eosin (H&E). H&E staining is a common approach for tissue sections that distinguishes the nucleus from the cytoplasm and highlights the fine structures between tissues (6, 7). When pathologists examine the colon, they first check the histopathological sections for eligibility and find the location of the lesion. The pathology sections are then examined and diagnosed using a low magnification microscope. If finer structures need to be observed, the microscope is adjusted to high magnification for further analysis. However, the following problems usually exist in the diagnostic process: the diagnostic results are subjective and vary between doctors; doctors can easily overlook some information in the presence of a large amount of test data; and it is difficult to analyze large amounts of previously collected data (8). Therefore, it is necessary to address these issues effectively.

With the development and popularization of computer-aided diagnosis (CAD), the pathological sections of each case can be accurately and efficiently examined with the help of computers (9). CAD is now widely used in many biomedical image analysis tasks, such as microorganism image analysis (10–18), COVID-19 image analysis (19), histopathological image analysis (20–27), cytopathological image analysis (28–31) and sperm video analysis (32, 33). Therefore, the application of computer vision technology for colorectal cancer CAD provides a new direction in this research field (34).

One of the fundamental tasks of CAD is image segmentation, the results of which can be used as key evidence in the pathologists' diagnostic processes. Along with the rapid development of medical image segmentation methodology, there is a wide demand for its application to identify benign and malignant tumors, tumor differentiation stages, and other related fields (35). Therefore, a multi-class image segmentation method is needed to obtain high segmentation accuracy and good robustness (36).

The present study presents a novel Enteroscope Biopsy Histopathological H&E Image Dataset for Image Segmentation Tasks (EBHI-Seg), which contains 4,456 microscopic images (histopathological colorectal cancer section images and their ground truth images) that encompass six tumor differentiation stages: normal, polyp, low-grade intraepithelial neoplasia, high-grade intraepithelial neoplasia, serrated adenoma, and adenocarcinoma. The segmentation coefficients and evaluation metrics are obtained by segmenting the images of this dataset using different classical machine learning methods and novel deep learning methods.

2. Related work

The present study analyzed and compared the existing colorectal cancer biopsy dataset and provided an in-depth exploration of the currently known research findings. The limitations of the presently available colorectal cancer dataset were also pointed out.

The following conclusions were obtained in the course of the study. For existing datasets, the data types can be grouped into two major categories: Multi and Dual Categorization datasets. Multi Categorization datasets contain tissue types at all stages from Normal to Neoplastic. In Trivizakis et al. (37), a dataset called “Collection of textures in colorectal cancer histology” is described. It includes 5,000 patches of size 74 × 74 μm and contains seven categories. However, because there were only 10 images, it is too small for a data sample and lacked generalization capability. In Chen et al. (23), a dataset called “NCT-CRC-HE-100K” is proposed. This is a set of 100,000 non-overlapping image patches of histological human colorectal cancer (CRC) and normal tissue samples stained with (H&E) that was presented by the National Center for Tumor Diseases (NCT). These image patches are from nine different tissues with an image size of 224 × 224 pixels. The nine tissue categories are adipose, background, debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma, and colorectal adenocarcinoma epithelium. This dataset is publicly available and commonly used. However, because the image sizes are all 224 × 224 pixels, the dataset underperformed in some global details that need to be observed in individual categories. Two datasets are utilized in Oliveira et al. (38): one containing colonic H&E-stained biopsy sections (CRC dataset) and the other consisting of prostate cancer H&E-stained biopsy sections (PCa dataset). The CRC dataset contains 1,133 colorectal biopsy and polypectomy slides grouped into three categories and labeled as non-neoplastic, low-grade and high-grade lesions. In Kausar et al. (39), a dataset named “MICCAI 2016 gland segmentation challenge dataset (GlaS)” is used. This dataset contained 165 microscopic images of H&E-stained colon glandular tissue samples, including 85 training and 80 test datasets. 
Each dataset is grouped into two parts: benign and malignant tumors. The image size is 775 × 522 pixels. Because this dataset has only two types of data and a small number of images, it performs poorly in multi-type training.

Dual Categorization datasets usually contain only two tissue types: Normal and Neoplastic. In Wei et al. (40), a dataset named “FFPE” is proposed. This dataset obtained its images by extracting 328 Formalin-fixed Paraffin-embedded (FFPE) whole-slide images of colorectal polyps classified into two categories: hyperplastic polyps (HPs) and sessile serrated adenomas (SSAs). This dataset contains 3,125 images with an image size of 224 × 224 pixels and is limited in both the number of types and the number of images. In Bilal et al. (41), two datasets named “UHCW” and “TCGA” are proposed. The first dataset is a colorectal cancer biopsy sequence developed at the University Hospital of Coventry and Warwickshire (UHCW) for internal validation of the rectal biopsy trial. The second dataset is the Cancer Genome Atlas (TCGA), used for external validation of the trial. TCGA is commonly used as a publicly available cancer dataset and stores genomic data for more than 20 types of cancers. Both datasets are grouped into two categories: Normal and Neoplastic. The first dataset contains 4,292 slices, and the second dataset contains 731 slices with an image size of 224 × 224 pixels.

All of the information for the existing datasets is summarized in Table 1. The issues associated with the datasets mentioned above include few data types, small amounts of data, and inaccurate ground truth. Therefore, an open-source multi-type colonoscopy biopsy image dataset is needed.

Table 1. A dataset for the pathological classification of colorectal cancer.

3. Basic information for EBHI-Seg

3.1. Dataset overview

The dataset in the present study contained 4,456 histopathology images, including 2,228 histopathology section images and 2,228 ground truth images. These include normal (76 images and 76 ground truth images), polyp (474 images and 474 ground truth images), low-grade intraepithelial neoplasia (639 images and 639 ground truth images), high-grade intraepithelial neoplasia (186 images and 186 ground truth images), serrated adenoma (58 images and 58 ground truth images), and adenocarcinoma (795 images and 795 ground truth images). The basic information for the dataset is described in detail below. EBHI-Seg is publicly available at: https://figshare.com/articles/dataset/EBHI-SEG/21540159/1.
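As a quick consistency check, the per-class counts quoted above can be tallied in a few lines. This is only an illustrative sketch: the dictionary keys and function names are hypothetical and are not part of the released dataset, and the assumption of exactly one ground truth image per section image is taken from the counts in the text.

```python
# Per-class section image counts as stated in the dataset overview.
# The key names are illustrative labels, not official folder names.
CLASS_COUNTS = {
    "normal": 76,
    "polyp": 474,
    "low-grade intraepithelial neoplasia": 639,
    "high-grade intraepithelial neoplasia": 186,
    "serrated adenoma": 58,
    "adenocarcinoma": 795,
}


def dataset_totals(counts):
    """Return (section images, ground truth images, all images)."""
    sections = sum(counts.values())
    ground_truth = sections  # assumption: one ground truth image per section
    return sections, ground_truth, sections + ground_truth


def class_shares(counts):
    """Return the fraction of section images belonging to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(dataset_totals(CLASS_COUNTS))
    for name, share in class_shares(CLASS_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

The shares make the class imbalance explicit: the normal and serrated adenoma categories are far smaller than the adenocarcinoma and low-grade categories.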

In the present paper, H&E-stained histopathological sections of colon tissues are used as data for evaluating image segmentation. The dataset is obtained from two histopathologists at the Cancer Hospital of China Medical University [approved by “Research Project Ethics Certification” (No. 202229)]. It is prepared by 12 biomedical researchers according to the following rules: first, if there is only one differentiation stage in the image and the rest of the image is intact, then that differentiation stage becomes the image label; second, if there is more than one differentiation stage in the image, then the most obvious differentiation is selected as the image label. In general, the most severe and prominent differentiation in the image is used as the image label.

Intestinal biopsy is used as the sampling method in this dataset. The magnification of the data slices is 400×, with an eyepiece magnification of 10× and an objective magnification of 40×. An Olympus microscope and NewUsbCamera acquisition software are used. The image input size is 224 × 224 pixels, and the format is *.png. The data are grouped into five types described in detail in Section 3.2.

3.2. Data type description

3.2.1. Normal

Colorectal tissue sections of the normal category are made up of consistently ordered tubular structures and do not appear infected when viewed under a light microscope (42). Section images with the corresponding ground truth images are shown in Figure 1A.

Figure 1. An example of histopathological images database. (A) Normal and ground truth, (B) Polyp and ground truth, (C) High-grade Intraepithelial Neoplasia and ground truth, (D) Low-grade Intraepithelial Neoplasia and ground truth, (E) Adenocarcinoma and ground truth, and (F) Serrated adenoma and ground truth.

3.2.2. Polyp

Colorectal polyps are similar in shape to the structures in the normal category, but have a completely different histological structure. A polyp is a redundant mass that grows on the surface of the body's cells. Modern medicine usually refers to polyps as unwanted growths on the mucosal surface of the body (43). The pathological section of the polyp category also has an intact luminal structure with essentially no nuclear division of the cells. Only the nuclear mass is slightly higher than that in the normal category. The polyp category and corresponding ground truth images are shown in Figure 1B.

3.2.3. Intraepithelial neoplasia

Intraepithelial neoplasia (IN) is the most critical precancerous lesion. Compared to the normal category, its histological images show increased branching of adenoid structures, dense arrangement, and different luminal sizes and shapes. In terms of cellular morphology, the nuclei are enlarged and vary in size, while nuclear division increases (44). The standard Padova classification currently classifies intraepithelial neoplasia into low-grade and high-grade IN. High-grade IN demonstrates more pronounced structural changes in the lumen and nuclear enlargement compared to low-grade IN. The images and ground truth diagrams of high-grade and low-grade IN are shown in Figures 1C, D.

3.2.4. Adenocarcinoma

Adenocarcinoma is a malignant digestive tract tumor with a very irregular distribution of luminal structures. It is difficult to identify its border structures during observation, and the nuclei are significantly enlarged at this stage (45). An adenocarcinoma with its corresponding ground truth diagram is shown in Figure 1E.

3.2.5. Serrated adenoma

Serrated adenomas are uncommon lesions, accounting for 1% of all colonic polyps (46). The endoscopic surface appearance of serrated adenomas is not well characterized but is thought to be similar to that of colonic adenomas with tubular or cerebral crypt openings (47). The image of a serrated adenoma with a corresponding ground truth diagram is shown in Figure 1F.

4. Evaluation of EBHI-Seg

4.1. Image segmentation evaluation metric

Six evaluation metrics are commonly used for image segmentation tasks. The Dice ratio metric is a standard metric used in medical images that is often utilized to evaluate the performance of image segmentation algorithms. It is a validation method based on spatial overlap statistics that measures the similarities between the algorithm segmentation output and ground truth (48). The Dice ratio is defined in Equation (1).

\begin{array}{ll} \mathrm{DiceRatio} = \frac{2 | X \cap Y |}{| X | + | Y |}. & (1) \end{array}

In Equation (1), for a segmentation task, X and Y denote the ground truth and segmentation mask prediction, respectively. The range of the calculated results is [0,1], and the larger the result the better.

The Jaccard index is a classical set similarity measure with many practical applications in image segmentation. The Jaccard index measures the similarity of finite sample sets as the ratio between the intersection and the union of the segmentation result and ground truth (49). The Jaccard index is defined in Equation (2).

\begin{array}{ll} \mathrm{JaccardIndex} = \frac{| X \cap Y |}{| X \cup Y |}. & (2) \end{array}

The range of the calculated results is [0,1], and the larger the result the better.
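The two overlap measures above are not independent. As a short worked derivation (added here for illustration, using only Equations (1) and (2) and the identity |X ∪ Y| = |X| + |Y| − |X ∩ Y|), the Dice ratio can be written as a monotonically increasing function of the Jaccard index, so both metrics always rank two segmentations of the same image in the same order:

```latex
% Let a = |X \cap Y| and s = |X| + |Y|, so that |X \cup Y| = s - a.
\begin{aligned}
\mathrm{DiceRatio} &= \frac{2a}{s}, \qquad
\mathrm{JaccardIndex} = \frac{a}{s - a}, \\
\frac{2\,\mathrm{JaccardIndex}}{1 + \mathrm{JaccardIndex}}
  &= \frac{2a/(s - a)}{s/(s - a)} = \frac{2a}{s} = \mathrm{DiceRatio}.
\end{aligned}
```

For example, a Jaccard index of 0.5 corresponds to a Dice ratio of 2 × 0.5 / 1.5 ≈ 0.667.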

Precision is the fraction of pixels predicted as foreground that are truly foreground, and recall is the fraction of true foreground pixels that are predicted as foreground. The range of the calculated results is [0,1]. A higher output indicates a better segmentation result. Precision and recall are defined in Equations (3) and (4),

\begin{array}{ll} \mathrm{Precision} = \frac{TP}{TP + FP}, & (3) \end{array}

\begin{array}{ll} \mathrm{Recall} = \frac{TP}{TP + FN}, & (4) \end{array}

where TP, FP, TN, and FN are defined in Table 2.

Table 2. Confusion matrix.

The conformity coefficient (Confm Index) is a consistency coefficient that takes values in the interval (−∞, 1]. It is calculated from the ratio of the number of incorrectly segmented pixels to the number of correctly segmented pixels, and it measures the consistency between the segmentation result and ground truth. The conformity coefficient is defined in Equations (5), (6),

\begin{array}{ll} \mathrm{ConfmIndex} = \left(1 - \frac{\theta_{AE}}{\theta_{TP}}\right), \quad \theta_{TP} > 0, & (5) \end{array}

\begin{array}{ll} \mathrm{ConfmIndex} = \mathrm{Failure}, \quad \theta_{TP} = 0, & (6) \end{array}

where θ_AE = θ_FP + θ_FN represents all errors of the fuzzy segmentation results, and θ_TP is the number of correctly classified pixels. Mathematically, ConfmIndex can be negative infinity if θ_TP = 0. Such a segmentation result is clearly inadequate and is treated as a failure without the need for any further analysis.
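The metrics in Equations (1) to (6) can all be computed from the pixel-level confusion counts. The following is a minimal, dependency-free sketch, not the authors' evaluation code: the function names are hypothetical, masks are assumed to be binary nested lists with 1 marking foreground, and undefined ratios (including the failure case of Equation (6)) are returned as `None`.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, FN, TN for two equally sized binary masks.

    Masks are nested lists (rows of 0/1 values); 1 marks foreground.
    """
    tp = fp = fn = tn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(truth, pred):
    """Return Dice, Jaccard, precision, recall, and conformity as a dict."""
    tp, fp, fn, _ = confusion_counts(truth, pred)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as None.
        return num / den if den else None

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),        # Eq. (1)
        "jaccard": ratio(tp, tp + fp + fn),             # Eq. (2)
        "precision": ratio(tp, tp + fp),                # Eq. (3)
        "recall": ratio(tp, tp + fn),                   # Eq. (4)
        # Eq. (5); None stands for the "Failure" case of Eq. (6).
        "conformity": 1 - (fp + fn) / tp if tp > 0 else None,
    }


if __name__ == "__main__":
    truth = [[1, 1, 0], [1, 0, 0]]
    pred = [[1, 0, 0], [1, 1, 0]]
    print(segmentation_metrics(truth, pred))
```

In this toy example there are two true positives, one false positive, and one false negative, so the errors equal the correct foreground pixels and the conformity coefficient is exactly zero even though the Dice ratio is about 0.67.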

4.2. Classical machine learning methods

Image segmentation is one of the most commonly used methods for classifying image pixels in decision-oriented applications (50). It groups an image into regions with high pixel similarity within each region and significant contrast between different regions (51). Machine learning methods for segmentation distinguish the image classes using image features. Image segmentation of the present dataset is performed using the following five classical machine learning methods. (1) The k-means algorithm is a classical partition-based clustering algorithm. Image segmentation means dividing the image into many disjoint regions, which is essentially a clustering process over pixels, and k-means is one of the simplest clustering methods (52). (2) The Markov random field (MRF) is a powerful stochastic tool that models the joint probability distribution of an image based on its local spatial interactions (53). It can extract the texture features of the image and model the image segmentation problem. (3) The OTSU algorithm is a global adaptive binarization threshold segmentation algorithm that uses the maximum inter-class variance between the image background and the target as the selection criterion (54). The image is divided into foreground and background parts based on its grayscale characteristics, independent of brightness and contrast. (4) The watershed algorithm is a region-based segmentation method that takes the similarity between neighboring pixels as a reference and connects pixels with similar spatial locations and grayscale values into closed contours to achieve segmentation (55). (5) The Sobel algorithm uses two operators, one detecting horizontal edges and the other detecting vertical edges, and combines their responses into the final edge image. The Sobel edge detection operator is a set of directional operators that can be used to perform edge detection from different directions (56). The segmentation results are shown in Figure 2.
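To illustrate the idea behind method (1), the sketch below clusters grayscale pixel values with a one-dimensional k-means. It is a simplified example under stated assumptions, not the configuration used in the experiments: the text does not specify the features or the number of clusters, so clustering on raw intensities with k = 2, the evenly spaced initial centers, and the function name `kmeans_1d` are all illustrative choices.

```python
def kmeans_1d(values, k=2, max_iters=20):
    """Cluster scalar pixel values with plain k-means.

    Returns (centers, labels). Initial centers are spread evenly over
    the value range, which is an illustrative choice.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(v):
        # Index of the center closest to value v.
        return min(range(k), key=lambda i: abs(v - centers[i]))

    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center.
        updated = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated

    labels = [nearest(v) for v in values]
    return centers, labels


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 198, 205]
    print(kmeans_1d(pixels))
```

For a two-dimensional image, the pixel values would be flattened before clustering and the labels reshaped back into a mask afterwards.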

Figure 2. Five types of data segmentation results obtained by different classical machine learning methods.
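The OTSU criterion described above, choosing the threshold that maximizes the inter-class variance, can be sketched without external libraries as follows. This is an illustrative implementation, not the code used for the experiments: the function names are hypothetical, pixels are assumed to be integers in the range 0 to 255, and treating values above the threshold as foreground is an assumption (for bright-field H&E images, where tissue is typically darker than the background, the mask may need to be inverted).

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing the between-class variance.

    Pixels with value <= t form one class and the remaining pixels
    form the other. Ties are resolved in favor of the smallest t.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, threshold):
    """Label pixels above the threshold as 1 and all others as 0."""
    return [1 if v > threshold else 0 for v in pixels]


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 198, 205]
    t = otsu_threshold(pixels)
    print(t, binarize(pixels, t))
```

When the foreground and background histograms overlap strongly, the variance curve has no clear peak and the selected threshold becomes unreliable, which is the failure mode a single global threshold is prone to.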

The performance of different machine learning methods on EBHI-Seg is observed by comparing the images segmented using classical machine learning methods with the corresponding ground truth. The segmentation evaluation metrics are shown in Table 3. The Dice ratio is a similarity measure, usually used to compare the similarity of two samples. A value of one indicates the best result, while a value of zero indicates the worst. Table 3 shows that k-means achieves a good Dice ratio of up to 0.650 in each category. The MRF and Sobel segmentation results also achieve a good Dice ratio of around 0.6. In terms of precision and recall, k-means remains at approximately 0.650 in each category. Among the classical machine learning methods, k-means has the best segmentation results, followed by MRF and Sobel. OTSU performs moderately, while the watershed algorithm has coefficients that are much lower than those of the other methods. Moreover, there are apparent differences in the segmentation results among the above methods.

Table 3. Evaluation metrics for five different segmentation methods based on classical machine learning.

In summary, different classical machine learning segmentation methods produce significantly different results on EBHI-Seg, and the methods are clearly differentiated by the image segmentation evaluation metrics. Therefore, EBHI-Seg can effectively evaluate the segmentation performance of different segmentation methods.

4.3. Deep learning methods

Besides the classical machine learning methods tested above, some popular deep learning methods are also tested. (1) Seg-Net is an open source project for image segmentation (57). Its encoder is identical to the convolutional layers of VGG-16 with the fully connected layers removed, and the max-pooling indices are passed to the decoder, which improves boundary delineation. Seg-Net performs better on large datasets. (2) The U-Net network structure was first proposed in 2015 (58) for medical imaging. U-Net is lightweight, and its simultaneous detection of local and global information is helpful for both information extraction and diagnostic results from clinical medical images. (3) MedT is a network published in 2021 that uses a transformer structure and applies an attention mechanism to medical image segmentation (59). The segmentation results are shown in Figure 3.

Figure 3. Three types of data segmentation results obtained by different deep learning methods.

The segmentation performance is tested on the present dataset using three deep learning models. In the experiments, each model is trained with a training, validation, and test set ratio of 4:4:2. The number of images of each type used in the experiments is summarized in Table 4. The learning rate is set to 3e−6, the number of epochs to 100, and the batch size to 1. The optimizer is Adam, the loss function is the cross-entropy loss, and the activation function is ReLU. The segmentation results of the three models are shown in Figure 3, and the segmentation evaluation metrics are shown in Table 5. Overall, deep learning performs much better than the classical machine learning methods. The evaluation indexes of the U-Net and Seg-Net models reach 0.90 on average. The evaluation results of the MedT model are slightly worse, between 0.70 and 0.80. The training time is longer for MedT and similar for U-Net and Seg-Net.

Table 4. Number of training images of each type used in the deep learning experiments.
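The 4:4:2 split and the training settings described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the function name, the fixed random seed, the rounding rule (floor for training and validation, remainder for testing), and the file names are assumptions, so the resulting counts need not match Table 4 exactly. Only the values in `TRAINING_CONFIG` are taken from the text.

```python
import random

# Hyperparameters as reported in the text; the dictionary itself is
# only an illustrative container.
TRAINING_CONFIG = {
    "learning_rate": 3e-6,
    "epochs": 100,
    "batch_size": 1,
    "optimizer": "Adam",
    "loss": "cross-entropy",
    "activation": "ReLU",
}


def split_4_4_2(items, seed=0):
    """Shuffle items and split them into train/validation/test at 4:4:2.

    Training and validation sizes are floored to 40% each; the test
    set receives the remainder (an assumed rounding rule).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_part = (4 * len(shuffled)) // 10
    train = shuffled[:n_part]
    val = shuffled[n_part:2 * n_part]
    test = shuffled[2 * n_part:]
    return train, val, test


if __name__ == "__main__":
    # Hypothetical file names for the 76 images of the normal category.
    files = [f"normal_{i:03d}.png" for i in range(76)]
    train, val, test = split_4_4_2(files)
    print(len(train), len(val), len(test))
```

Splitting each category separately in this way keeps the class proportions similar across the three subsets, which matters for small categories such as normal and serrated adenoma.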

Table 5. Evaluation metrics for three different segmentation methods based on deep learning.

Based on the above results, EBHI-Seg achieved a clear differentiation using deep learning image segmentation methods. Image segmentation metrics for different deep learning methods are significantly different so that EBHI-Seg can evaluate their segmentation performance.

4.4. Experimental environment

This section presents the hardware configuration data required for this experiment as well as the software version.

Processor: Intel Core i7-8700 @ 3.20GHz Six Core

Graphics (GPU): NVIDIA GeForce RTX 2080

Graphics (CPU): Intel UHD Graphics 630

Hard Drive: SM961 NVMe SAMSUNG 512GB (Solid State Drive)

Motherboard: Dell 0NNNCT (C246 chipset)

System: Dell Precision 3630 Tower Desktop

Software Versions: CUDA 11.2, torch 1.7.0, torchvision 0.8.0, python 3.8.

5. Discussion

5.1. Discussion of image segmentation results using classical machine learning methods

Six types of tumor differentiation stage data in EBHI-Seg were analyzed using classical machine learning methods to obtain the results in Table 3. Based on the Dice ratio, k-means, MRF, and Sobel show no significant differences, with all three at around 0.55. In contrast, the Watershed metrics are ~0.45 on average, which is lower than those of the above three methods. The OTSU value is ~0.40 because the boundary between foreground and background is blurred in some experimental samples and OTSU had difficulty extracting a suitable segmentation threshold, which resulted in poorly differentiated results. The Precision and Recall values for k-means, MRF, and Sobel are also around 0.60, which is higher than those for the OTSU and Watershed methods by about 0.20. Among these three methods, k-means and MRF give visually better results than Sobel. Although Sobel matches these two methods in terms of metrics, it is difficult to distinguish foreground and background in its output images. The segmentation results for MRF are clear, but its running time is much longer than that of the other classical methods. Since classical machine learning methods have a rigorous theoretical foundation and simple ideas, they have been shown to perform well when used for specific problems. However, the performance of different methods varied in the present study.

5.2. Discussion of image segmentation results using deep learning methods

In general, deep learning models are considerably superior to classical machine learning methods, and even MedT, the lowest-performing deep learning model, is still more accurate than the best classical machine learning method. In EBHI-Seg, the Dice ratio of MedT reaches ~0.75. However, the MedT model is larger, and as a result its training time is long. U-Net and Seg-Net have higher evaluation indexes than MedT, both at about 0.88. Among them, Seg-Net has the shortest training time and the smallest model size. Because the normal category has fewer sample images than other categories, the evaluation metrics of the three deep learning methods in this category are significantly lower than those in other categories. The evaluation metrics of the three segmentation methods are significantly higher in the other categories, with Seg-Net averaging above 0.90 and MedT exceeding 0.80.

6. Conclusion and future work

The present study introduced a publicly available colorectal pathology image dataset containing 4,456 pathology images at 400× magnification covering six types of tumor differentiation stages. EBHI-Seg has high segmentation accuracy as well as good robustness. In the classical machine learning approach, segmentation experiments were performed using different methods, and the evaluation metrics were analyzed using the segmentation results. The highest and lowest Dice ratios are 0.65 and 0.30, respectively. The highest Precision and Recall values are 0.70 and 0.90, respectively, while the lowest values are 0.50 and 0.35, respectively. All three models performed well when using the deep learning method, with the highest Dice ratio reaching above 0.95 and both Precision and Recall values reaching above 0.90. The segmentation experiments using EBHI-Seg show that this dataset effectively supports the segmentation task with each of the segmentation methods. Furthermore, there are significant differences among the segmentation evaluation metrics. Therefore, EBHI-Seg is practical and effective in performing image segmentation tasks.

Data availability statement

The datasets presented in this study can be found in online repositories. EBHI-Seg is publicly available at: https://figshare.com/articles/dataset/EBHI-SEG/21540159/1.

Author contributions

LS: data preparation, experiment, result analysis, and paper writing. XL: data collection and medical knowledge. WH: data collection, data preparation, and paper writing. HC: data preparation and paper writing. JC, ZF, MGa, YJ, GL, DM, ZM, QM, and DT: data preparation. HS: medical knowledge. MGr and YT: result analysis. SQ: method. CL: data collection, method, experiment, result analysis, paper writing, and proofreading. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China (No. 82220108007) and the Beijing Xisike Clinical Oncology Research Foundation (No. Y-tongshu2021/1n-0379).

Acknowledgments

We thank Miss Zixian Li and Mr. Guoxian Li for their important discussions in this work.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. (2021) 71:209–49. doi: 10.3322/caac.21660

2. Lee YC, Lee YL, Chuang JP, Lee JC. Differences in survival between colon and rectal cancer from SEER data. PLoS ONE. (2013) 8:e78709. doi: 10.1371/journal.pone.0078709

3. Pamudurthy V, Lodhia N, Konda VJ. Advances in endoscopy for colorectal polyp detection and classification. In: Baylor University Medical Center Proceedings. Vol. 33. Taylor & Francis (2020). p. 28–35. doi: 10.1080/08998280.2019.1686327

4. Thijs J, Van Zwet A, Thijs W, Oey H, Karrenbeld A, Stellaard F, et al. Diagnostic tests for Helicobacter pylori: a prospective evaluation of their accuracy, without selecting a single test as the gold standard. Am J Gastroenterol. (1996) 91:10. doi: 10.1016/0016-5085(95)23623-6

5. Labianca R, Nordlinger B, Beretta G, Mosconi S, Mandalà M, Cervantes A, et al. Early colon cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. (2013) 24:vi64–vi72. doi: 10.1093/annonc/mdt354

6. Fischer AH, Jacobson KA, Rose J, Zeller R. Hematoxylin and eosin staining of tissue and cell sections. Cold Spring Harbor Protocols. (2008) 2008:pdb-prot4986. doi: 10.1101/pdb.prot4986

7. Chan JK. The wonderful colors of the hematoxylin-eosin stain in diagnostic surgical pathology. Int J Surg Pathol. (2014) 22:12–32. doi: 10.1177/1066896913517939

8. Gupta V, Vasudev M, Doegar A, Sambyal N. Breast cancer detection from histopathology images using modified residual neural networks. Biocybernetics Biomed Eng. (2021) 41:1272–87. doi: 10.1016/j.bbe.2021.08.011
