RC-Net: Regression Correction for End-To-End Chromosome Instance Segmentation

Liu, Hui; Wang, Guangjie; Song, Sifan; Huang, Daiyun; Zhang, Lin

doi:10.3389/fgene.2022.895099

BRIEF RESEARCH REPORT article

Front. Genet., 18 May 2022

Sec. Computational Genomics

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.895099

This article is part of the Research Topic: Application of Deep Learning Techniques in 3D Biomedical and Clinical Images

Hui Liu¹,²

Guangjie Wang¹

Sifan Song³

Daiyun Huang³

Lin Zhang¹,²*

¹School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
²Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China
³Department of Biological Sciences, AI University Research Center, Xi’an Jiaotong-Liverpool University, Suzhou, China

Precise segmentation of chromosomes in real microscope images is essential for karyotype analysis. Image segmentation is usually formulated as a pixel-level classification task in which different instances are treated as different classes. Many instance segmentation methods predict the Intersection over Union (IoU) through a head branch to correct the classification confidence, and their effectiveness relies on the correlation between branch tasks. However, none of these methods considers the correlation between the input and output within a branch task. Herein, we propose a chromosome instance segmentation network based on regression correction. First, we adopt two head branches to predict two confidences that are more closely related to localization accuracy and segmentation accuracy and use them to correct the classification confidence, which reduces the omission of predicted boxes in non-maximum suppression (NMS). Furthermore, an NMS algorithm is designed to screen target segmentation masks by the IoU of overlapping instances, which reduces the omission of predicted masks in NMS. Moreover, because the original IoU loss function is not sensitive to wrong segmentation, a K-IoU loss function is defined to strengthen the penalty for wrong segmentation, which makes the loss for mis-segmentation more reasonable and effectively discourages it. Finally, an ablation experiment evaluates the effectiveness of the proposed network and shows that the method effectively improves performance in automatic chromosome segmentation tasks and supports end-to-end karyotype analysis.

Introduction

Motivation

Chromosomes are essential carriers of genetic information, and their abnormalities may result in congenital genetic diseases (Schrock et al., 1997). Healthy human cells contain 46 chromosomes, comprising 22 pairs of autosomes and 1 pair of sex chromosomes (two X chromosomes for women and one X and one Y chromosome for men) (Tjio, 1956; T. Arora and Dhir, 2016). Chromosome karyotype analysis, as shown in Supplementary Figure S1, mainly consists of cell culture, imaging, and image segmentation followed by chromosome identification (Altinordu et al., 2016). Karyotype analysis has become a common and important method for prenatal diagnosis, genetic disease diagnosis, and screening (Garimberti and Tosi, 2010; Jahani et al., 2011; Abid and Hamami, 2018). Furthermore, the accuracy of chromosome image segmentation directly determines the accuracy of subsequent chromosome classification and abnormality identification, which makes segmentation the primary task of karyotype analysis (Wang et al., 2021). However, because chromosomes are flexible (Almagro et al., 2003), even chromosomes with the same number show different curved shapes in different images, and clusters form when chromosomes touch and overlap (Somasundaram, 2019). At present, overlapping chromosomes are mainly segmented manually by cytologists, which relies heavily on the operator’s experience and is time-consuming, labor-intensive, and error-prone. How to segment individual chromosomes automatically and accurately has therefore become a critical topic in karyotype analysis (Sharma et al., 2017).

Related Work

Traditional automatic chromosome segmentation methods are mainly based on geometric morphology (Somasundaram and Nirmala, 2010; Balaji, 2012; Sreejini et al., 2012; Balaji and Vidhya, 2015; Nair et al., 2015; Pravina, 2015; Vijayan et al., 2015; Li et al., 2016) and thresholding (Ji, 1994; M.F.S. Andrade et al., 2018; Ji, 1989). Overlapping chromosomes are segmented by extracting features such as pits, tangent points, and refined skeletons. Somasundaram et al. (2014) first used the multi-object geodesic contour method to separate individual chromosomes. For overlapping chromosomes, the curvature function was used to identify the cutting points on the image, the cutting points were then used to draw hypothetical lines on the overlapping areas, and finally the individual, non-overlapping chromosomes were segmented. Yilmaz et al. proposed a method that combines thresholding and watershed segmentation to separate chromosome clusters, calculates the tangent points of the clusters through the curvature function, and segments the overlapping chromosomes along the optimal geodesic path between the tangent points (Yang and Kruggel, 2008). Minaee et al. (2014) first extracted the outlines of overlapping chromosomes and then applied VAMD (Variations in the Angle of Motion Direction) and SDTP (Sum of Distances among Total Points) to extract the tangent points. The segmentation performance on completely overlapping chromosome clusters is poor. Methods of this type determine the intersection and concave points of the overlapping part by calculating the curvature and then perform segmentation. Therefore, misjudging or missing a valid intersection point seriously degrades segmentation performance.

Recently, more studies have applied deep learning to medical image processing tasks, which can effectively avoid the aforementioned issues. Similar to natural image segmentation, chromosome segmentation methods based on deep learning are mainly divided into semantic segmentation (Shelhamer et al., 2017) and instance segmentation (Fathi et al., 2017). For chromosome semantic segmentation, Hu et al. constructed a U-Net with two-layer pooling to segment overlapping chromosomes at lower computation and storage cost (Hu et al., 2017). The segmentation accuracy and Intersection over Union (IoU) score (McGuinness and O’Connor, 2010) for overlapping regions are 99.22 and 94.70, respectively; the segmentation accuracy is high, but the IoU score still needs to be improved. Saleh et al. argued that more pooling and convolution operations help the network extract more feature information from the input (Saleh et al., 2019). Thus, they built a U-Net (Ronneberger et al., 2015) with three-layer pooling to segment overlapping chromosomes, and the segmentation accuracy and IoU were slightly improved. However, these two methods apply only to scenarios where chromosomes overlap in pairs, and real chromosome overlap is much more complicated. Thus, they are insufficient for real chromosome data sets. For chromosome instance segmentation, Bai et al. first used U-Net to segment the foreground of the chromosome image, then used YOLO v3 (Joseph Redmon, 2018) to obtain the detection box of each chromosome, and finally used another U-Net to segment single chromosomes from the detection boxes (Bai et al., 2020). The YOLO v3 backbone network used in this method is weak at detecting small and overlapping targets, so it does not work well when chromosomes overlap severely. In addition, it splits the instance segmentation task across three networks, which makes the procedure cumbersome and inefficient.

It can be seen that the accuracy of the detection box is extremely important in chromosome instance segmentation. Generally, when detecting clustered targets, the classification confidence of a target box is often high while the actual detection result is poor, which lowers the AP score at high IoU thresholds. To address this issue, Jiang et al. constructed IoU Net, which predicts the IoU between the regression box and the ground truth box and uses it in place of the original classification confidence. This eliminates the screening error caused by misleading classification confidence and thus improves detection performance (Jiang et al., 2018). Wu et al. constructed the IoU-aware single-stage object detector, which also predicts the IoU between the regression box and the ground truth box and then uses it as a multiplicative factor to correct the classification confidence (Wu et al., 2020). The corrected confidence is better correlated with positioning accuracy, which effectively improves it. Chen et al. constructed the supervised edge attention network (SEA Net) (Chen et al., 2020), in which the IoU between the regression box and the ground truth box is predicted and multiplied with the classification confidence to improve the detection accuracy for clustered targets. Moreover, they designed an extra head branch that helps predict the mask edge to improve segmentation at high IoU thresholds. For instance segmentation tasks in which the classification confidence is high while the actual segmentation result is unsatisfactory, Huang et al. multiplied the IoU between the predicted mask and the ground truth mask with the classification confidence to construct Mask Scoring RCNN (MS RCNN) (Huang et al., 2019). It considers both the classification score and the quality score of the predicted mask, and the segmentation result is further improved compared with Mask RCNN. The methods mentioned above use the IoU of either the predicted box or the predicted mask with its ground truth to modify the classification confidence. However, they do not consider whether the prediction process is interpretable. If an interpretable method is adopted, the performance can be expected to improve.

Contribution

This study proposes a chromosome instance segmentation network based on regression correction to achieve precise segmentation in the Giemsa-banding chromosome images. The main contributions of this study are summarized as follows.

1) Considering that the classification confidence can be high while the actual detection and segmentation performance is poor, two confidences that are more relevant to positioning accuracy and segmentation accuracy, P_Box and IoU_Mask, are obtained without extra head branches to better correct the classification confidence. P_Box is the predicted probability based on the regression box, and IoU_Mask is the predicted IoU based on the mask.

2) Considering that traditional non-maximum suppression algorithms screen candidates by the overlap of prediction boxes, which may result in missing or wrong target boxes, a non-maximum suppression algorithm based on instance mask screening is proposed to improve the segmentation of instances.

3) Since the traditional IoU loss function is not sensitive to wrongly segmented areas, a K-IoU loss function is designed. It divides the area to be segmented into K parts and weights each part's contribution to the overall segmentation loss by the proportion of the area to be segmented in that part relative to the total area, which improves the sensitivity of the network to segmentation errors and makes the penalty reasonable.

Methods

Instance Segmentation Model Based on Regression Correction

Multitask supervised learning is known to make good use of shared information to obtain more accurate results for each task, and its effectiveness lies in the correlation between the tasks. However, the output of the regression branch is the offset of the regression box rather than its actual coordinates, and there is no direct correlation between the offset and the IoU score, so predicting the IoU score this way is hard to justify. In addition, using the IoU score to modify the classification confidence lowers the classification confidence and thus degrades the subsequent non-maximum suppression. Wu et al. and Chen et al. proposed regression branches to predict IoU scores under a multitask supervised learning framework, but the results showed low correlation with the real IoU scores. To address this issue, we propose a regression correction-based instance segmentation network for chromosome segmentation, as shown in Figure 1.

FIGURE 1. Structure of the regression correction network.

First, a regression confidence P_Box is introduced, as shown in Eq. 1. Taking the prediction result of the regression branch as input, P_Box is predicted through a fully connected layer with 1,024 output nodes. This makes the prediction process of P_Box more reasonable and gives P_Box a stronger correlation with positioning accuracy.

P_{Box} = 1 - T(L_{Reg}), (1)

where T(·) is the tanh function and L_Reg is the regression loss, which is calculated by the Smooth L1 loss function.
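As a rough illustration of Eq. 1, the following sketch computes P_Box from predicted and ground truth box offsets. The helper names, the averaging over the four offsets, and the Smooth L1 transition point `beta = 1.0` are assumptions made for this example; the text does not specify them.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss averaged over the box offsets (beta is an assumed default)."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

def p_box_target(pred_offsets, gt_offsets):
    """Regression confidence of Eq. 1: P_Box = 1 - tanh(L_Reg)."""
    return 1.0 - float(np.tanh(smooth_l1(pred_offsets, gt_offsets)))

# A perfect regression gives P_Box = 1; a large regression loss pushes it toward 0.
perfect = p_box_target([0.1, -0.2, 0.05, 0.0], [0.1, -0.2, 0.05, 0.0])
poor = p_box_target([2.0, -1.5, 1.0, 3.0], [0.0, 0.0, 0.0, 0.0])
print(perfect, poor)
```

Because the Smooth L1 loss is non-negative, tanh maps it into [0, 1), so P_Box stays in (0, 1] and decreases monotonically as the regression loss grows.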

Because the output of the Mask branch is directly correlated with IoU_Mask, the output of the Mask branch is used as input, and IoU_Mask is predicted by a fully connected layer with 1,024 output nodes instead of by multitask supervised learning. This makes the prediction process more interpretable, as shown in Figure 1.

Finally, the regression confidence P_Box and IoU_Mask, which is more relevant to the segmentation accuracy, are used to correct the classification confidence. Thus, the detection score and the segmentation score are considered simultaneously to achieve better instance segmentation performance.

Mask-Based Non-Maximum Suppression Algorithm

For overlapping target detection, the non-maximum suppression algorithm needs further improvement because it performs poorly under severe overlap (Neubeck and Van Gool, 2006). Bodla et al. therefore proposed the Soft-NMS algorithm, which lowers the confidence of an overlapping detection box by multiplying it by a weight instead of directly discarding the box (Bodla et al., 2017). The detection performance on overlapping targets is slightly improved, while the time complexity increases significantly. A box-based non-maximum suppression algorithm is beneficial to target detection tasks, but its effect on instance segmentation is limited. As shown in Supplementary Figure S2, the two boxes are the prediction boxes of two different chromosomes, and the IoU of the two boxes is 0.8, so the overlap is severe. Under conventional processing, the box with the higher classification confidence is retained and the box with the lower classification confidence is discarded, so one target box is missed in this case. However, the IoU of the two masks is only 0.2, much lower than the IoU of the detection boxes.
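The gap between box IoU and mask IoU can be reproduced with a toy example. The sketch below uses two thin diagonal "chromosomes" that cross at a single pixel; the shapes and helper names are invented for illustration and are not taken from Supplementary Figure S2.

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def bounding_box(mask):
    """Tight box (x1, y1, x2, y2) around a mask, with exclusive upper bounds."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

def box_iou(a, b):
    """IoU of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

n = 11
idx = np.arange(n)
chrom_a = np.zeros((n, n), dtype=bool)
chrom_a[idx, idx] = True            # main diagonal
chrom_b = np.zeros((n, n), dtype=bool)
chrom_b[idx, n - 1 - idx] = True    # anti-diagonal, crossing chrom_a at the centre

boxes_iou = box_iou(bounding_box(chrom_a), bounding_box(chrom_b))
masks_iou = mask_iou(chrom_a, chrom_b)
print(boxes_iou, masks_iou)
```

In this extreme case the two bounding boxes coincide, so any box-based NMS threshold would discard one of the objects, while the masks share only one pixel out of 21.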

Therefore, a mask-based non-maximum suppression algorithm is proposed here for overlapping chromosome segmentation. The algorithm retains as many prediction boxes as possible before they are fed into the mask branch and then calculates the IoU between each prediction mask and every other prediction mask. Finally, it traverses the predictions in descending order of classification confidence and removes every prediction mask whose IoU with the current prediction mask exceeds the threshold. Using the mask IoU as the screening criterion for overlapping targets effectively prevents missed and misjudged overlapping targets and thus improves segmentation performance.
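The procedure described above can be sketched as a greedy loop over predictions sorted by confidence. This is a minimal NumPy sketch, not the authors' implementation; the function name `mask_nms` and the threshold of 0.5 are assumptions, since the text does not state the threshold used.

```python
import numpy as np

def mask_nms(masks, scores, iou_threshold=0.5):
    """Greedy NMS on masks. masks: (N, H, W) booleans, scores: (N,). Returns kept indices."""
    masks = np.asarray(masks, dtype=bool)
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    inter = flat @ flat.T                              # pairwise intersection areas
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    iou = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    suppressed = np.zeros(len(masks), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(int(i))
        # Drop every mask that overlaps the current one by more than the threshold.
        suppressed |= iou[i] > iou_threshold
    return keep

n = 11
idx = np.arange(n)
first = np.zeros((n, n), dtype=bool)
first[idx, idx] = True                 # one chromosome
second = np.zeros((n, n), dtype=bool)
second[idx, n - 1 - idx] = True        # a crossing chromosome
duplicate = first.copy()
duplicate[0, 0] = False                # near-duplicate prediction of the first one

kept = mask_nms([first, second, duplicate], [0.9, 0.8, 0.7])
print(kept)
```

In this toy case the crossing chromosome is kept because its mask IoU with the first prediction is low, while the near-duplicate is suppressed.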

K-IoU Loss Function

There are multiple metrics for evaluating segmentation performance. Among them, IoU is the most widely used, and a higher IoU score indicates better segmentation performance. Thus, the IoU loss function (Yu et al., 2016) is often used for model parameter optimization, as shown in Eq. 2.

L_{IoU} = -\ln IoU_{Mask}, (2)

where IoU_Mask represents the IoU score between the predicted mask and its ground truth.

However, IoU represents only the overall segmentation quality of a prediction and cannot adequately represent the segmentation quality of key regions. In chromosome segmentation, where chromosomes exhibit variable shapes, fuzzy edges, and severe overlaps, the difficult-to-segment regions are the key regions that call for more attention. Good segmentation quality in these regions better supports karyotypists in diagnosis and thus provides more reliable information for physicians' choice of treatment. Thus, a more effective and reasonable loss function for incorrectly segmented regions, L_K-IoU, is proposed, as shown in Eq. 3. Minimizing the K-IoU loss function gives the network better segmentation performance on difficult-to-segment regions.

L_{K-IoU} = -\sum_{i=1}^{K} \delta_{i} \ln IoU_{i}, (3)

\delta_{i} = \frac{Mask_{i}}{Mask}, (4)

where K is the number of parts into which the ground truth mask is divided. As shown in Figure 2, when K is 4 and the shape is 2 × 2, the ground truth mask is divided equally by its two perpendicular center lines into four parts. As shown in Eq. 4, δ_i is the proportion of the ground truth in part i relative to the entire ground truth, and IoU_i is the IoU between the predicted mask and the ground truth in part i.

FIGURE 2. Comparison diagram of L_IoU and L_K-IoU. (A) Calculation of L_IoU. (B) Calculation of L_K-IoU.

As shown in Figure 2, the chromosome is divided into four parts, indicated as ①, ②, ③, and ④. The IoU scores and δ_i scores of the four parts are shown, with the striped area being the predicted mask. In Figure 2A, the IoU score of part ① is low, while the IoU scores of all other parts are 1. If the conventional IoU loss function is used, the high IoU scores of the other three parts dilute the negative impact of the incorrect segmentation in the first part and reduce the sensitivity of the network to it, giving a loss of only 0.084. In contrast, L_K-IoU (δ_i = 1) gives a loss of 0.51. Compared with the IoU loss function, better segmentation performance is obtained when the loss converges to the same value, and the sensitivity of the network to incorrect segmentation is dramatically improved.

However, the sensitivity of the network to incorrect segmentation should not be increased blindly. The lower the proportion of the ground truth mask that falls in a given part, the smaller the influence of that part on the whole. Comparing Figure 2A with Figure 2B, the segmentation result in (A) is significantly better than that in (B), but their L_K-IoU (δ_i = 1) values are the same. L_K-IoU therefore corrects the loss of each part through the weight δ_i, as shown in Figure 2B. It remains more sensitive to incorrect segmentation than the IoU loss and better highlights the contribution of critical areas to the loss.
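To make Eqs. 2 to 4 concrete, the sketch below evaluates the three variants discussed here (L_IoU, L_K-IoU with δ_i = 1, and the weighted L_K-IoU) on a toy mask. It assumes that the parts are obtained by splitting the mask array into an equal 2 × 2 grid and adds a small `eps` to avoid log(0). Both are choices made for this example, and the masks are not the ones in Figure 2.

```python
import numpy as np

def region_iou(pred, gt):
    """IoU of two boolean arrays; an empty union counts as a perfect match."""
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def iou_loss(pred, gt, eps=1e-7):
    """Eq. 2: L_IoU = -ln(IoU). eps avoids log(0) and is an added safeguard."""
    return -float(np.log(max(region_iou(pred, gt), eps)))

def k_iou_loss(pred, gt, grid=(2, 2), weighted=True, eps=1e-7):
    """Eq. 3 on a rows x cols grid; weighted=False corresponds to delta_i = 1."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    total = gt.sum()
    loss = 0.0
    row_blocks = zip(np.array_split(pred, grid[0], axis=0),
                     np.array_split(gt, grid[0], axis=0))
    for pred_rows, gt_rows in row_blocks:
        cells = zip(np.array_split(pred_rows, grid[1], axis=1),
                    np.array_split(gt_rows, grid[1], axis=1))
        for p, g in cells:
            # Eq. 4: share of the ground truth that falls in this part.
            delta = (g.sum() / total if total else 0.0) if weighted else 1.0
            loss -= delta * np.log(max(region_iou(p, g), eps))
    return float(loss)

gt = np.ones((10, 10), dtype=bool)   # toy ground truth filling the window
pred = gt.copy()
pred[0:2, 0:5] = False               # error confined to the top-left part (IoU_1 = 0.6)

loss_iou = iou_loss(pred, gt)
loss_k_unweighted = k_iou_loss(pred, gt, weighted=False)
loss_k_weighted = k_iou_loss(pred, gt, weighted=True)
print(round(loss_iou, 3), round(loss_k_unweighted, 3), round(loss_k_weighted, 3))
```

Here the overall IoU is 0.9, so L_IoU is about 0.105, whereas the unweighted K-IoU loss is -ln 0.6, about 0.511, and the weighted version scales that by δ_1 = 0.25 to about 0.128, which is still larger than L_IoU.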

We then define the multitask loss on each proposal as the sum of the losses from the Box head and the Seg head, as shown in Eq. 5.

L = L_{Box} + L_{Seg}, (5)

where L_Box is composed of three parts, which are defined in Eq. 6.

L_{Box} = L_{Cls} + L_{Reg} + L_{P_{Box}}, (6)

where L_Cls is calculated by the cross-entropy loss function, and $L_{P_{Box}}$ is calculated by the cross-entropy loss function based on P_Box obtained by Eq. 1.

L_Seg is also composed of three parts:

L_{Seg} = L_{Mask} + L_{K-IoU} + L_{IoU_{Mask}}, (7)

where L_Mask is the binarized cross-entropy loss function, and L_K-IoU, calculated by Eq. 3, is also the binarized cross-entropy loss.

Experimental Results

We next conducted five-fold cross-validation experiments on 985 real Giemsa-banding chromosome images of 1,600 × 1,200 pixels. A total of 60% of the data was allocated for training, while the remaining 40% was split equally into validation and test sets. The images were first scaled and padded to 512 × 512, and data augmentation was applied to better train the models.
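One plausible reading of the scale-and-pad step is an aspect-preserving resize followed by padding to a square. The sketch below assumes nearest-neighbour resampling, centred padding, and a pad value of 0; none of these details is specified in the text, and the function name is hypothetical.

```python
import numpy as np

def scale_and_pad(image, size=512, pad_value=0):
    """Resize so the longer side equals `size`, then pad to size x size (centred)."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resampling keeps the sketch free of extra dependencies.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]
    out = np.full((size, size) + image.shape[2:], pad_value, dtype=image.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out, scale, (top, left)

# A blank 1,600 x 1,200 image (height 1200, width 1600) becomes 512 x 384, then 512 x 512.
image = np.full((1200, 1600), 255, dtype=np.uint8)
padded, scale, offset = scale_and_pad(image)
print(padded.shape, scale, offset)
```

The returned scale and offset would also be needed to map ground truth boxes and masks into the padded frame.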

Mask RCNN (He et al., 2017), PANet (Liu et al., 2018), IoU Net, and MS RCNN with different backbone networks were compared on the same dataset. The hyperparameters of the model proposed in this study follow Mask RCNN. The initial learning rate is 1e-5, the momentum is 0.9, and the weight decay is 0.0001. Due to hardware limitations and the image size, the batch size is set to 1, and stochastic gradient descent (SGD) is used to train for 100 epochs.

Evaluation Metrics

For evaluation, AP^M (Lin et al., 2014) is adopted in this study. AP^M is the average precision averaged over mask IoU thresholds from 0.5 to 0.95 at intervals of 0.05. AP^M₅₀ refers to the AP^M score at a mask IoU threshold of 0.5, while AP^M₇₅ refers to the score at a mask IoU threshold of 0.75.
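Written out under the COCO convention of Lin et al. (2014), the metric averages ten per-threshold values. The symbol AP^M_t, denoting the mask average precision at a single IoU threshold t, is notation introduced here for illustration.

```latex
AP^{M} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} AP^{M}_{t},
\qquad AP^{M}_{50} = AP^{M}_{0.50},
\qquad AP^{M}_{75} = AP^{M}_{0.75}
```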

Main Result

As shown in Table 1, our proposed method achieves stable improvements on different models and backbone networks. With ResNet101 + FPN, the AP^M of Mask RCNN+RC reaches 83.35%, an increase of 3.76%. Since PANet follows the hyperparameters of Mask RCNN, the segmentation results of PANet are not as good as those of Mask RCNN, but with the ResNet101 + FPN backbone network its AP^M is still significantly improved, with an increase of 2.64%.

TABLE 1. Performance comparison of different network models.

Discussion

Performance of regression correction network: compared with the baseline Mask RCNN, the chromosome instance segmentation network based on regression correction significantly improves the accuracy of instance segmentation and raises the AP^M score by 3.76%, as shown in Table 2. The experimental results show that introducing the mask-based non-maximum suppression algorithm effectively improves instance segmentation performance. As shown in Supplementary Figure S3, the left image presents the segmentation result of the baseline Mask RCNN, the right one shows the result when the mask-based non-maximum suppression algorithm is added to the baseline model, and the two models share the same weights. The mask-based non-maximum suppression algorithm thus effectively prevents the omission of segmentation masks without any retraining.

TABLE 2. Ablation experiment results.

Meanwhile, the introduction of the K-IoU loss function improves the sensitivity to incorrect segmentation. It not only strengthens the penalty for incorrect segmentation but also takes into account the share of each segmentation error in the whole, which makes the penalty more reasonable. Therefore, AP^M is further improved. In this study, grid search is used to determine the value of K. As shown in Supplementary Table S1, the AP^M score is highest when K is 4 and decreases when K is increased further. Therefore, this study sets K to 4. A likely reason is that, when the grid is refined, the IoU between the prediction mask and the ground truth mask in some cells becomes 0, so the back-propagated gradient is 0 and training cannot proceed.

Both the method that directly uses the output of the Mask branch to predict IoU_Mask (the seventh row) and MS RCNN use the predicted IoU_Mask to correct the classification confidence, yet the segmentation performance of the former is better than that of MS RCNN. This indirectly verifies that the method in this study strengthens the correlation between the predicted IoU_Mask and the mask and is more helpful for correcting the classification confidence.

The IoU Net-based method, which uses IoU_Box instead of the classification confidence, is ineffective and even leads to a decrease in AP^M, because the correlation between the output of the regression branch and IoU_Box is not strong enough. Therefore, this article uses a more relevant head branch to predict the regression confidence P_Box to correct the classification confidence (the eighth row). Compared with IoU Net, AP^M improves more significantly, which means that the regression confidence P_Box corrects for the positioning accuracy of the prediction box more effectively than IoU_Box. Finally, this study considers both the positioning accuracy of the prediction box and the segmentation accuracy of the instance (the ninth row). AP^M is further improved to 83.11%, an increase of 2.73%.

The design of confidence weight: this study considers the positioning accuracy P_Box of the prediction box and the segmentation accuracy IoU_Mask of the instance at the same time to improve the segmentation performance of the network. However, multiplying the two directly with the classification confidence may not be the best choice. Therefore, P_Box and IoU_Mask are raised to different powers, and the resulting AP^M scores are shown in Supplementary Table S2. For example, when IoU_Mask is marked “√²” (squared) and P_Box is marked “√” (first power), the calculation is as shown in Eq. 8.

P_{Cls} = P_{Cls} \cdot IoU_{Mask}^{2} \cdot P_{Box}, (8)

We can see that when P_Box is raised to the sixth power, AP^M reaches the highest score of 83.35%. Moreover, the improvement is more significant than that brought by exponentiating IoU_Mask. AP^M is therefore more sensitive to P_Box, which further verifies the effectiveness of P_Box.
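The exponent search generalizes Eq. 8 to arbitrary powers. The sketch below uses a hypothetical helper whose defaults reproduce Eq. 8. The input confidences are made-up numbers, and pairing a squared IoU_Mask with the sixth power of P_Box is an assumption for illustration, since the text does not state which IoU_Mask exponent accompanies the best P_Box exponent.

```python
def corrected_confidence(p_cls, iou_mask, p_box, mask_power=2.0, box_power=1.0):
    """Classification confidence corrected by IoU_Mask and P_Box (defaults give Eq. 8)."""
    return p_cls * (iou_mask ** mask_power) * (p_box ** box_power)

# Eq. 8 with illustrative inputs: 0.9 * 0.8^2 * 0.9.
eq8_score = corrected_confidence(0.9, 0.8, 0.9)

# A higher power on P_Box separates well and poorly localized boxes more strongly.
well_localized = corrected_confidence(0.9, 0.8, 0.95, box_power=6.0)
poorly_localized = corrected_confidence(0.9, 0.8, 0.60, box_power=6.0)
print(eq8_score, well_localized, poorly_localized)
```

Since both factors lie in [0, 1], a larger exponent leaves near-perfect predictions almost unchanged while sharply lowering the score of weaker ones, which changes their ranking in non-maximum suppression.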

Conclusion

This article focuses on improving the segmentation accuracy of chromosome instances in real chromosome datasets, especially for heavily overlapping chromosomes. We use the outputs of the regression branch and the mask branch, respectively, to predict two confidences, P_Box and IoU_Mask, which are more relevant to positioning accuracy and segmentation accuracy and achieve a better correction of the classification confidence. A mask-based non-maximum suppression algorithm is proposed that uses the overlap of instances as the basis for judgment and effectively prevents missing and incorrect segmentation of chromosomes. Moreover, a K-IoU loss function is proposed, which improves the network’s sensitivity to incorrect segmentation while fully considering the impact of the incorrect segmentation on the whole, so that the penalty is reasonable. The experimental results show that the method in this study greatly improves the accuracy of instance segmentation over the baseline Mask RCNN and also works well on PANet. Since the implementation of P_Box and IoU_Mask does not require additional head branches and the structure is relatively simple, the method is expected to extend to other instance segmentation models.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author Contributions

HL and GW built the architecture for RC-Net, designed and implemented the experiments, analyzed the results, and wrote the manuscript. GW conducted the experiments, analyzed the results, and revised the manuscript. LZ and HL supervised the project, analyzed the results, and revised the manuscript. SS and DH managed the data. All authors read, critically revised, and approved the final manuscript.

Funding

This work was supported by the Fundamental Research Funds for Central Universities (Research Projects No. 2019ZDPY15 to LZ).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2022.895099/full#supplementary-material

References

Abid, F., and Hamami, L. (2018). A Survey of Neural Network Based Automated Systems for Human Chromosome Classification. Artif. Intell. Rev. 49, 41–56. doi:10.1007/s10462-016-9515-5