ORIGINAL RESEARCH article

Front. Signal Process., 10 January 2022
Sec. Image Processing
This article is part of the Research Topic Horizons in Signal Processing.

Rethinking Pooling Operation for Liver and Liver-Tumor Segmentations

Junchao Lei1,2, Tao Lei1,2*, Weiqiang Zhao3, Mingyuan Xue1,2, Xiaogang Du1,2 and Asoke K. Nandi4,5*
  • 1Shaanxi Joint Laboratory of Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, China
  • 2The School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an, China
  • 3Unmanned Intelligent Control Division, China Electronics Technology Group Corporation Northwest Group Corporation, Xi’an, China
  • 4Electronic and Electrical Engineering, Brunel University London, London, United Kingdom
  • 5School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China

Deep convolutional neural networks (DCNNs) have been widely used in medical image segmentation due to their excellent feature learning ability. In these DCNNs, the pooling operation is usually used for image down-sampling, which gradually reduces the image resolution and thus expands the receptive field of the convolution kernel. Although the pooling operation has this advantage, it inevitably causes information loss during down-sampling. This paper proposes an effective weighted pooling operation to address the problem of information loss. First, we set up a pooling window with learnable parameters that are updated during training. Second, we use weighted pooling to improve the full-scale skip connection and enhance multi-scale feature fusion. We evaluated weighted pooling on two public benchmark datasets, LiTS2017 and CHAOS. The experimental results show that the proposed weighted pooling operation effectively improves network performance and the accuracy of liver and liver-tumor segmentation.

Introduction

Accurate segmentation of the liver and liver tumors can assist doctors in diagnosis and treatment planning. Therefore, liver and liver-tumor segmentation has long been one of the research hotspots in medical image analysis. However, because the liver has a density similar to that of nearby organs, it is difficult for non-specialists to delineate the liver boundary accurately in an abdominal CT image (Li et al., 2015). Manual labeling of liver regions is not only time-consuming, labor-intensive, tedious, and inefficient, but also requires a high level of professional expertise from the labelers. Therefore, automatic or semi-automatic liver segmentation algorithms have become a research goal in the field of medical image analysis (Furukawa et al., 2017).

Before the advent of deep learning (LeCun et al., 2015), three types of image segmentation algorithms were commonly used for liver segmentation: algorithms based on gray values (Adams and Bischof, 1994; Chenyang Xu and Prince, 1998; Lei et al., 2018), algorithms based on statistical shape models (Heimann et al., 2006; Zhang et al., 2010; Tomoshige et al., 2014), and algorithms based on texture features (Gambino et al., 2010; Ji et al., 2013). These traditional algorithms rely only on low-level image features such as edges, shape, and texture, but do not exploit image semantic information with strong representation ability. Thus, they provide low segmentation accuracy and show poor generalization.

In recent years, with the rapid development of deep learning (Hinton and Salakhutdinov, 2006; Tu et al., 2017; Tu et al., 2018; Yang et al., 2021) in computer vision, especially after the emergence of fully convolutional networks (Shelhamer et al., 2017), researchers have begun to use deep learning methods for image segmentation. The emergence of the U-Net (Ronneberger et al., 2015) model has greatly promoted the development of medical image segmentation (Lei et al., 2020a). Since then, this end-to-end segmentation network (Nie et al., 2016) has become the benchmark for medical image segmentation. U-Net is a completely symmetrical encoder-decoder structure: the encoder gradually extracts deep semantic information from images, and the decoder gradually restores feature maps into a segmentation map. The skip connections enable the network to fuse all levels of encoder features during decoding, which yields more refined segmentation results. Due to the great success of U-Net, various improved U-Nets have been proposed (Guo et al., 2019; De Sio et al., 2021). These improved networks can be roughly grouped into two categories. The first category employs a new backbone instead of the plain convolutions in the original encoder-decoder, such as VGG (Simonyan and Zisserman, 2014), ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and GhostNet (Han et al., 2020). The second category adds new function modules to U-Net to enhance performance, such as attention U-Net (Oktay et al., 2018), CE-Net (Gu et al., 2019), QAU-Net (Hong et al., 2021), and RA-UNet (Jin et al., 2018). In addition, R2U-Net (Alom et al., 2018) adopts recurrent convolution, which extracts information from the same feature map multiple times and makes full use of the potentially useful information in it. UNet++ (Zhou et al., 2020) explores the impact of network depth on performance and adopts a new skip connection to gather features of different semantic scales. UNet3+ (Huang et al., 2020) further proposes a full-scale skip connection to fuse low-level information and high-level semantics from feature maps of different sizes. LV-Net (Lei et al., 2020b) uses a lightweight network to segment the liver. Furthermore, improved networks such as DefED-Net (Lei et al., 2021), CE-Net (Gu et al., 2019), and MSB-Net (Shao et al., 2019) use multi-scale feature fusion to enhance the feature representation of the network.

These networks perform pooling operations multiple times in the encoder to achieve down-sampling. The purpose is to gradually pass down the feature information of images; in this process, spatial and channel feature information is continuously integrated, and deep semantic information is finally extracted. Due to the nature of pooling, both average pooling (Wang et al., 2021) and maximum pooling (Nagi et al., 2011; Giusti et al., 2013; Graham, 2014; Bulo et al., 2017) inevitably lead to the loss of some image feature information. Skip connection is a common operation in medical image segmentation networks. To realize various skip connections, researchers usually use pooling to resize feature maps to the same size, which also causes the loss of feature information. This is especially so when the feature map size must be changed over a large range to realize a skip connection (Huang et al., 2020); in that case, the information loss caused by the pooling operation is even greater.

In order to solve the problem of image information loss caused by the pooling operation, this paper proposes a weighted pooling (Golan et al., 2012; Zhu et al., 2019) operation, which allows the down-sampling of an image to be learned with parameters during training. Using weighted pooling, we can change the size of the feature map while reducing information loss, which makes more types of skip connection feasible, especially when the feature map size needs to be changed over a large range. In summary, we have made the following contributions:

1) We propose a weighted pooling operation that can reduce the loss of image feature information while reducing the image resolution.

2) We demonstrate that the weighted pooling operation is helpful for the realization and improvement of various skip connections.

3) Based on the weighted pooling operation and the full-scale skip connection, we design a novel U-shaped network for liver and liver-tumor segmentation, and the proposed network achieves better performance than U-Net.

The rest of the paper is organized as follows. In the Methods section, we introduce the principle and implementation of weighted pooling in detail, explain how weighted pooling improves skip connections, and finally introduce the proposed network. In the Experiments section, we demonstrate the effectiveness of weighted pooling experimentally. Finally, in the Conclusion section, we present conclusions and outline future work.

Methods

In this section, we first introduce the weighted pooling operation and then the improved skip connection based on it. We then design a new U-shaped network for liver and liver-tumor segmentation.

Weighted Pooling Operation

Pooling operations are very common in convolutional neural networks (Seo et al., 2020); their purpose is to down-sample an image so that convolution kernels gradually obtain a larger receptive field and fuse more image context information. The traditional pooling operation uses maximum pooling or average pooling, that is, taking the maximum or the average of the pixel values in a window and traversing the entire feature map with a chosen step size to achieve down-sampling. Taking maximum pooling as an example, the pooling operation can be expressed as $p = \max_{i \in l} p_i$, where $l$ is the set of positions in the window, $p_i$ are the pixel values in the window, and $p$ is the selected output value. Although maximum pooling and average pooling extract the dominant and the average information respectively, such simple operations inevitably lead to the loss of feature information, especially for small objects and detailed information; e.g., the information of a relevant object may be completely lost after multiple pooling operations.
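
As a concrete illustration of this loss, the short PyTorch snippet below (our own example, not code from the paper) applies 2 × 2 maximum pooling with a step size of 2 to a tiny feature map: only one of the four values in each window survives.

```python
import torch
import torch.nn as nn

# Toy 1x1x2x2 feature map (illustrative values): one strong response, three weaker details.
x = torch.tensor([[[[0.9, 0.1],
                    [0.2, 0.3]]]])

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))  # tensor([[[[0.9000]]]]) -- the values 0.1, 0.2, and 0.3 are discarded
```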

The weighted pooling operation gives each pixel position in the window a learnable parameter and then lets the window traverse the entire feature map, which achieves down-sampling while reducing the loss of image information. Specifically, similar to an ordinary pooling operation, we set a parameterized matrix window for the feature map of each channel. The size of the window can be chosen according to the task or the desired degree of down-sampling. After determining the step size, the window slides over the feature map of each channel to traverse the entire feature map. For example, when we perform the pooling operation using a 2 × 2 window with a step size of 2, the pooling process is as shown in Figure 1.

FIGURE 1. p1, p2, p3, and p4 are the pixel values in the feature map, ω1, ω2, ω3, and ω4 are the parameters to be learned in the setting window, and p is the finally obtained pixel value.

Figure 1 illustrates the basic principle of weighted pooling; the final pixel value can be expressed as:

$p = \sum_{i \in l} p_i \omega_i$    (1)

where $l$ is the set of positions in the pooling window, $p_i$ is the pixel value of the feature map at position $i$ in the window, and $\omega_i$ is the corresponding parameter to be learned.

According to Eq. 1, we could design a dedicated parameter-matrix window, but existing modules already allow us to implement weighted pooling very conveniently. We can perform a convolution on the feature map of each channel separately, because convolution is exactly the process of sliding a parameterized matrix window over the feature map, which is the same idea as weighted pooling. The window size used for pooling is generally even, which is more convenient for down-sampling, because odd-sized windows place stricter requirements on the resolution of the feature map; therefore, weighted pooling also uses even-sized convolution kernels, with different step sizes chosen to meet different down-sampling requirements. In different tasks, we can freely choose even-sized convolution kernels of different sizes to perform the weighted pooling operation.
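
Following this description, weighted pooling can be sketched as a depthwise (per-channel) convolution whose kernel size equals its stride. The class below is a minimal PyTorch sketch of this idea; its name, default window size, and the omission of a bias term are our own choices rather than the authors' code.

```python
import torch
import torch.nn as nn

class WeightedPool2d(nn.Module):
    """Down-sampling with a learnable weight for each position in the pooling window.

    Sketch only: a depthwise convolution with kernel_size == stride plays the role
    of the parameterised window, so p = sum_i (w_i * p_i) is learned during training.
    """
    def __init__(self, channels: int, window: int = 2):
        super().__init__()
        # groups=channels -> each channel is pooled with its own window parameters
        self.pool = nn.Conv2d(channels, channels, kernel_size=window,
                              stride=window, groups=channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(x)

# Usage: halve the resolution of a 64-channel feature map with a 2 x 2 window.
x = torch.randn(1, 64, 128, 128)
print(WeightedPool2d(64, window=2)(x).shape)  # torch.Size([1, 64, 64, 64])
```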

The Full-Scale Skip Connection

A key factor in the success of U-Net is the use of skip connections (Milletari et al., 2016; Jin et al., 2017; Guo et al., 2021), which enable the network to fuse low-level and high-level feature information during decoding and finally obtain a more accurate segmentation result. It is worth noting that skip connections have a prerequisite: the feature maps to be fused must have the same size. Because of this size constraint, skip connections cannot be placed arbitrarily.

For a vanilla skip connection, to obtain feature maps of the same size, researchers often use the pooling operation to resize the feature map. According to the previous analysis, using pooling to change the size of the feature map inevitably leads to the loss of image information, especially when the size must be changed over a large range, as in the full-scale skip connection of UNet3+. The proposed weighted pooling operation solves this problem well and makes more effective skip connections possible, which is another important contribution of this work. The improvement that weighted pooling brings to skip connections is shown in Figure 2. This paper mainly uses weighted pooling to improve the full-scale skip connection.

FIGURE 2. The use of weighted pooling reduces the loss of information while changing the size of the feature map.

It can be seen from Figure 2 that weighted pooling can change the feature map to the target size while reducing information loss, which facilitates subsequent skip connections and provides more possibilities for the design of the network model.

The Proposed Network

This paper uses the weighted pooling module and the improved full-scale skip connection to design a network for liver and liver-tumor segmentation. The network framework is shown in Figure 3.

FIGURE 3. Liver and liver-tumor segmentation network designed using weighted pooling and improved full-scale skip connection.

As Figure 3 shows, the network is an enhanced version of U-Net. The entire network can still be divided into two parts, the encoder and the decoder. The convolution modules of the encoder and the decoder use depthwise separable convolution (Chollet, 2017), which greatly reduces the number of network parameters while maintaining segmentation performance. In the encoder, all down-sampling operations use the weighted pooling module in place of the original maximum pooling, which reduces the loss of image information. For the first and second down-sampling steps, we use a 2 × 2 window for weighted pooling; such small windows are well suited to preserving image edge and detail information. For the third and fourth down-sampling steps, we use 4 × 4 and 8 × 8 windows respectively, so that the deeper layers of the encoder can fuse information from larger areas of the feature map, which benefits the decoding process and finally yields more accurate segmentation results. For the feature map containing deep semantic information obtained at the end of the encoder, we use the SE module (Jie et al., 2017) to perform feature fusion. In the decoder, for up-sampling we use the combination of bilinear interpolation (Accadia et al., 2003; Kirkland, 2010) and 1 × 1 convolution (Szegedy et al., 2015) to replace the original deconvolution operation (Noh et al., 2016), which reduces the number of parameters and avoids checkerboard artifacts in the feature maps.
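
The two building blocks below sketch how the depthwise separable convolution and the bilinear-interpolation-plus-1 × 1-convolution up-sampling described above might look in PyTorch. The module names and channel arguments are our own illustrative choices, not the authors' implementation.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3 x 3 depthwise convolution followed by a 1 x 1 pointwise convolution (sketch)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class BilinearUp(nn.Module):
    """Bilinear interpolation + 1 x 1 convolution, used instead of deconvolution (sketch)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.proj(self.up(x))
```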

The full-scale skip connection can combine low-level appearance information and high-level semantic information from feature maps of different sizes to better clarify the location and boundary of the liver. As shown in Figure 3, each convolution module of the decoder combines the feature maps of all layers in the encoder. Compared with the original skip connection of U-Net, the full-scale skip connection integrates information from the network as a whole, and these features at different scales capture both fine-grained details and coarse-grained semantics. As shown in Figure 3, every time the encoder down-samples, the resolution of the feature map is halved. To realize a full-scale skip connection, we must ensure that the feature map sizes are consistent, which requires pooling to bring them to the same resolution. However, as Figure 3 shows, the largest size difference between two feature maps is a factor of 8, which means that pooling must shrink a feature map to 1/8 of its original size; the resulting information loss would be large. When we use weighted pooling to change the size of the feature map, we can reduce this loss, because the resizing is learned with parameters. In this paper, we use a window size of 2 × 2 with a step size of 2 to reduce the resolution of the feature map to 1/2, a window size of 4 × 4 with a step size of 4 to reduce it to 1/4, and a window size of 8 × 8 with a step size of 8 to reduce it to 1/8.
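
A minimal sketch of this resizing step, reusing the WeightedPool2d class from the earlier sketch: encoder feature maps at four resolutions are brought to the size of the deepest stage with 8 × 8, 4 × 4, and 2 × 2 weighted pooling and then concatenated. The channel count and tensor names are assumptions made for illustration only.

```python
import torch

# WeightedPool2d is the sketch class defined earlier in this section.
e1 = torch.randn(1, 64, 256, 256)  # shallowest encoder feature map (channel count assumed)
e2 = torch.randn(1, 64, 128, 128)
e3 = torch.randn(1, 64, 64, 64)
e4 = torch.randn(1, 64, 32, 32)    # deepest encoder feature map

fused = torch.cat([
    WeightedPool2d(64, window=8)(e1),  # 256 -> 32, i.e. 1/8 of the original resolution
    WeightedPool2d(64, window=4)(e2),  # 128 -> 32, i.e. 1/4
    WeightedPool2d(64, window=2)(e3),  #  64 -> 32, i.e. 1/2
    e4,
], dim=1)
print(fused.shape)                     # torch.Size([1, 256, 32, 32])
```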

Experiments

Dataset and Pre-Processing

In order to evaluate the effects of the weighted pooling module on improving the performance of the liver and liver tumor segmentation network, we used the LiTS2017 (Liver Tumor Segmentation Challenge) dataset and CHAOS (Combined Healthy Abdominal Organ Segmentation) dataset as experimental data.

The LiTS2017 dataset contains 131 labeled abdominal 3D CT scans, in which the in-plane resolution ranges from 0.55 to 1 mm, the slice pitch ranges from 0.45 to 6 mm, and each image is 512 × 512. The CHAOS dataset is a small dataset containing 20 3D scans with an image size of 512 × 512. All experiments use the axial 2D slices of the LiTS2017 and CHAOS datasets. For LiTS2017, we constructed the training and validation sets from 90 and 10 patients respectively, and the other 30 patients were used as the test set. For CHAOS, the data were split into 16 patients for training and 4 patients for testing.

Medical CT axial slices differ from ordinary images: their values (in Hounsfield units) range from −1,000 to 3,000, whereas ordinary image intensities range only from 0 to 255. In order to eliminate interference and enhance the liver area, we clipped the intensities to the [−200, 250] HU range and performed normalization.
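
A minimal sketch of this pre-processing step, assuming each slice arrives as a NumPy array of Hounsfield units; the function name and the rescaling to [0, 1] are our own choices.

```python
import numpy as np

def window_and_normalize(hu_slice: np.ndarray, lo: float = -200.0, hi: float = 250.0) -> np.ndarray:
    """Clip a CT slice to the [lo, hi] HU window and rescale it to [0, 1] (sketch)."""
    clipped = np.clip(hu_slice, lo, hi)
    return (clipped - lo) / (hi - lo)

# Example: values outside the window are saturated before normalization.
slice_hu = np.array([[-1000.0, 40.0], [300.0, 250.0]])
print(window_and_normalize(slice_hu))
```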

Experimental Setup and Evaluation Metrics

All the algorithms in this experiment were run on a server with an NVIDIA GeForce RTX 3090 Ti GPU. The networks were implemented and trained with PyTorch 1.7.0. During training, the learning rate was set to 0.001 without a dynamic learning-rate schedule, and the number of training epochs was set to 100.
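
The snippet below restates these settings as a PyTorch sketch; the optimizer and loss function are not specified in the paper, so the choices below are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)          # stand-in for the segmentation network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice is an assumption
criterion = nn.BCEWithLogitsLoss()                         # loss choice is an assumption
num_epochs = 100                                           # fixed learning rate, no scheduler
```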

The most commonly used and effective metric for medical image segmentation is the Dice score, which ranges from 0 to 1; the larger the value, the higher the segmentation accuracy, and a perfect segmentation has a Dice score of 1. These experiments use the average Dice score over all slices in the test dataset as the evaluation metric.
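
The helper below, written by us for illustration, shows how the Dice score described above can be computed for a pair of binary masks.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards against empty masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# A prediction identical to the ground truth gives a Dice score of 1 (up to eps).
mask = np.array([[0, 1], [1, 1]])
print(dice_score(mask, mask))
```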

Ablation Study

This paper mainly studies the effect of the weighted pooling operation on liver and liver-tumor segmentation and highlights two contributions: first, the weighted pooling operation can replace the traditional pooling module to reduce the loss of image feature information; second, the weighted pooling operation can improve various skip connections, and this paper improves the full-scale skip connection. To verify these contributions, we conducted two sets of ablation experiments on the LiTS2017 and CHAOS datasets.

The Effectiveness of the Weighted Pooling Operation

We first trained U-Net, CE-Net, and U-Net++ on the liver and liver-tumor training sets and measured their segmentation accuracy on the test sets. We then replaced the maximum pooling in U-Net, CE-Net, and U-Net++ with the weighted pooling operation, retrained the modified networks on the same training sets, and measured the segmentation accuracy of the liver and liver-tumor on the test sets.
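
The sketch below illustrates the kind of substitution performed in this ablation: the same encoder stage built once with maximum pooling and once with the WeightedPool2d module from the earlier sketch in its place. The block layout and channel sizes are our own simplification, not the exact architectures of U-Net, CE-Net, or U-Net++.

```python
import torch.nn as nn

# WeightedPool2d is the sketch class defined earlier in this paper's Methods section.
def encoder_block(in_ch: int, out_ch: int, weighted: bool) -> nn.Sequential:
    """A toy encoder stage: convolution + ReLU followed by 2x down-sampling (sketch)."""
    down = WeightedPool2d(out_ch, window=2) if weighted else nn.MaxPool2d(kernel_size=2)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        down,
    )

baseline = encoder_block(1, 64, weighted=False)  # maximum-pooling variant
variant = encoder_block(1, 64, weighted=True)    # weighted-pooling variant
```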

In Table 1, when we use the weighted pooling operation to replace the maximum pooling module, the segmentation accuracies of the liver and liver-tumor are improved, which verifies the first contribution of the paper. Especially for the CHAOS dataset, which contains only a small amount of data, reducing the information loss in the pooling process improves network performance more markedly. In Figure 4, the difference in the segmentation results can be seen more clearly.

TABLE 1. The improvement effect of weighted pooling on different networks.

FIGURE 4. Segmentation results of maximum pooling and weighted pooling. The experimental network is U-Net.

We have marked the main differences in the segmentation results with red dashed lines. The segmentation results show that weighted pooling performs better: the first set of liver segmentation results shows that weighted pooling reduces the loss of information, the second set shows that weighted pooling has better feature learning capability, and the two groups of liver-tumor results show that weighted pooling is more conducive to segmenting small target objects and learns detailed information better.

The Improvement Effect of Full-Scale Skip Connection

The experimental procedure is similar to the first set of experiments. We trained U-Net with the general full-scale skip connection and U-Net with the improved full-scale skip connection, and then obtained their accuracies on the liver and liver-tumor test sets.

In Table 2, when we use the improved full-scale skip connection, the segmentation accuracies of the liver and liver-tumor are improved, which demonstrates the benefit that weighted pooling brings to the full-scale skip connection. In Figure 5, we can also see the difference in segmentation results; we have marked the main differences with red dashed lines. The results show that weighted pooling enhances the information transmission of the skip connection and reduces the information loss within it.

TABLE 2. The improvement effect of weighted pooling on skip connection.

FIGURE 5. Segmentation results of general skip connection and improved skip connection. The experimental network is U-Net.

Comparative Experiment

In order to test the performance of the proposed network, we conducted a set of comparative experiments on the LiTS2017 and CHAOS datasets. We trained several popular medical image segmentation networks, including U-Net, U-Net++, CE-Net, and H-DenseUnet (Li et al., 2017), and then obtained their liver and liver-tumor segmentation accuracies on the test sets. In Table 3, for the LiTS2017 dataset, the segmentation accuracy of the proposed network is higher than that of these popular networks, except that H-DenseUnet achieves slightly higher liver segmentation accuracy. Although its liver segmentation accuracy is slightly higher than ours, H-DenseUnet has more parameters than our network. Fewer parameters mean that our network occupies less memory, requires less training time, and is easier to migrate to other devices, so our method achieves a good balance between computational complexity and performance. For the CHAOS dataset, our network achieved the best performance, which shows that it has better learning ability when less data is available.

TABLE 3. The accuracy of different network models on the LiTS2017 dataset and CHAOS dataset.

In Figure 6, we can see the liver segmentation results of different methods. Our method performs well on detailed and edge information, which shows that it reduces information loss and better combines global feature information. The red dashed lines mark some typical differences between our method and H-DenseUnet.

FIGURE 6. Liver segmentation results of different approaches.

Conclusion

In this work, we have rethought the pooling operation in DCNNs for liver and liver-tumor segmentation. We found that the vanilla pooling operation suffers from information loss that degrades segmentation performance, and we have proposed a weighted pooling operation to overcome this problem. The weighted pooling operation allows an image to be down-sampled with learnable parameters, which reduces the image resolution while better preserving the image information. Weighted pooling also makes it easy to bring feature maps to a consistent size, which is helpful for realizing and improving various skip connections. The experiments demonstrate the effectiveness of the weighted pooling operation for liver and liver-tumor segmentation. In the future, we will explore the applicability of weighted pooling to the segmentation of other types of images.

Data Availability Statement

Publicly available datasets have been analyzed in this study. This data can be found here: https://academictorrents.com/details/27772adef6f563a1ecc0ae19a528b956e6c803ce.

Author Contributions

JL and TL proposed the innovative ideas of the paper. WZ and XD designed and completed some experiments. JL and MX wrote the paper together. TL and AN made important revisions to the paper.

Funding

This work was supported in part by Natural Science Basic Research Program of Shaanxi (Program No. 2021JC-47), in part by the National Natural Science Foundation of China under Grant 61871259, Grant 61861024, in part by Key Research and Development Program of Shaanxi (Program No. 2021ZDLGY08-07), in part by Serving Local Special Program of Education Department of Shaanxi Province (21JC002), and in part by Xi’an Science and Technology program (21XJZZ0006).

Conflict of Interest

Author WZ was employed by the company China Electronics Technology Group Corporation Northwest Group Corporation.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

Lei would like to thank Brunel University London for a visiting position in 2020 to collaborate with Nandi.

References

Accadia, C., Mariani, S., Casaioli, M., Lavagnini, A., and Speranza, A. (2003). Sensitivity of Precipitation Forecast Skill Scores to Bilinear Interpolation and a Simple Nearest-Neighbor Average Method on High-Resolution Verification Grids. Wea. Forecast. 18 (5), 918–932. doi:10.1175/1520-0434(2003)018<0918:sopfss>2.0.co;2

Adams, R., and Bischof, L. (1994). Seeded Region Growing. IEEE Trans. Pattern Anal. Machine Intell. 16 (6), 641–647. doi:10.1109/34.295913

Alom, M. Z., Hasan, M., Yakopcic, C., Taha, T. M., and Asari, V. K. (2018). Recurrent Residual Convolutional Neural Network Based on U-Net (R2U-Net) for Medical Image Segmentation. arXiv:1802.06955.

Bulo, S. R., Neuhold, G., and Kontschieder, P. (2017). “Loss Max-Pooling for Semantic Image Segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21-26, 2017, 7082–7091. doi:10.1109/cvpr.2017.749

Chenyang Xu, C., and Prince, J. L. (1998). Snakes, Shapes, and Gradient Vector Flow. IEEE Trans. Image Process. 7 (3), 359–369. doi:10.1109/83.661186

Chollet, F. (2017). “Xception: Deep Learning with Depthwise Separable Convolutions,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21-26, 2017 (IEEE).

De Sio, C., Velthuis, J. J., Beck, L., Pritchard, J. L., and Hugtenburg, R. P. (2021). r-UNet: Leaf Position Reconstruction in Upstream Radiotherapy Verification. IEEE Trans. Radiat. Plasma Med. Sci. 5, 272–279. doi:10.1109/TRPMS.2020.2994648

Furukawa, D., Shimizu, A., and Kobatake, H. (2017). “Automatic Liver Segmentation Method Based on Maximum a Posterior Probability Estimation and Level Set Method,” in Proc. Int. Conf. Med. Image Comput. Comput.Assist. Intervent. (MICCAI), Quebec City, Canada, September 10–14, 2017, 117–124.

Gambino, O., Vitabile, S., Re, G. L., Tona, G. L., Librizzi, S., Pirrone, R., et al. (2010). “Automatic Volumetric Liver Segmentation Using Texturebased Region Growing,” in 2010 International Conference on Complex, Intelligent and Software Intensive Systems, Krakow, Poland, February 15-18, 2010, 146–152. doi:10.1109/cisis.2010.118

Giusti, A., Cirean, D. C., Masci, J., Gambardella, L. M., and Schmidhuber, J. (2013). “Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks,” in 2013 IEEE International Conference on Image Processing, Melbourne, VIC, Australia, September 15-18, 2013.

Golan, D., Erlich, Y., and Rosset, S. (2012). Weighted Pooling - Practical and Cost-Effective Techniques for Pooled High-Throughput Sequencing. Bioinformatics 28 (12), i197–i206. doi:10.1093/bioinformatics/bts208

Graham, B. (2014). Fractional Max-Pooling. Arxiv.

Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., et al. (2019). CE-net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Trans. Med. Imaging 38 (10), 2281–2292. doi:10.1109/tmi.2019.2903562

Guo, C., Szemenyei, M., Yi, Y., Wang, W., Chen, B., and Fan, C. (2021). “SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation,” in 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, January 10-15, 2021.

Guo, Z., Li, X., Huang, H., Guo, N., and Li, Q. (2019). Deep Learning-Based Image Segmentation on Multimodal Medical Imaging. IEEE Trans. Radiat. Plasma Med. Sci. 3 (2), 162–169. doi:10.1109/trpms.2018.2890359

Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020). “GhostNet:More Features from Cheap Operations,” in Proc. IEEE Conf. Comput.Vis. Pattern Recogn., Seattle, WA, USA, June 13-19, 2020, 1580–1589. doi:10.1109/cvpr42600.2020.00165

He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep Residual Learning for Image Recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recogn., Las Vegas, NV, USA, June 27-30, 2016, 770–778. doi:10.1109/cvpr.2016.90

Heimann, T., Wolf, I., and Meinzer, H.-P. (2006). “Active Shape Models for a Fully Automated 3D Segmentation of the Liver - an Evaluation on Clinical Data,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Copenhagen, Denmark, October 1-6, 2006, 41–48. doi:10.1007/11866763_6

Hinton, G. E., and Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science 313 (5786), 504–507. doi:10.1126/science.1127647

Hong, L., Wang, R., Lei, T., Du, X., and Wan, Y. (2021). “QAU-NET: Quartet Attention U-Net for Liver and Liver-Tumor Segmentation,” in 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, July 5-9, 2021 (IEEE).

Huang, G., Liu, Z., an Der Maaten, L. V., and Weinberger, K. Q. (2017). “Densely Connected Convolutional Networks,” in Proc. IEEE Conf. Comput. Vis.Pattern Recogn., Honolulu, HI, USA, July 21-26, 2017, 2261–2269. doi:10.1109/cvpr.2017.243

Huang, H., Lin, L., Tong, R., Hu, H., and Wu, J. (2020). “UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020 (IEEE).

Ji, H., He, J., Yang, X., Deklerck, R., and Cornelis, J. (2013). ACM-based Automatic Liver Segmentation from 3-D CT Images by Combining Multiple Atlases and Improved Mean-Shift Techniques. IEEE J. Biomed. Health Inform. 17 (3), 690–698. doi:10.1109/jbhi.2013.2242480

Jie, H., Li, S., Gang, S., and Albanie, S. (2017). Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Machine Intelligence 42, 2011–2023. doi:10.1109/TPAMI.2019.2913372

Jin, Q., Meng, Z., Sun, C., Wei, L., and Su, R. (2018). RA-UNet: A Hybrid Deep Attention-Aware Network to Extract Liver and Tumor in CT Scans. arXiv:1811.01328.

Jin, Y., Kuwashima, S., and Kurita, T. (2017). “Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network,” in International Conference on Neural Information Processing, Kyoto, Japan, October 16-21, 2017.

Kirkland, E. J. (2010). Bilinear Interpolation. Berlin: Springer.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep Learning. Nature 521, 436–444. doi:10.1038/nature14539

Lei, T., Jia, X., Zhang, Y., He, L., Meng, H., and Nandi, A. K. (2018). Significantly Fast and Robust Fuzzy C-Means Clustering Algorithm Based on Morphological Reconstruction and Membership Filtering. IEEE Trans. Fuzzy Syst. 26 (5), 3027–3041. doi:10.1109/tfuzz.2018.2796074

Lei, T., Wang, R., Wan, Y., Zhang, B., Meng, H., and Nandi, A. K. (2020). Medical Image Segmentation Using Deep Learning: A Survey. arXiv:2009.13120.

Lei, T., Wang, R., Zhang, Y., Liu, Y. C., and Nandi, A. K. (2021). DefED-Net: Deformable Encoder-Decoder Network for Liver and Liver Tumor Segmentation. IEEE Trans. Radiat. Plasma Med. Sci.99, 1. doi:10.1109/TRPMS.2021.3059780

Lei, T., Zhou, W., Zhang, Y., Wang, R., and Nandi, A. K. (2020). “Lightweight V-Net for Liver Segmentation,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020 (IEEE).

Li, G., Chen, X., Shi, F., Zhu, W., Tian, J., and Xiang, D. (2015). Automatic Liver Segmentation Based on Shape Constraints and Deformable Graph Cut in CT Images. IEEE Trans. Image Process. 24 (12), 5315–5329. doi:10.1109/tip.2015.2481326

Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.-W., and Heng, P.-A. (2017). H-DenseUNet: Hybrid Densely Connected UNet for Liver and Liver Tumor Segmentation from CT Volumes. IEEE Trans. Med. Imaging 37, 2663–2674. doi:10.1109/tmi.2018.2845918

Milletari, F., Navab, N., and Ahmadi, S. A. (2016). “V-net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, October 25-28, 2016 (IEEE).

Nagi, J., Ducatelle, F., Di Caro, G. A., Ciresan, D., Meier, U., Giusti, A., et al. (2011). “Max-pooling Convolutional Neural Networks for Vision-Based Hand Gesture Recognition,” in IEEE International Conference on Signal and Image Processing Applications, Kuala Lumpur, Malaysia, November 16-18, 2011 (IEEE).

Nie, D., Wang, L., Gao, Y., and Shen, D. (2016). “Fully Convolutional Networks for Multi-Modality Isointense Infant Brain Image Segmentation,” in Proc. IEEE Conf. Int. Symp. Biomed. Imag., Prague, Czech Republic, April 13-16, 2016, 1342–1345. doi:10.1109/isbi.2016.7493515

Noh, H., Hong, S., and Han, B. (2016). “Learning Deconvolution Network for Semantic Segmentation,” in 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, December 7-13, 2015.

Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M., Misawa, K., et al. (2018). Attention U-Net: Learning where to Look for the Pancreas. arXiv:1804.03999.

Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-net: Convolutional Networks for Biomedical Image Segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), Munich, Germany, October 5-9, 2015, 234–241. doi:10.1007/978-3-319-24574-4_28

Seo, H., Huang, C., Bassenne, M., Xiao, R., and Xing, L. (2020). Modified U-Net (mU-Net) with Incorporation of Object-dependent High Level Features for Improved Liver and Liver-Tumor Segmentation in CT Images. IEEE Trans. Med. Imaging 39, 1316–1325. doi:10.1109/TMI.2019.2948320

Shao, Q., Gong, L., Ma, K., Liu, H., and Zheng, Y. (2019). “Attentive CT Lesion Detection Using Deep Pyramid Inference with Multi-Scale Booster,” in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI), Shenzhen, China, October 13–17, 2019, 301–309. doi:10.1007/978-3-030-32226-7_34

Shelhamer, E., Long, J., and Darrell, T. (2017). Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39 (4), 640–651. doi:10.1109/tpami.2016.2572683

Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.

Szegedy, C., Wei, L., Jia, Y., Sermanet, P., and Rabinovich, A. (2015). “Going Deeper with Convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7-12, 2015 (IEEE).

Tomoshige, S., Oost, E., Shimizu, A., Watanabe, H., and Nawano, S. (2014). A Conditional Statistical Shape Model with Integrated Error Estimation of the Conditions; Application to Liver Segmentation in Non-contrast CT Images. Med. Image Anal. 18 (1), 130–143. doi:10.1016/j.media.2013.10.003

Tu, Z., Guo, Z., Xie, W., Yan, M., Veltkamp, R. C., Li, B., et al. (2017). Fusing Disparate Object Signatures for Salient Object Detection in Video. Pattern Recognition 72, 285–299. doi:10.1016/j.patcog.2017.07.028

Tu, Z., Xie, W., Qin, Q., Poppe, R., Veltkamp, R. C., Li, B., et al. (2018). Multi-Stream CNN: Learning Representations Based on Human-Related Regions for Action Recognition. Pattern Recognition 79, 32–43. doi:10.1016/j.patcog.2018.01.020

Wang, S. H., Govindaraj, V., Gorriz, J. M., Zhang, X., and Zhang, Y. D. (2021). Explainable Diagnosis of Secondary Pulmonary Tuberculosis by Graph Rank-Based Average Pooling Neural Network. J. Ambient Intelligence Humanized Comput. 13, 1–14. doi:10.1007/s12652-021-02998-0

Yang, S., Wang, J., Zhang, N., Deng, B., Pang, Y., and Azghadi, M. R. (2021). CerebelluMorphic: Large-Scale Neuromorphic Model and Architecture for Supervised Motor Learning. IEEE Trans. Neural Netw. Learn. Syst. (99), 1–15. doi:10.1109/tnnls.2021.3057070

Zhang, X., Tian, J., Deng, K., Wu, Y., and Li, X. (2010). Automatic Liver Segmentation Using a Statistical Shape Model with Optimal Surface Detection. IEEE Trans. Biomed. Eng. 57 (10), 2622–2626. doi:10.1109/tbme.2010.2056369

Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., and Liang, J. (2020). UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans. Med. Imaging 39, 1856–1867. doi:10.1109/TMI.2019.2959609

Zhu, X., Meng, Q., Ding, B., Gu, L., and Yang, Y. (2019). Weighted Pooling for Image Recognition of Deep Convolutional Neural Networks. Cluster Comput. 22, 9371–9383. doi:10.1007/s10586-018-2165-4

Keywords: image segmentation, deep learning, weighted pooling, U-net, skip connection

Citation: Lei J, Lei T, Zhao W, Xue M, Du X and Nandi AK (2022) Rethinking Pooling Operation for Liver and Liver-Tumor Segmentations. Front. Sig. Proc. 1:808050. doi: 10.3389/frsip.2021.808050

Received: 02 November 2021; Accepted: 20 December 2021;
Published: 10 January 2022.

Edited by:

Yuming Fang, Jiangxi University of Finance and Economics, China

Reviewed by:

Zhicheng Jiao, Brown University, United States
Zhigang Tu, Wuhan University, China

Copyright © 2022 Lei, Lei, Zhao, Xue, Du and Nandi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Asoke K. Nandi, asoke.nandi@brunel.ac.uk; Tao Lei, leitao@sust.edu.cn
