AUTHOR=Zongren Li, Silamu Wushouer, Shurui Feng, Guanghui Yan

TITLE=Focal cross transformer: multi-view brain tumor segmentation model based on cross window and focal self-attention

JOURNAL=Frontiers in Neuroscience

VOLUME=Volume 17 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1192867

DOI=10.3389/fnins.2023.1192867

ISSN=1662-453X

ABSTRACT=Recently, Transformer and its variants have achieved great success in computer vision tasks such as image classification, object detection, and image segmentation, surpassing the performance of convolutional neural networks (CNNs). The key to the success of Transformers is that self-attention captures both short-range and long-range visual dependencies, allowing the model to learn interactions between global and distant semantic information. However, this also brings challenges: the computational cost of global self-attention grows quadratically with the number of tokens, which hinders the application of Transformers to high-resolution images. Many studies restrict self-attention to local windows and exchange information between windows through shift operations to obtain global information, which reduces computational complexity and memory consumption to some extent and improves model performance. However, because the receptive field expands slowly, a large number of blocks must be stacked to achieve global self-attention, which limits further performance gains. In view of this, we propose a multi-view brain tumor segmentation model based on cross windows and focal self-attention, a novel mechanism that enlarges the receptive field through parallel cross windows and strengthens global dependence through local fine-grained and global coarse-grained interactions. First, we enlarge the receptive field by computing self-attention over the horizontal and vertical stripes of the cross window in parallel, achieving strong modeling capability while limiting computational cost. Second, focal self-attention, which combines local fine-grained and global coarse-grained interactions, enables the model to capture short-range and long-range visual dependencies efficiently. Experiments on the BraTS 2019 and BraTS 2021 datasets show that our model achieves excellent performance while limiting computational cost.
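The receptive-field argument for parallel cross windows can be illustrated with a small sketch. This is not the authors' code: it assumes CSWin-style non-overlapping stripes of width `sw`, with half of the attention heads using horizontal stripes and half using vertical stripes, and it compares them against global attention and an ordinary square window on a toy 8 x 8 token grid. All function names (`square_window`, `horizontal_stripe`, `vertical_stripe`, `cross_window`, `pairs`) are hypothetical and only count which keys a query can see; no attention weights are computed.

```python
# Hedged sketch, not the paper's implementation. Assumptions: tokens on an
# H x W grid, non-overlapping stripes of width `sw` (CSWin-style), half the
# heads attend within horizontal stripes and half within vertical stripes.

def square_window(r, c, H, W, m):
    """Keys in the non-overlapping m x m window containing query (r, c)."""
    r0, c0 = (r // m) * m, (c // m) * m
    return {(i, j) for i in range(r0, min(r0 + m, H))
            for j in range(c0, min(c0 + m, W))}

def horizontal_stripe(r, c, H, W, sw):
    """Keys in the sw-row stripe containing (r, c); spans the full width."""
    r0 = (r // sw) * sw
    return {(i, j) for i in range(r0, min(r0 + sw, H)) for j in range(W)}

def vertical_stripe(r, c, H, W, sw):
    """Keys in the sw-column stripe containing (r, c); spans the full height."""
    c0 = (c // sw) * sw
    return {(i, j) for i in range(H) for j in range(c0, min(c0 + sw, W))}

def cross_window(r, c, H, W, sw):
    """Union of both head groups: the cross-shaped receptive field of one block."""
    return horizontal_stripe(r, c, H, W, sw) | vertical_stripe(r, c, H, W, sw)

def pairs(H, W, keys_fn, size):
    """Total query-key pairs over all queries, a proxy for attention cost."""
    return sum(len(keys_fn(r, c, H, W, size)) for r in range(H) for c in range(W))

if __name__ == "__main__":
    H = W = 8
    n = H * W
    # Global attention: every query sees every key, so cost grows as n * n.
    print("global:  field", n, "pairs", n * n)
    print("4x4 win: field", len(square_window(3, 3, H, W, 4)),
          "pairs", pairs(H, W, square_window, 4))
    # Each half of the heads pays for one stripe orientation only.
    per_head = (pairs(H, W, horizontal_stripe, 2) + pairs(H, W, vertical_stripe, 2)) // 2
    print("cross:   field", len(cross_window(3, 3, H, W, 2)), "pairs/head", per_head)
```

On this toy grid the script reports 64 visible keys and 4096 pairs for global attention, 16 keys and 1024 pairs for a 4 x 4 window, and 28 keys at 1024 pairs per head for a cross window with `sw = 2`. Under these assumptions, the cross window reaches a wider, grid-spanning field at the same per-head cost as the square window, which is the intuition behind enlarging the receptive field without stacking many blocks. The focal (fine-grained plus coarse-grained) component is not modeled here.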