AUTHOR=Zhang Huaqi, Chen Huang, Qin Jin, Wang Bei, Ma Guolin, Wang Pengyu, Zhong Dingrong, Liu Jie TITLE=MC-ViT: Multi-path cross-scale vision transformer for thymoma histopathology whole slide image typing JOURNAL=Frontiers in Oncology VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.925903 DOI=10.3389/fonc.2022.925903 ISSN=2234-943X ABSTRACT=Objectives

Accurate histological typing plays an important role in diagnosing thymoma or thymic carcinoma (TC) and in predicting the corresponding prognosis. In this paper, we develop and validate a deep learning-based thymoma typing method for hematoxylin & eosin (H&E)-stained whole slide images (WSIs), which extracts useful histopathology information from patient slides to assist doctors in better diagnosing thymoma or TC.

Methods

We propose a multi-path cross-scale vision transformer (MC-ViT), which first uses a cross attentive scale-aware transformer (CAST) to classify the pathological information related to thymoma, and then uses these pathological information priors to assist the WSI transformer (WT) in thymoma typing. To make full use of the multi-scale (10×, 20×, and 40×) information inherent in a WSI, CAST not only employs parallel multi-path branches to capture features with different receptive fields from the multi-scale WSI inputs, but also introduces a cross-correlation attention module (CAM) that aggregates the multi-scale features so that spatial information at different scales complements one another. WT then converts full-scale WSIs into 1D feature matrices carrying pathological information labels, improving both the efficiency and the accuracy of thymoma typing.
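The cross-scale aggregation described above can be sketched as a cross-attention between token features from different magnifications: tokens from one scale act as queries over tokens from another scale, and the attended features are fused back into the query path. The sketch below is a minimal NumPy illustration of this idea only; the function name `cross_scale_attention`, the token counts, and the single-head dot-product formulation are our assumptions, not the paper's actual CAM implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_scale_attention(q_feat, kv_feat):
    """Hypothetical cross-scale attention: tokens from one
    magnification (queries) attend to tokens from another
    magnification (keys/values). Single-head, for illustration."""
    d = q_feat.shape[-1]
    attn = softmax(q_feat @ kv_feat.T / np.sqrt(d))  # (Nq, Nkv)
    return attn @ kv_feat                            # (Nq, d)

rng = np.random.default_rng(0)
f10 = rng.standard_normal((16, 64))   # tokens from the 10x path
f20 = rng.standard_normal((64, 64))   # tokens from the 20x path
f40 = rng.standard_normal((256, 64))  # tokens from the 40x path

# Fuse: coarse-scale queries gather complementary detail
# from the finer scales, then add it back residually.
fused = f10 + cross_scale_attention(f10, f20) + cross_scale_attention(f10, f40)
print(fused.shape)  # (16, 64)
```

In a real model the three paths would be learned transformer branches and the attention would use learned query/key/value projections; this sketch only shows how features at one magnification can be enriched with spatial context from the others.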

Results

We construct a large-scale thymoma histopathology WSI (THW) dataset and annotate the corresponding pathological information and thymoma typing labels. The proposed MC-ViT achieves Top-1 accuracies of 0.939 and 0.951 in pathological information classification and thymoma typing, respectively. Moreover, quantitative and statistical experiments on the THW dataset demonstrate that our pipeline performs favorably against existing classical convolutional neural networks, vision transformers, and deep learning-based medical image classification methods.

Conclusion

This paper demonstrates that comprehensively utilizing the pathological information contained in multi-scale WSIs is feasible for thymoma typing and achieves clinically acceptable performance. Specifically, the proposed MC-ViT predicts pathological information classes as well as thymoma types well, which shows application potential for the diagnosis of thymoma and TC and may help doctors improve diagnostic efficiency and accuracy.