AUTHOR=Wang Dong , Lian Jian , Jiao Wanzhen TITLE=Multi-label classification of retinal disease via a novel vision transformer model JOURNAL=Frontiers in Neuroscience VOLUME=17 YEAR=2024 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1290803 DOI=10.3389/fnins.2023.1290803 ISSN=1662-453X ABSTRACT=Introduction

Accurate identification of retinal disorders is essential for preventing both temporary and permanent visual impairment. Prior research has yielded encouraging results in classifying retinal images with respect to a single retinal condition. In clinical practice, however, it is not uncommon for a single patient to present with multiple retinal disorders concurrently. Classifying retinal images into multiple labels therefore remains a significant challenge for existing methods, and solving it would provide information about several conditions at once.

Methods

This study presents a novel vision transformer architecture called retinal ViT, which applies the self-attention mechanism to medical image analysis. Because the study aims to show that transformer-based models can achieve performance competitive with CNN-based models, the convolutional modules have been removed from the proposed model. The proposed model ends with a multi-label classifier built as a two-layer feed-forward network with a sigmoid activation function.
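
The classifier described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MultiLabelHead`, the layer sizes, the ReLU between the two layers, the random initialization, and the 0.5 decision threshold are all assumptions made for illustration. The abstract states only that the head has two layers and uses a sigmoid activation. The point of the sketch is that a sigmoid scores each label independently, so several diseases can be predicted for the same image.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiLabelHead:
    """Hypothetical two-layer feed-forward head with sigmoid outputs."""

    def __init__(self, embed_dim, hidden_dim, num_labels, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(embed_dim, hidden_dim, rng)
        self.w2, self.b2 = init_linear(hidden_dim, num_labels, rng)

    def forward(self, features):
        # Layer 1 with ReLU (the hidden activation is an assumption).
        hidden = [max(0.0, h) for h in linear(features, self.w1, self.b1)]
        # Layer 2 produces one logit per label; sigmoid maps each logit
        # to an independent probability, unlike softmax.
        logits = linear(hidden, self.w2, self.b2)
        return [sigmoid(z) for z in logits]

    def predict(self, features, threshold=0.5):
        # A label is assigned whenever its own probability clears the threshold.
        return [int(p >= threshold) for p in self.forward(features)]


# Example: a 16-dimensional feature vector standing in for the transformer
# output, scored against 8 labels (both sizes are assumed).
head = MultiLabelHead(embed_dim=16, hidden_dim=8, num_labels=8)
probs = head.forward([0.1] * 16)
labels = head.predict([0.1] * 16)
```

Because the per-label probabilities are not forced to sum to one, any number of entries in `labels` may be 1 for a single input.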

Results and discussion

Experiments on the publicly available ODIR-2019 dataset show that the proposed model outperforms state-of-the-art approaches such as ResNet, VGG, DenseNet, and MobileNet in terms of Kappa, F1 score, AUC, and AVG.
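
For readers unfamiliar with the metrics named above, the following sketch computes them on a small hand-made example. It makes several assumptions that the abstract does not state: labels are treated as one flat binary vector, predictions are thresholded at 0.5, and AVG is taken to be the arithmetic mean of Kappa, F1, and AUC. The function names and the toy data are hypothetical and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (po - pe) / (1 - pe)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (po - pe) / (1 - pe) if pe != 1 else 0.0


def roc_auc(y_true, y_score):
    """Probability that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical), flattened over images and labels.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]
y_pred = [int(s >= 0.5) for s in y_score]

kappa = cohen_kappa(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, y_score)
avg = (kappa + f1 + auc) / 3  # assumed definition of AVG
```

Kappa and F1 depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason the three are often reported together.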