
ORIGINAL RESEARCH article

Front. Phys., 27 September 2022
Sec. Optics and Photonics
This article is part of the Research Topic Interdisciplinary Techniques in Biomedical Photonics.

Color fundus photograph registration based on feature and intensity for longitudinal evaluation of diabetic retinopathy progression

Jingxin Zhou1†, Kai Jin1†, Renshu Gu2, Yan Yan1, Yueyu Zhang2, Yiming Sun1 and Juan Ye1*
  • 1Center of Ophthalmology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  • 2Department of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang, China

Longitudinal evaluation of disease progression between follow-up examinations relies on the precise registration of medical images. Compared with other medical imaging modalities, the color fundus photograph, a common retinal examination, is easily affected by eye movements during acquisition, which makes a reliable longitudinal registration method necessary for this modality. The purpose of this study was therefore to propose a robust registration method for longitudinal color fundus photographs and to establish a longitudinal retinal registration dataset. In the proposed algorithm, radiation-variation insensitive feature transform (RIFT) feature points are detected and aligned, followed by further refinement using the normalized total gradient (NTG). Experiments and ablation analyses were conducted on both public and private datasets, using the mean registration error and the registration success plot as the main evaluation metrics. The results show that the proposed method is comparable to other state-of-the-art registration algorithms and is particularly accurate for longitudinal images with disease progression. We believe the proposed method will be beneficial for the longitudinal evaluation of fundus images.

Introduction

Diabetic retinopathy (DR) is one of the major diseases that can cause blindness. It is estimated that about 600 million people will have diabetes by 2040 [1], a third of whom will be affected by DR [2]. Regular follow-up and accurate analysis of longitudinal examinations play an important part in the management of DR [3]. However, the quantitative analysis of longitudinal images remains challenging because of the tremendous discrepancies between images caused by vastly different photographing conditions, involuntary eye movements, and pathological changes [4], which can disturb observation and influence the evaluation of retinal image biomarkers [5]. Registration, the process of establishing pixel-to-pixel correspondence between two images, offers the chance to eliminate these discrepancies before longitudinal assessment [6]. Therefore, a preliminary registration of two retinal images is required to reduce these effects and support a reliable disease progression conclusion.

As a necessary component of retinal image analysis, the registration of retinal fundus images is a classic topic into which tremendous effort has been put over the past decades. From a methodological point of view, retinal image registration methods can be classified into three groups: feature-based, intensity-based, and hybrid methods. In feature-based registration methods, invariant features of the retinal images are extracted and used to seek the best geometric transformation between two images. Retinal vessel bifurcations [7–11] and the optic disc and fovea [12, 13] are commonly used features. However, some of these features rely on the segmentation of retinal structures and are sensitive to image quality. Therefore, easily obtained and stable key-point detection is the premise of robust feature-based registration; for example, the Harris corner [14], the scale-invariant feature transform (SIFT) [15], and speeded-up robust features (SURF) [16] are classic feature points that have been extensively studied. Hernandez-Matas et al. [17, 18] introduced a feature-based registration framework exploiting a spherical eye model and pose estimation. In intensity-based methods, intensity information is used to measure the similarity of the images and the registration performance, for example, through cross-correlation [19], mutual information [20], and phase correlation [21]. Hybrid registration methods combine feature-based and intensity-based methods to seek better performance [4, 22]. Compared with single feature-based or intensity-based methods, hybrid methods have great potential for more accurate and practical image alignment, but they are less investigated. Although the registration of color fundus photographs has been intensively studied, the pursuit of higher and more robust performance has never stopped.

Although there has been intensive research on registration, further work is still needed. First, novel registration methods developed for other modalities should be applied to retinal images in search of better performance. Second, beyond the improvement and development of registration methods themselves, researchers should focus more on the clinical applicability of the proposed methods, which is extremely important for longitudinal follow-up examinations. Third, the development and evaluation of registration methods rely on the publication of open-access datasets. To the best of our knowledge, the Fundus Image Registration (FIRE) dataset is the only registration dataset that has been made publicly available [23]. We therefore considered it useful to develop a registration dataset made up of longitudinal images with clarified medical diagnoses. Developing registration methods in clinical settings and establishing registration datasets would greatly benefit interdisciplinary cooperation and the clinical translation of computational methods.

In this study, a robust registration method for longitudinal color fundus photographs based on both feature and intensity is proposed. An ablation study demonstrates the necessity of combining the two main parts, and a comprehensive comparison between the proposed algorithm and other state-of-the-art methods was conducted to investigate its characteristics. The dataset will be made available for registration research. We believe this work will benefit follow-up retinal image analysis and disease progression assessment.

Materials and methods

The proposed registration framework is a combination of feature-based and intensity-based methods. The flow of this work is shown in Figure 1.

FIGURE 1. Flowchart of the proposed algorithm.

Retinal image datasets

For the evaluation of the proposed registration method, we used two datasets, FIRE and FI-LORE, consisting of color fundus image pairs that differ in photographing and patient conditions. These datasets are described in detail hereinafter.

The Fundus Image Registration (FIRE) dataset [23] comprises 134 image pairs, which are classified into three categories according to their characteristics. Category S contains 71 image pairs with more than 75% overlap and no anatomical changes, suitable for super-resolution applications, while category P contains 49 image pairs with less than 75% overlap and no anatomical changes. Category A contains 14 image pairs with high overlap and large anatomical changes due to retinopathy, which can be used to mimic practical longitudinal examinations. All images have a resolution of 2,912 × 2,912 pixels. FIRE provides ground truths for the calculation of registration errors.

The Fundus Image for Longitudinal Registration (FI-LORE) dataset consists of 83 color fundus image pairs from 78 eyes of 54 diabetic retinopathy patients who underwent longitudinal examinations at the Second Affiliated Hospital of Zhejiang University School of Medicine from May 2020 to July 2020. Photographing conditions, involuntary eye movements, and disease progression and treatments, such as laser scars, all contribute to the differences between the images in each pair. Additionally, some images are of low quality because of complications such as cataracts. FI-LORE thus reflects the practical conditions of clinical follow-up and tests the robustness of the proposed method. All images have a resolution of 1,500 × 1,500 pixels. Finally, to compute the registration error, we followed the annotation rule of the FIRE dataset [23], carefully choosing 10 corresponding points per pair and repeatedly correcting their exact locations to guarantee the reliability of the ground truths. The FI-LORE dataset will be made publicly available.

Proposed registration framework

To normalize images from different datasets taken at different examinations, preprocessing is the first step of the algorithm. First, the mask provided by the FIRE dataset is used to remove the blank margin of the original images. Second, the cropped images are resized to 1,500 × 1,500 pixels to unify the whole dataset. Finally, to simplify the calculation, the RGB images are converted to grayscale, as sketched below.
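To make these steps concrete, the following is a minimal preprocessing sketch in Python with OpenCV and NumPy; the mask-based cropping logic, the function name, and the file handling are our illustrative assumptions rather than the authors' exact implementation.

```python
import cv2
import numpy as np

def preprocess(image_path: str, mask_path: str, size: int = 1500) -> np.ndarray:
    """Crop the blank margin using the dataset mask, resize, and convert to grayscale."""
    image = cv2.imread(image_path)                      # BGR color fundus photograph
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # nonzero inside the fundus region

    # The bounding box of the nonzero mask area removes the blank margin.
    ys, xs = np.nonzero(mask)
    cropped = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Resize to a common resolution, then drop color to simplify later computation.
    resized = cv2.resize(cropped, (size, size), interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
```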

Radiation-variation insensitive feature transform (RIFT) is a feature-based registration method with great robustness to non-linear radiation distortion (NRD) [24]. NRD is a rather common phenomenon that can be caused by involuntary eye movements; therefore, we consider RIFT suitable for retinal image alignment tasks. The details of the RIFT calculation can be found in the original article [24]. The alignment is performed with the RANdom SAmple Consensus (RANSAC) algorithm [25], which generates an affine transformation matrix p on the resized image pairs (500 × 500 pixels). The normalized total gradient (NTG), proposed by Chen et al. [26], then serves as a registration measure for refinement. The use of the NTG is based on the observation that the gradient of the difference image is sparsest when the two images are perfectly aligned. The NTG is thought to outperform other intensity-based measures, such as mutual information, residual complexity, the correlation ratio, and normalized cross-correlation; however, its validity for retinal images has not previously been assessed. The details of the NTG calculation are given by Chen et al. [26]. A sketch of both building blocks follows.
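As a rough sketch of the two stages, the snippet below estimates the affine matrix p from matched keypoints with OpenCV's RANSAC-based estimator and computes the NTG measure following its published definition, i.e., the total gradient of the difference image normalized by the total gradients of the two inputs [26]. The RIFT matches are assumed to be precomputed, and the 3-pixel reprojection threshold is an illustrative choice, not a value reported in the paper.

```python
import cv2
import numpy as np

def ransac_affine(pts_moving: np.ndarray, pts_fixed: np.ndarray) -> np.ndarray:
    """Estimate a 2x3 affine matrix p from matched RIFT keypoints with RANSAC.
    pts_moving, pts_fixed: (N, 2) arrays of corresponding coordinates."""
    p, _inliers = cv2.estimateAffine2D(
        pts_moving, pts_fixed, method=cv2.RANSAC, ransacReprojThreshold=3.0
    )
    return p

def total_gradient(img: np.ndarray) -> float:
    """L1 norm of the image gradient (the 'total gradient')."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.abs(gx).sum() + np.abs(gy).sum())

def ntg(reference: np.ndarray, moved: np.ndarray) -> float:
    """Normalized total gradient: sparsest (closest to 0) at perfect alignment."""
    eps = 1e-12  # guards against division by zero on constant images
    diff = reference.astype(np.float64) - moved.astype(np.float64)
    return total_gradient(diff) / (total_gradient(reference) + total_gradient(moved) + eps)
```

In this reading of the pipeline, the RANSAC estimate serves as the initialization, and the refinement stage then seeks the transformation that minimizes the NTG value.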

Registration evaluation

To quantitatively assess the performance of the registration result, we adopt a widely accepted registration error calculation [23], which requires the ground truths of the image pairs. Given the sets of reference points $Y_I = \{y_I^1, y_I^2, y_I^3, \ldots, y_I^{10}\} \subset \mathbb{R}^2$ and $Y_R = \{y_R^1, y_R^2, y_R^3, \ldots, y_R^{10}\} \subset \mathbb{R}^2$, where $I$ and $R$ represent the registered image and the reference image, respectively, the mean registration error (MRE) is calculated as

$$\mathrm{MRE}(Y_I, Y_R, p) = \frac{1}{10} \sum_{i=1}^{10} \left\lVert y_R^i - p\left(y_I^i\right) \right\rVert_2. \tag{1}$$

Here, $\lVert \cdot \rVert_2$ denotes the Euclidean norm; hence, the closer the MRE is to 0, the better the registration. To assess the registration results over a whole dataset, we use the success plot [18], in which the x-axis marks the registration error threshold under which a registration is considered successful and the y-axis marks the percentage of image pairs successfully registered at a given threshold. The area under the curve (AUC) is computed to quantitatively assess each registration method.
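A minimal sketch of Eq. (1) and the success-plot AUC follows, assuming p is a 2 × 3 affine matrix and using a hypothetical 1-25 pixel threshold grid (the exact thresholds are those of the plots, not fixed by this sketch).

```python
import numpy as np

def mean_registration_error(y_i: np.ndarray, y_r: np.ndarray, p: np.ndarray) -> float:
    """Eq. (1): mean Euclidean distance between the reference points y_r and the
    registered points p(y_i). y_i, y_r: (10, 2) arrays; p: 2x3 affine matrix."""
    homog = np.hstack([y_i, np.ones((len(y_i), 1))])  # (10, 3) homogeneous coordinates
    warped = homog @ p.T                              # apply the affine map -> (10, 2)
    return float(np.linalg.norm(y_r - warped, axis=1).mean())

def success_auc(mres, thresholds=np.arange(1, 26)) -> float:
    """Success-plot AUC: the fraction of image pairs whose MRE falls below each
    error threshold, averaged over the threshold grid."""
    mres = np.asarray(mres)
    return float(np.mean([(mres <= t).mean() for t in thresholds]))
```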

Results

Results of the ablation study

To better understand the contribution of each part of the registration framework and to validate the effectiveness of combining RIFT and the NTG, we conducted an ablation study examining the registration performance of each procedure alone. Table 1 shows the registration results of the ablation study on the FIRE dataset, and Figure 2 shows the corresponding success plot. From the results, we can see that the NTG performs better on image pairs with high overlap, but once the overlap is small, its performance drops considerably; RIFT behaves in the opposite manner. The combination of the two algorithms outperforms each of them alone. Therefore, combining RIFT and the NTG grants the algorithm robustness to image pairs with different overlap areas.

TABLE 1. Results of the ablation study in the FIRE dataset.

FIGURE 2. Registration success plot of the ablation study. The x-axis marks, in pixels, the registration error threshold under which registration is considered to be successful. The y-axis marks the percentage of successfully registered image pairs for a given threshold.

Comparison to other registration methods

To further assess the accuracy of the proposed method for color fundus image registration, we compared our results with other state-of-the-art image registration methods already evaluated on the FIRE dataset, including GDB-ICP [27], Harris-PIIFD [28], ED-DB-ICP [29], RIR-BS [30], SIFT + WGTM [31], SURF + WGTM [31], ATS-RGM [32], EyeSLAM [33], GFEMR [34], VOTUS [35], REMPE [18], and a deep learning-based registration method proposed by Rivas-Villar et al. [36]. Figure 3 provides a qualitative illustration of the registration results of the proposed method. Table 2 lists the methods used for comparison and the AUC of the success plot, and Figure 4 contains the success plots of the proposed method and the other methods whose results are publicly available online. From these results, one can conclude that the proposed method is competitive with the leading registration methods in category S. In category P, its AUC clearly falls behind some algorithms, but the proposed method outperforms all the others in category A, which represents the longitudinal study. Therefore, we think the proposed method remains robust under anatomical changes and disease progression and is well worth further study.

FIGURE 3. Registration results of the proposed algorithm in the FIRE dataset. The overlap decreases from the top row to the bottom. (A,B) Image pairs without registration. (C) Image pairs shown in an overlaid form after registration. (D) Checkerboard comparisons of the proposed method.

TABLE 2. Comparisons to state-of-the-art image registration methods.

FIGURE 4. Registration success plot of the comparisons between the proposed and other registration methods. The x-axis marks, in pixels, the registration error threshold under which registration is considered to be successful. The y-axis marks the percentage of successfully registered image pairs for a given threshold.

Results in the FI-LORE dataset

Retinal images acquired in real clinical conditions may be of poor quality because of complications such as cataracts, loss of focus, and varying light conditions, posing great problems for the practical use of registration. As described earlier, FI-LORE is a collection from real ophthalmologic practice and can be used to validate the utility of the proposed method. Figure 5 shows several image pairs before and after registration. Despite the dissimilar illumination conditions, pathological changes, and disease progression in these pairs, our method robustly aligns the longitudinal images to a satisfying extent. Figure 6 shows the success plots of the proposed method and other state-of-the-art color fundus image registration methods on FI-LORE. The AUCs of RIFT, NTG, RIFT + NTG (proposed), REMPE, and GFEMR are 0.841, 0.755, 0.850, 0.840, and 0.808, respectively. This quantitative analysis of the registration results on FI-LORE validates the superior performance of the proposed method in real clinical conditions.

FIGURE 5. Registration results of the proposed algorithm in FI-LORE. The pairs shown respectively represent poor illumination quality, pathological change, and disease progression, which are common conditions in longitudinal examinations. (A,B) Image pairs without registration. (C) Image pairs shown in an overlaid form after registration. (D) Checkerboard comparisons of the proposed method.

FIGURE 6. Registration success plot of the registration results in FI-LORE. The x-axis marks, in pixels, the registration error threshold under which registration is considered to be successful. The y-axis marks the percentage of successfully registered image pairs for a given threshold. The AUCs of RIFT, NTG, RIFT + NTG (proposed), REMPE, and GFEMR are 0.841, 0.755, 0.850, 0.840, and 0.808, respectively.

Discussion

Longitudinal assessment of DR retinal images is of great importance, and longitudinal registration is an important and fundamental task that has often been neglected in clinical situations, especially in follow-up examinations. Precise alignment of images from different examinations is the premise for the accurate detection and analysis of pathological changes and has already been adopted in some automated retinal image analysis devices [37]. In this study, we proposed a hybrid registration method, demonstrated its excellent performance on longitudinal images through comprehensive experiments, and established a color fundus photograph dataset with pixel-wise ground truth annotations.

In Table 1, the NTG shows the best performance in category S, whereas RIFT registers better in category P. We can conclude that the intensity-based NTG is more precise on image pairs with large overlap, while RIFT is the opposite. Thus, the combination of RIFT and the NTG is reasonable and proved the best over the whole FIRE dataset. Results from the extensive comparison experiments showed that our method is comparable to state-of-the-art image registration methods such as GFEMR [34], VOTUS [35], and REMPE [18]. For the longitudinal images in category A, the proposed method outperformed the other state-of-the-art methods, for which reason we think it is suitable for the clinical evaluation of disease progression in follow-up examinations. This conclusion was further validated on a private dataset, FI-LORE, with more longitudinal images. Taking all of this into consideration, we believe the proposed method excels at registering longitudinal retinal images and will be beneficial in clinical use.

It should be noted that in category P, which is made up of images with partial overlap, the MRE is far larger and the AUC far lower than in the other two categories. The main source of error came from several misregistered image pairs with MREs of nearly a thousand pixels; the same trend can be observed for some state-of-the-art methods. The private dataset also contains images with less overlap, and the proposed method can still register them, as shown in Figure 6. Further research is needed to validate its performance and investigate why these algorithms did not perform well in category P.

During image preprocessing, we suspected that resizing might affect the final results. In the current study, a size of 500 × 500 pixels is recommended; we also conducted experiments on 250 × 250 and 750 × 750 pixels, the results of which can be found in the Supplementary Material. Additionally, mutual information was evaluated to confirm the performance of the NTG, and the relevant results are also given in the Supplementary Material.

Most development of registration methods focuses on either feature-based or intensity-based registration; as far as we know, only a few studies have adopted hybrid methods. In 2016, Saha et al. proposed a hybrid method using Speeded-Up Robust Features (SURF) and Binary Robust Independent Elementary Features (BRIEF) [4], both well-known techniques whose combination generated better performance. In our study, two registration methods, RIFT and the NTG, were adopted, and further investigation revealed their distinct characteristics in image registration. To the best of our knowledge, this is the first study to use RIFT and the NTG in ophthalmic imaging and to investigate their applicability to different image overlaps. We think that this two-step hybrid registration approach is promising in retinal imaging.

Deep learning has shown great potential in medical image processing, including segmentation and registration, and several methods have attempted to apply it to retinal image registration [36, 38–45]. However, to the best of our knowledge, two inherent problems limit the development of deep learning-based registration. On the one hand, unlike other image processing tasks (segmentation, enhancement, etc.), registration theoretically comprises two steps, feature recognition and feature alignment; in retinal image registration, these usually mean retinal feature extraction (feature points, the vessel network, etc.) and retinal feature alignment. An inevitable question therefore arises: when and where should deep learning be adopted in the registration workflow? Different researchers have provided various solutions. Some adopted deep learning for feature detection and then aligned the feature points using conventional methods such as RANSAC [40, 41], while other work constructed an outlier-rejection network to compute the image transformation matrix [45, 46]. There is no consensus on how deep learning should be added to the registration pipeline [46]. Moreover, in most deep learning-based registration algorithms, accurate registration relies on accurate segmentation, which is itself still an open research topic in medical image processing. On the other hand, training and validation of deep learning networks rely on massive labeled data, and for image registration in particular, ground truth annotation is labor-intensive and time-consuming; deep learning methods that use vessel segmentation also need large annotated vessel segmentation datasets. From these two perspectives, we tend to believe that although deep learning has shed light on medical image processing and analysis, it is still at the exploration stage for image registration. In the current study, we compared a state-of-the-art deep learning-based method with the proposed method, and the results showed that for longitudinal retinal image registration, our method still stood out. More attention should be paid to deep learning-assisted retinal image registration to determine whether it is actually superior to conventional algorithms.

The development of retinal image registration methods is limited by the lack of registration datasets. As far as we know, FIRE is the only dataset that focuses on retinal image registration and provides pixel-level ground truth for the development and evaluation of registration methods. However, its longitudinal category contains only 14 image pairs, notably few compared with the other categories. Taking this situation and the clinical use of registration methods into account, we collected and annotated 83 image pairs specifically for longitudinal image registration tasks. These image pairs differ in photographing conditions, involuntary eye movements, and disease progression and treatments. We believe the adoption of this dataset can greatly benefit the study of retinal image registration.

Owing to the special role of registration in retinal image analysis, some methods have reportedly been put into clinical use. To the best of our knowledge, one registration software package, DualAlign i2k (Clifton Park, NY), has been commercialized; it was developed based on the GDB-ICP algorithm, which is compared in our work [27]. With growing interest in image registration, more novel and efficient methods have been proposed to deliver better and swifter registration performance, and they show promise for medical image registration tasks. However, owing to the lack of interdisciplinary cooperation between medical and computer science researchers, the study of these novel methods for medical use is limited. In this study, we focused on two such methods and validated their performance. We believe more research is needed to open up possibilities for more precise and swifter image analysis in real clinical use.

There are some limitations to our current study. First, the proposed algorithm performed relatively poorly in the category representing images with small overlap. However, the FI-LORE dataset also contains similar image pairs on which the proposed method did not degrade in the same way; the reason for this discrepancy remains unclear, and more image pairs are needed to test the method further. Second, the current study focuses on unimodal registration tasks, and the performance of the method on multi-modal tasks needs more examination. Third, some state-of-the-art methods should have been compared on the local FI-LORE dataset, but owing to the lack of reliable source code and our inability to fully reproduce those methods, we could not include them in further comparison. Finally, several artificial intelligence algorithms have been developed for retinal image registration tasks [22, 47, 48]; we compared one deep learning algorithm, but further studies are needed to investigate deep learning in the context of retinal image alignment.

Conclusion

RIFT better aligns images with small overlap, while the NTG is more precise on image pairs with large overlap. Thus, the combination of RIFT and the NTG is reasonable and outperforms either method alone. The proposed method was comparable to other state-of-the-art registration algorithms and was especially accurate for longitudinal images with disease progression. We believe that the proposed method will be beneficial for the longitudinal evaluation of fundus images.

Data availability statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author contributions

JZ and KJ contributed to the idea, performed the experiments, analyzed the results, and wrote the manuscript. RG helped with the experiments and gave meaningful advice in the algorithm part. YY and YZ helped perform the analysis. YS helped with the preparatory work. JY contributed to the conception of the study and supervised the whole process of the experiment and writing.

Funding

The study was supported in part by the National Key Research and Development Program of China (2019YFC0118400), the Key Research and Development Program of Zhejiang Province (2019C03020), the Clinical Medical Research Center for Eye Diseases of Zhejiang Province (2021E50007), and the Natural Science Foundation of Zhejiang Province (grant number LQ21H120002).

Acknowledgments

The authors would like to thank C. Hernandez-Matas et al. for providing the data from the FIRE database (see https://projects.ics.forth.gr/cvrl/fire/).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2022.978392/full#supplementary-material

References

1. Ogurtsova K, da Rocha Fernandes JD, Huang Y, Linnenkamp U, Guariguata L, Cho NH, et al. IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract (2017) 128:40–50. doi:10.1016/j.diabres.2017.03.024

2. Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care (2012) 35(3):556–64. doi:10.2337/dc11-1909

3. Wong TY, Sun J, Kawasaki R, Ruamviboonsuk P, Gupta N, Lansingh V, et al. Guidelines on diabetic eye care. Ophthalmology (2018) 125(10):1608–22. doi:10.1016/j.ophtha.2018.04.007

4. Saha SK, Xiao D, Frost S, Kanagasingam Y. A two-step approach for longitudinal registration of retinal images. J Med Syst (2016) 40(12):277. doi:10.1007/s10916-016-0640-0

5. Ting DSW, Peng L, Varadarajan AV, Keane PA, Burlina PM, Chiang MF, et al. Deep learning in ophthalmology: The technical and clinical considerations. Prog Retin Eye Res (2019) 72:100759. doi:10.1016/j.preteyeres.2019.04.003

6. Zitová B, Flusser J. Image registration methods: A survey. Image Vis Comput (2003) 21(11):977–1000. doi:10.1016/s0262-8856(03)00137-9

7. Zana F, Klein JC. A multimodal registration algorithm of eye fundus images using vessels detection and Hough transform. IEEE Trans Med Imaging (1999) 18(5):419–28. doi:10.1109/42.774169

8. Can A, Stewart CV, Roysam B, Tanenbaum HL. A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina. IEEE Trans Pattern Anal Mach Intell (2002) 24(3):347–64. doi:10.1109/34.990136

9. Laliberte F, Gagnon L, Sheng Y. Registration and fusion of retinal images--an evaluation study. IEEE Trans Med Imaging (2003) 22(5):661–73. doi:10.1109/TMI.2003.812263

10. Matsopoulos GK, Asvestas PA, Mouravliansky NA, Delibasis KK. Multimodal registration of retinal images using self organizing maps. IEEE Trans Med Imaging (2004) 23:1557–63. doi:10.1109/tmi.2004.836547

11. Fang B, Tang YY. Elastic registration for retinal images based on reconstructed vascular trees. IEEE Trans Biomed Eng (2006) 53(6):1183–7. doi:10.1109/TBME.2005.863927

12. Hart WE, Goldbaum MH. Registering retinal images using automatically selected control point pairs. In: Proceedings of the IEEE International Conference on Image Processing (ICIP-94) (1994). doi:10.1109/icip.1994.413740

13. Nunes JC, Bouaoune Y, Delechelle E, Bunel P. A multiscale elastic registration scheme for retinal angiograms. Comput Vis Image Underst (2004) 95(2):129–49. doi:10.1016/j.cviu.2004.03.007

14. Harris C, Stephens M. A combined corner and edge detector. In: Proceedings of the Alvey Vision Conference (1988). doi:10.5244/c.2.23

15. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis (2004) 60(2):91–110. doi:10.1023/B:VISI.0000029664.99615.94

16. Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-up robust features (SURF). Comput Vis Image Underst (2008) 110(3):346–59. doi:10.1016/j.cviu.2007.09.014

17. Hernandez-Matas C, Zabulis X, Argyros AA. Retinal image registration through simultaneous camera pose and eye shape estimation. Annu Int Conf IEEE Eng Med Biol Soc (2016) 2016:3247–51. doi:10.1109/embc.2016.7591421

18. Hernandez-Matas C, Zabulis X, Argyros AA. REMPE: Registration of retinal images through eye modelling and pose estimation. IEEE J Biomed Health Inform (2020) 24:3362–73. doi:10.1109/jbhi.2020.2984483

19. Cideciyan AV. Registration of ocular fundus images: An algorithm using cross-correlation of triple invariant image descriptors. IEEE Eng Med Biol Mag (1995) 14(1):52–8. doi:10.1109/51.340749

20. Maes F, Collignon A, Vandermeulen D, Marchal G, Suetens P. Multimodality image registration by maximization of mutual information. IEEE Trans Med Imaging (1997) 16(2):187–98. doi:10.1109/42.563664

21. Suthaharan S, Rossi EA, Snyder V, Chhablani J, Lejoyeux R, Sahel JA, et al. Laplacian feature detection and feature alignment for multimodal ophthalmic image registration using phase correlation and Hessian affine feature space. Signal Process (2020) 177:107733. doi:10.1016/j.sigpro.2020.107733

22. Hervella ÁS, Rouco J, Novo J, Ortega M. Multimodal registration of retinal images using domain-specific landmarks and vessel enhancement. Proced Comput Sci (2018) 126:97–104. doi:10.1016/j.procs.2018.07.213

23. Hernandez-Matas C, Zabulis X, Triantafyllou A, Anyfanti P, Douma S, Argyros AA. FIRE: Fundus image registration dataset. Model Artif Intell Ophthalmol (2017) 1(4):16–28. doi:10.35119/maio.v1i4.42

24. Li J, Hu Q, Ai M. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform. IEEE Trans Image Process (2019) 29:3296–310. doi:10.1109/TIP.2019.2959244

25. Hossein-Nejad Z, Nasri M. A-RANSAC: Adaptive random sample consensus method in multimodal retinal image registration. Biomed Signal Process Control (2018) 45:325–38. doi:10.1016/j.bspc.2018.06.002

26. Chen SJ, Shen HL, Li C, Xin JH. Normalized total gradient: A new measure for multispectral image registration. IEEE Trans Image Process (2018) 27(3):1297–310. doi:10.1109/TIP.2017.2776753

27. Yang G, Stewart CV, Sofka M, Tsai CL. Registration of challenging image pairs: Initialization, estimation, and decision. IEEE Trans Pattern Anal Mach Intell (2007) 29(11):1973–89. doi:10.1109/tpami.2007.1116

28. Chen J, Tian J, Lee N, Zheng J, Smith RT, Laine AF. A partial intensity invariant feature descriptor for multimodal retinal image registration. IEEE Trans Biomed Eng (2010) 57(7):1707–18. doi:10.1109/TBME.2010.2042169

29. Tsai CL, Li CY, Yang G, Lin KS. The edge-driven dual-bootstrap iterative closest point algorithm for registration of multimodal fluorescein angiogram sequence. IEEE Trans Med Imaging (2010) 29(3):636–49. doi:10.1109/TMI.2009.2030324

30. Chen L, Xiang Y, Chen Y, Zhang X. Retinal image registration using bifurcation structures. In: Proceedings of the 18th IEEE International Conference on Image Processing (2011). p. 2169–72. doi:10.1109/icip.2011.6116041

31. Izadi M, Saeedi P. Robust weighted graph transformation matching for rigid and nonrigid image registration. IEEE Trans Image Process (2012) 21(10):4369–82. doi:10.1109/TIP.2012.2208980

32. Serradell E, Pinheiro MA, Sznitman R, Kybic J, Moreno-Noguer F, Fua P. Non-rigid graph registration using active testing search. IEEE Trans Pattern Anal Mach Intell (2015) 37(3):625–38. doi:10.1109/TPAMI.2014.2343235

33. Braun D, Yang S, Martel JN, Riviere CN, Becker BC. EyeSLAM: Real-time simultaneous localization and mapping of retinal vessels during intraocular microsurgery. Int J Med Robot (2018) 14(1). doi:10.1002/rcs.1848

34. Wang J, Chen J, Xu H, Zhang S, Mei X, Huang J, et al. Gaussian field estimator with manifold regularization for retinal image registration. Signal Process (2019) 157:225–35. doi:10.1016/j.sigpro.2018.12.004

35. Motta D, Casaca W, Paiva A. Vessel optimal transport for automated alignment of retinal fundus images. IEEE Trans Image Process (2019) 28(12):6154–68. doi:10.1109/TIP.2019.2925287

36. Rivas-Villar D, Hervella ÁS, Rouco J, Novo J. Color fundus image registration using a learning-based domain-specific landmark detection methodology. Comput Biol Med (2021) 140:105101. doi:10.1016/j.compbiomed.2021.105101

37. Grzybowski A, Brona P, Lim G, Ruamviboonsuk P, Tan GSW, Abramoff M, et al. Artificial intelligence for diabetic retinopathy screening: A review. Eye (Lond) (2020) 34(3):451–60. doi:10.1038/s41433-019-0566-0

38. Mahapatra D, Antony B, Sedai S, Garnavi R. Deformable medical image registration using generative adversarial networks. In: Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) (2018). p. 1449–53. doi:10.1109/isbi.2018.8363845

39. Lee J, Liu P, Cheng J, Fu H. A deep step pattern representation for multimodal retinal image registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019). doi:10.1109/iccv.2019.00518

40. De Silva T, Chew EY, Hotaling N, Cukras CA. Deep-learning based multi-modal retinal image registration for the longitudinal analysis of patients with age-related macular degeneration. Biomed Opt Express (2021) 12(1):619–36. doi:10.1364/boe.408573

41. Ding L, Kuriyan AE, Ramchandran RS, Wykoff CC, Sharma G. Weakly-supervised vessel detection in ultra-widefield fundus photography via iterative multi-modal registration and learning. IEEE Trans Med Imaging (2020) 40:2748–58. doi:10.1109/tmi.2020.3027665

42. Luo G, Chen X, Shi F, Peng Y, Xiang D, Chen Q, et al. Multimodal affine registration for ICGA and MCSL fundus images of high myopia. Biomed Opt Express (2020) 11(8):4443–57. doi:10.1364/boe.393178

43. Tian Y, Hu Y, Ma Y, Hao H, Mou L, Yang J, et al. Multi-scale U-net with edge guidance for multimodal retinal image deformable registration. Annu Int Conf IEEE Eng Med Biol Soc (2020) 2020:1360–3. doi:10.1109/embc44109.2020.9175613

44. Zhang J, An C, Dai J, Amador M, Bartsch D, Borooah S, et al. Joint vessel segmentation and deformable registration on multi-modal retinal images based on style transfer. In: Proceedings of the IEEE International Conference on Image Processing (2019). p. 839–43. doi:10.1109/icip.2019.8802932

45. Wang Y, Zhang J, An C, Cavichini M, Jhingan M, Amador-Patarroyo MJ, et al. A segmentation based robust deep learning framework for multimodal retinal image registration. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020) (2020). p. 1369–73. doi:10.1109/icassp40776.2020.9054077

46. Zhang J, Wang Y, Dai J, Cavichini M, Bartsch DUG, Freeman WR, et al. Two-step registration on multi-modal retinal images via deep neural networks. IEEE Trans Image Process (2022) 31:823–38. doi:10.1109/TIP.2021.3135708

47. Wang Y, Zhang J, Cavichini M, Bartsch DG, Freeman WR, Nguyen TQ, et al. Robust content-adaptive global registration for multimodal retinal images using weakly supervised deep-learning framework. IEEE Trans Image Process (2021) 30:3167–78. doi:10.1109/tip.2021.3058570

48. Cavichini M, An C, Bartsch DG, Jhingan M, Amador-Patarroyo MJ, Long CP, et al. Artificial intelligence for automated overlay of fundus camera and scanning laser ophthalmoscope images. Transl Vis Sci Technol (2020) 9(2):56. doi:10.1167/tvst.9.2.56

Keywords: registration, color fundus photograph, retinal imaging, diabetic retinopathy, disease progression

Citation: Zhou J, Jin K, Gu R, Yan Y, Zhang Y, Sun Y and Ye J (2022) Color fundus photograph registration based on feature and intensity for longitudinal evaluation of diabetic retinopathy progression. Front. Phys. 10:978392. doi: 10.3389/fphy.2022.978392

Received: 26 June 2022; Accepted: 06 September 2022;
Published: 27 September 2022.

Edited by:

Liwei Liu, Shenzhen University, China

Reviewed by:

Dazhe Zhao, Northeastern University, China
Rui Hu, Shenzhen University, China

Copyright © 2022 Zhou, Jin, Gu, Yan, Zhang, Sun and Ye. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Juan Ye, yejuan@zju.edu.cn

†These authors have contributed equally to this work and share first authorship
