Abstract
Purpose:
Diabetic retinopathy (DR), a leading cause of blindness worldwide, is a complication of diabetes mellitus, a systemic condition that affects many organ systems. A patient with DR typically develops findings in both eyes; accordingly, the standard diagnostic workup for DR involves clinician assessment of both eyes. Given the prevalence of DR, many deep learning (DL) based artificial intelligence (AI) models are being developed as screening and diagnostic assistance tools for DR by analyzing fundus photographs or OCT images. While promising, a major limitation of existing models is that, unlike a clinician, they utilize images from only a single eye when making an assessment, which may limit their diagnostic capability. To overcome this limitation, we propose a DL model based on a customized Vision Transformer (ViT) architecture capable of analyzing images from both eyes when making an assessment.
Methods:
18886 retinal funduscopic images from the EYEPACS dataset were utilized. Images were grouped into 9443 pairs, each consisting of the OD and OS images from the same patient at the same visit. Of these, 9000 pairs were used for training and 443 for testing. A base ViT architecture was customized at the input channel level to accept two images at once: the model generates a separate attention vector for each input image and concatenates them inside the model. This custom dual-image ViT was trained on the 9000 image pairs to detect diabetic retinopathy (mild, moderate, or severe) and evaluated on the 443 held-out pairs. Single-image ResNet50 and ViT models were trained on the same images for comparison with the dual-image ViT.
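The dual-input design described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the image size, patch size, embedding dimension, depth, and the choice to concatenate the per-eye CLS features before the classification head are all illustrative assumptions standing in for the paper's "two separate attention vectors concatenated inside the model".

```python
import torch
import torch.nn as nn

class DualImageViT(nn.Module):
    """Sketch of a dual-input ViT: each eye's fundus image is encoded
    separately, the two per-eye feature vectors are concatenated, and a
    shared head predicts DR severity (none/mild/moderate/severe).
    All hyperparameters here are illustrative, not from the abstract."""
    def __init__(self, img_size=64, patch=16, dim=128, depth=2, heads=4, n_classes=4):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding as a strided convolution (standard ViT trick)
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Head sees the concatenated OD + OS representations
        self.head = nn.Linear(2 * dim, n_classes)

    def encode(self, x):
        t = self.patch_embed(x).flatten(2).transpose(1, 2)          # (B, N, dim)
        t = torch.cat([self.cls.expand(t.size(0), -1, -1), t], dim=1) + self.pos
        return self.encoder(t)[:, 0]                                 # CLS token

    def forward(self, od, os_):
        # Encode each eye independently, then fuse by concatenation
        return self.head(torch.cat([self.encode(od), self.encode(os_)], dim=1))

model = DualImageViT()
od = torch.randn(2, 3, 64, 64)   # batch of right-eye images
os_ = torch.randn(2, 3, 64, 64)  # matching left-eye images
logits = model(od, os_)
print(logits.shape)  # torch.Size([2, 4])
```

Sharing one encoder across both eyes (as here) keeps the parameter count close to a single-image ViT; an alternative design would give each eye its own encoder at roughly double the cost.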
Results:
The dual-image ViT achieved an AUC of 0.94 in detecting DR, the single-image ViT an AUC of 0.86, and the ResNet50 an AUC of 0.74. Attention vectors from the dual-image model reveal diffuse attention throughout the retina, focused on hallmark features of DR, including microaneurysms and neovascularization. Notably, the attention vectors for the OD and OS eyes of the same patient were distinct, indicating that the model attended to separate features in each eye.
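The AUC values reported above measure ranking quality of the model's per-pair DR scores against ground-truth labels. As a small illustration (with made-up scores and labels, not the study's 443 test pairs), such an AUC can be computed with scikit-learn:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical DR probabilities and labels (1 = DR present, 0 = absent);
# the real evaluation used predictions on the 443 held-out image pairs.
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.10, 0.30, 0.80, 0.65, 0.90, 0.40]

auc = roc_auc_score(y_true, y_score)
print(auc)  # 1.0 — every positive here happens to outrank every negative
```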
Conclusions:
The dual-image ViT outperformed the single-image models by a large margin and utilized hallmark features of DR in both eyes to make its assessment.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.