Timothy Tin Long Yu, Julian Lo, Da Ma, Pengxiao Zang, Julia Owen, Ruikang K Wang, Aaron Y Lee, Yali Jia, Marinko V Sarunic; Collaborative Diabetic Retinopathy Severity Classification of Optical Coherence Tomography Data through Federated Learning. Invest. Ophthalmol. Vis. Sci. 2021;62(8):1029.
© ARVO (1962-2015); The Authors (2016-present)
Diabetic retinopathy (DR) is among the most common microvascular complications of diabetes and can lead to sudden loss of vision. Exploration of deep learning for the classification of DR through optical coherence tomography angiography (OCT-A) is limited by the size of the labelled datasets. Data security is fundamental to collaboration between institutions, yet it also hinders the sharing of images. We investigate sharing the deep neural network (DNN) model during training while keeping the images private, a method referred to as federated learning (FL). In this study, we evaluate the relative performance of the cross-institution application of FL for the classification of referable DR (RDR) in OCT en face images.
This IRB-approved study involved three independent institutions: SFU (n=403 subjects), OHSU (n=323 subjects), and UW (n=54 subjects), using three commercial OCT image acquisition systems. The input for classification consisted of a combination of the en face angiographic and structural images from the deep and superficial vascular complexes. SFU and OHSU allocated 60% of their data for training, 20% for validation, and locked 20% for testing, while the UW data was reserved as an external test set. Transfer learning of a VGG19 architecture initialized with ImageNet weights was used for 4-fold cross-validated classification. Each participating institution trains the DNN model and uploads the weights to a central FL computer. The models are averaged and then redistributed to each of the participants. We compare the performance of the FL model against each institution's internally trained model, both on that institution's own test data (internal) and on data from the other institutions (external).
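The train, upload, average, and redistribute loop described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the names `average_weights`, `federated_round`, and `make_client` are hypothetical, the weights are toy lists rather than VGG19 parameters, and the plain unweighted mean is an assumption, since the abstract does not say whether the average was weighted by dataset size.

```python
from typing import Callable, Dict, List

# Assumption: a model is represented as layer name -> flattened parameter list.
Weights = Dict[str, List[float]]


def average_weights(client_weights: List[Weights]) -> Weights:
    """Return the unweighted element-wise mean of the clients' parameters."""
    if not client_weights:
        raise ValueError("need at least one client")
    n = len(client_weights)
    averaged: Weights = {}
    for name in client_weights[0]:
        # Group the i-th parameter of this layer across all clients.
        columns = zip(*(w[name] for w in client_weights))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


def federated_round(
    global_weights: Weights,
    clients: List[Callable[[Weights], Weights]],
) -> Weights:
    """One round: every client trains locally, then the server averages."""
    # Each client starts from a copy of the current global model; only the
    # resulting weights leave the client, never its images.
    updates = [train({k: list(v) for k, v in global_weights.items()})
               for train in clients]
    return average_weights(updates)


def make_client(shift: float) -> Callable[[Weights], Weights]:
    """Hypothetical stand-in for local training: adds a constant to each weight."""
    def local_train(weights: Weights) -> Weights:
        return {name: [v + shift for v in values]
                for name, values in weights.items()}
    return local_train


# Two training sites, mirroring the two institutions that contributed training data.
global_weights = {"fc": [0.0, 1.0]}
clients = [make_client(1.0), make_client(3.0)]
new_weights = federated_round(global_weights, clients)
print(new_weights)  # {'fc': [2.0, 3.0]}
```

In a real setting each `local_train` would run one or more epochs on the institution's private images, and `federated_round` would be repeated until the validation metrics stop improving.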
Two-tailed t-tests showed no statistically significant difference (P<0.05 threshold) in classification performance between the FL and internal models on any metric for any dataset (representative results shown in Fig. 1). The FL framework achieved an accuracy of 0.762–0.880, an F1 score of 0.677–0.909, and an AUC of 0.910–0.979; the internal models attained an accuracy of 0.809–0.908, an F1 score of 0.778–0.921, and an AUC of 0.884–0.978.
The FL approach for RDR classification shows comparable performance to internal models. This study demonstrates potential for more generalizable networks through FL that incorporate learning on data from diverse domains.
This is a 2021 ARVO Annual Meeting abstract.