Images from 40 eyes of 20 control subjects and 80 eyes of 48 NPDR patients (18 mild, 16 moderate, and 14 severe NPDR) were used for this study. The database consisted of 40 control, 30 mild NPDR, 27 moderate NPDR, and 23 severe NPDR images. The detailed patient demographic data are shown in the Table. There were no statistically significant differences between control and NPDR groups with respect to age, sex, or hypertension distribution (ANOVA, P = 0.19; χ² test, P = 0.24 and P = 0.22, respectively).
The automated artery-vein classification in both fundus and OCTA images was validated using ground truths manually labeled by two retina specialists (JIL and DT). The two observers agreed on 96.21% and 93.97% of the identified artery-vein vessel maps for the fundus and OCTA images, respectively. These 96.21% and 93.97% of the vessel areas can therefore be used as the ground truths to validate the performance of automated artery-vein classification in fundus and OCTA images, respectively. Because the artery-vein classification was performed on the extracted binarized vessel maps in fundus and OCTA images, the manual labeling was also prepared using the vessel maps, so that the ground truths and the identified artery-vein maps were consistent. The smallest parafoveal capillaries in the OCTA images were not included in the vessel map and were also excluded from the manual labeling. For the fundus images, we used a matched filtering–based vessel-enhancing technique, which enabled robust segmentation of the vessel map with intricate details. We observed an average 21% increase in the vasculature map in fundus images compared with the original map without vessel enhancement. For each of the ground truth images (both fundus and OCTA), the observers manually traced the whole binary vessel map with blue (for veins) and red (for arteries) markings and identified the branchpoints with yellow markings. Each of the manually identified nodes and artery-vein branches was matched pixel-wise with the classification results to measure the performance metrics. If the two graders disagreed on a specific area of the vessel map, that area was labeled as unclassified and excluded from the ground truth. To evaluate the classification performance on all fundus and OCTA images (control and NPDR patients), sensitivity, specificity, and accuracy were measured.
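The pixel-wise validation described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the label codes (`BACKGROUND`, `ARTERY`, `VEIN`, `UNCLASSIFIED`) and the function names are hypothetical, and the sketch assumes that each grader's tracing and the classifier output are available as integer label maps of identical shape. Pixels on which the graders disagree are marked unclassified and dropped before the metrics are computed.

```python
import numpy as np

# Hypothetical label codes chosen for this sketch (not taken from the study).
BACKGROUND, ARTERY, VEIN = 0, 1, 2
UNCLASSIFIED = 255


def consensus_ground_truth(grader_a, grader_b):
    """Keep labels where both graders agree; mark disagreements as unclassified."""
    grader_a = np.asarray(grader_a)
    grader_b = np.asarray(grader_b)
    return np.where(grader_a == grader_b, grader_a, UNCLASSIFIED)


def observer_agreement(grader_a, grader_b):
    """Fraction of vessel pixels (labeled by either grader) with identical labels."""
    grader_a = np.asarray(grader_a)
    grader_b = np.asarray(grader_b)
    vessel = (grader_a != BACKGROUND) | (grader_b != BACKGROUND)
    return float(np.mean(grader_a[vessel] == grader_b[vessel]))


def class_metrics(predicted, truth, positive):
    """Pixel-wise sensitivity, specificity, and accuracy for one vessel class.

    Only pixels whose consensus label is artery or vein are scored, so
    background and unclassified pixels do not influence the result.
    Assumes both classes occur among the scored pixels.
    """
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    valid = (truth == ARTERY) | (truth == VEIN)
    pred_pos = predicted[valid] == positive
    true_pos = truth[valid] == positive
    tp = int(np.sum(pred_pos & true_pos))
    tn = int(np.sum(~pred_pos & ~true_pos))
    fp = int(np.sum(pred_pos & ~true_pos))
    fn = int(np.sum(~pred_pos & true_pos))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

Calling `class_metrics(predicted, consensus_ground_truth(a, b), ARTERY)` and then repeating the call with `VEIN` would yield per-class figures in the same form as those reported below, under the assumptions stated above.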
The algorithm demonstrated 98.51% and 98.26% accuracies in identifying blood vessels as artery and vein, respectively, in the fundus images (ICC 0.98 and 0.96 for two-repeat measurement, 95% CI 0.91–1). There was 98.64% sensitivity and 96.13% specificity for artery identification and 98.36% sensitivity and 95.97% specificity for vein identification. For OCTA images, we observed 97.29% sensitivity and 96.57% specificity for artery identification and 97.38% sensitivity and 96.14% specificity for vein identification. The accuracies were 97.03% and 97.24%, respectively, for identifying blood vessels as artery and vein in the OCTA images (ICC 0.95 and 0.94 for two-repeat measurement, 95% CI 0.88–0.97). These performance metrics indicate that the automated classification agrees closely with the manually labeled ground truths and identifies arteries and veins robustly.
Quantitative analysis of control and NPDR OCTAs is summarized in Figures 5A and 5B. The AVR-BVC and AVR-BVT features demonstrated excellent repeatability with two-repeat measurement. The corresponding ICC and 95% CIs for AVR-BVC and AVR-BVT are 0.97 (CI: 0.96–1.0) and 0.94 (CI: 0.89–0.94), respectively. We compared the sensitivity of AVR-BVC and AVR-BVT with that of m-BVC and m-BVT, respectively. We also verified whether AVR-BVC and AVR-BVT improved the feature sensitivity for control versus NPDR OCTAs compared with a-BVC and a-BVT or v-BVC and v-BVT. For BVC analysis, although m-BVC increased slightly as NPDR stage progressed, the increase was not statistically significant. We observed 0.72%, 2.54%, and 4.04% increases for mild, moderate, and severe NPDR, compared with control data. Compared with control OCTAs, a-BVC decreased by 9.44%, 16.39%, and 24.59%, and v-BVC increased by 10.63%, 20.9%, and 31.95%, in mild, moderate, and severe NPDR OCTAs, respectively. Because a-BVC and v-BVC changed in opposite directions, the m-BVC change was not statistically significant. In the case of AVR-BVC, however, the opposite polarities of a-BVC and v-BVC result in enhanced sensitivity to the different NPDR stages. Compared with control OCTA data, 18.14%, 30.9%, and 42.85% decreases were observed for mild, moderate, and severe NPDR OCTAs (Student's t-test, P < 0.001 for all three cases). The AVR-BVC change was also significant among the four groups (control and three NPDR groups; ANOVA, P = 0.004). In contrast, a-BVC, v-BVC, or mean BVC differences were not significant among the four groups. AVR-BVC was the best feature to differentiate control from mild NPDR using OCTA (Student's t-test, P < 0.001), promising a unique biomarker for detecting early onset of NPDR in diabetes patients.
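The way opposite changes in a-BVC and v-BVC compound in their ratio can be checked with a short calculation. The sketch below is illustrative only: it assumes that AVR-BVC is the ratio of a-BVC to v-BVC and that the reported group-level percent changes combine multiplicatively, which is an approximation because a ratio of group means is not identical to a mean of per-image ratios. The function name `ratio_percent_change` is hypothetical.

```python
def ratio_percent_change(artery_change_pct, vein_change_pct):
    """Percent change of an artery/vein ratio, given the percent change of each term.

    Assumes the ratio is artery / vein and that both changes are expressed
    relative to the same control baseline.
    """
    artery_factor = 1.0 + artery_change_pct / 100.0
    vein_factor = 1.0 + vein_change_pct / 100.0
    return (artery_factor / vein_factor - 1.0) * 100.0


# Reported a-BVC decreases and v-BVC increases versus control, per NPDR stage.
reported_changes = {
    "mild": (-9.44, 10.63),
    "moderate": (-16.39, 20.9),
    "severe": (-24.59, 31.95),
}

for stage, (artery_pct, vein_pct) in reported_changes.items():
    change = ratio_percent_change(artery_pct, vein_pct)
    print(f"{stage}: AVR change of about {change:.2f}%")
```

Under these assumptions the computed ratio changes come out close to the reported 18.14%, 30.9%, and 42.85% decreases, which illustrates why the ratio is more sensitive than the mean: the opposite changes largely cancel in m-BVC but reinforce each other in AVR-BVC.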
For BVT analysis, AVR-BVT improved the sensitivity compared with m-BVT, but the improvement was not as significant as that of AVR-BVC. The a-BVT demonstrated minute changes between control and NPDR groups, but v-BVT increased as NPDR stage progressed. For v-BVT, 1.19%, 2.5%, and 5.1% increases were observed for control versus mild NPDR (not significant), control versus moderate NPDR (moderately significant, P < 0.05), and control versus severe NPDR (moderately significant, P < 0.05) OCTA. Intergroup change in v-BVT was also not statistically significant (ANOVA, P = 0.75). The m-BVT demonstrated limited change between control and NPDR groups (0.38%, 1.03%, and 2.39% increases in mild, moderate, and severe NPDR groups compared with control). For AVR-BVT, 1.6%, 3%, and 5.4% decreases were observed for control versus mild, control versus moderate, and control versus severe NPDR eyes (moderately significant, P < 0.05 for all cases). AVR-BVT could distinguish between mild and severe NPDR (Student's t-test, P = 0.038). However, AVR-BVT could not differentiate mild and moderate NPDR groups with statistical significance (Student's t-test, P = 0.28). Intergroup change in AVR-BVT was also not statistically significant (ANOVA, P = 0.092).