Abstract
Purpose :
To evaluate Convolutional Neural Networks (CNNs) previously shown [1] to detect glaucoma from Optical Coherence Tomography (OCT) Retinal Nerve Fiber Layer (RNFL) probability maps on a new dataset collected at a different location, by different operators, and on a different OCT instrument, while varying the reference standard (RS), training, and input data.
Methods :
Five CNNs previously trained to detect early glaucomatous damage from OCT RNFL probability maps, which achieved high accuracy (95%) [1], were evaluated without any re-training on a new test set against 4 new reference standards. Three were an OCT expert’s gradings based on RS1: full OCT reports, RS2: only RNFL and RGCP (Retinal Ganglion Cell Plexiform) probability maps, and RS3: only RNFL probability maps (the format provided to the CNNs [1]). The fourth, RS4, was the consensus of 3 graders who had access to OCT and visual field information. For the best-performing CNN, the effects on performance of data augmentation during training and of input format (RNFL probability maps alone vs. RNFL and RGCP maps together) were assessed. False positive (FP) and false negative (FN) RNFL images were visualized with Grad-CAMs [2] and quantitatively assessed via abnormal structure & function (aS-aF) agreement [3].
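The evaluation described above, scoring one fixed set of CNN predictions against several reference standards, can be sketched as follows. This is a minimal illustration only: the function names and the toy labels are hypothetical and are not data or code from the study.

```python
from typing import Dict, List


def accuracy(predictions: List[int], reference: List[int]) -> float:
    """Fraction of eyes where the prediction matches the reference label."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have the same length")
    if not predictions:
        raise ValueError("at least one eye is required")
    matches = sum(p == r for p, r in zip(predictions, reference))
    return matches / len(predictions)


def accuracy_by_reference(
    predictions: List[int], references: Dict[str, List[int]]
) -> Dict[str, float]:
    """Score the same predictions against each reference standard."""
    return {name: accuracy(predictions, labels) for name, labels in references.items()}


# Toy labels (1 = glaucoma, 0 = healthy); hypothetical, not study data.
toy_predictions = [1, 0, 1, 1, 0, 0, 1, 0]
toy_references = {
    "RS1": [1, 0, 1, 1, 0, 0, 1, 1],
    "RS4": [1, 0, 0, 1, 0, 1, 1, 1],
}
scores = accuracy_by_reference(toy_predictions, toy_references)
```

Because the predictions are held fixed, any difference between the scores reflects the choice of reference standard alone.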
Results :
The ResNet-18 + Random Forest model with data augmentation and with RNFL probability map input alone was the best-performing model, achieving 83.0% accuracy when transferred to the new test set with RS1 and 81.1% accuracy with the clinically relevant RS4 (Table). aS-aF analysis of FP and FN indicated that the number of aS-aF locations was significantly greater for true positives (TP) than for FN (p < 0.05) (Fig-lower panel). Regions highlighted in Grad-CAMs were also regions with aS-aF agreement (Fig-upper panels).
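A comparison of aS-aF location counts between TP and FN eyes could be sketched as below. The abstract does not state which statistical test was used, so the one-sided permutation test here is an assumption chosen for illustration, and the counts are hypothetical toy values rather than study data.

```python
import random
from typing import Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_permutations: int = 10000,
    seed: int = 0,
) -> float:
    """One-sided p-value for the hypothesis mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled counts to the two groups.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical aS-aF location counts per eye; not study data.
toy_tp_counts = [12, 15, 9, 14, 11, 13]
toy_fn_counts = [2, 4, 1, 3, 5]
p_value = permutation_p_value(toy_tp_counts, toy_fn_counts)
```

A permutation test makes no distributional assumption about the counts, which suits small groups of integer-valued data.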
Conclusions :
When transferring to a new test set, choice of reference standard, data augmentation, and input image format can improve CNN performance. In this study, RNFL maps alone enabled better performance than RNFL and RGCP maps combined as CNN input. Gradings based on full OCT reports (RS1) served as the optimal RS for transfer. aS-aF analysis indicated that CNNs miss cases (FNs) when there are significantly fewer aS-aF locations, suggesting that such CNNs could serve to screen RNFL images with extreme damage.
1. Thakoor et al., EMBC 2019; 2. Selvaraju et al., ICCV 2017; 3. Hood et al., IOVS 2019
This is a 2020 ARVO Annual Meeting abstract.