Abstract
Purpose :
To assess and understand the transferability of convolutional neural networks (CNNs) that distinguish glaucomatous from non-glaucomatous eyes based on OCT circumpapillary disc (circle) b-scans.
Methods :
771 circle b-scans (Fig top right) from 771 eyes (Dataset 1: DS1) were categorized as non-glaucoma (NG, n = 474) or glaucoma (G, n = 297) based on the ratings of an OCT expert examining a commercial report (Fig left). 127 circle b-scans acquired with a different OCT device of the same type, at a different site and by a different operator (Dataset 2: DS2), were categorized as 75 NG and 52 G after 3 specialists evaluated all available information (OCT and visual fields); this categorization served as the reference standard (RS). Two CNN models (A & B) were independently trained, validated and tested on DS1 with a 60:20:20 split. Based on previous work,[1] CNN A's hyperparameters were optimized according to validation results; CNN B used a ResNet50 backbone.[2] To assess transferability, we tested the same models on DS2. Heatmaps (Fig bottom right), generated with a feature visualization method[3] that highlights heavily weighted regions, were used for a post-hoc analysis of false positives (FPs) and false negatives (FNs) relative to the RS. Ratings of the DS2 b-scans by an OCT expert provided a comparison of human vs. model performance. To evaluate whether transferability could be improved, we repeated the process (training, validation and testing on DS1; transfer to DS2) using only those DS1 circle b-scans that the OCT expert had rated with reasonable confidence (<25% = NG, n = 444; >75% = G, n = 259).
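The DS1 data preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each DS1 scan carries a numeric expert glaucoma rating in percent (the abstract gives only the <25% and >75% thresholds), and it stratifies the 60:20:20 split by class, which the abstract does not state. `confident_subset` and `stratified_split` are hypothetical helper names.

```python
import random
from collections import defaultdict


def confident_subset(records, low=25.0, high=75.0):
    """Keep scans the expert rated confidently; drop the ambiguous middle.

    Assumed (hypothetical) record format: (scan_id, rating_percent).
    Ratings < low become label 0 (NG); ratings > high become label 1 (G).
    """
    kept = []
    for scan_id, rating in records:
        if rating < low:
            kept.append((scan_id, 0))
        elif rating > high:
            kept.append((scan_id, 1))
    return kept


def stratified_split(labeled, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (scan_id, label) pairs into train/val/test lists.

    Stratification by label is an assumption: each class is shuffled and
    cut separately so the NG:G ratio is roughly preserved in every part.
    """
    by_label = defaultdict(list)
    for item in labeled:
        by_label[item[1]].append(item)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n = len(items)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Applied to class counts matching DS1 (474 NG, 297 G), this split yields 462 training, 154 validation and 155 test scans; the same function could then be rerun on the output of `confident_subset` for the second experiment.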
Results :
Transfer accuracies on DS2 were lower than test accuracies on DS1 for both models (A: 70.1% vs. 96.1%; B: 76.4% vs. 95.5%), and both models produced a number of FNs (Table). Training on the more confidently rated DS1 subset improved transfer accuracy (A: 78.7% vs. 70.1%; B: 78.0% vs. 76.4%). Given the same input, the OCT expert's accuracy was no better than that of the models, although the distribution of FPs and FNs differed.
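The comparisons in the Results reduce to simple confusion-matrix arithmetic. The sketch below is a hedged illustration, not the study's evaluation code: it counts FPs and FNs with glaucoma as the positive class, and it assumes the paired accuracies in the abstract are listed in model order (A, then B). `implied_correct` is a hypothetical helper that back-calculates the approximate number of correct DS2 predictions (n = 127) from a rounded accuracy; those counts are estimates, not reported data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN); label 1 = glaucoma (G), 0 = non-glaucoma (NG)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # healthy eye called glaucoma
        else:
            if t == positive:
                fn += 1  # glaucomatous eye missed
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Accuracies (%) as reported in the abstract; the A/B mapping is assumed
# from the order in which the values are listed.
REPORTED = {
    "A": {"DS1 test": 96.1, "DS2 transfer": 70.1, "DS2 transfer (confident DS1)": 78.7},
    "B": {"DS1 test": 95.5, "DS2 transfer": 76.4, "DS2 transfer (confident DS1)": 78.0},
}
N_DS2 = 127


def implied_correct(acc_percent, n):
    """Approximate correct predictions on n scans implied by a rounded accuracy."""
    return round(acc_percent / 100 * n)


if __name__ == "__main__":
    for model, acc in REPORTED.items():
        drop = round(acc["DS2 transfer"] - acc["DS1 test"], 1)
        gain = round(acc["DS2 transfer (confident DS1)"] - acc["DS2 transfer"], 1)
        print(model, "drop:", drop, "gain:", gain,
              "implied correct on DS2:", implied_correct(acc["DS2 transfer"], N_DS2))
```

Under that ordering assumption, transfer to DS2 lowers accuracy by 26.0 points for model A and 19.1 points for model B, and 70.1% of 127 scans corresponds to roughly 89 correct predictions.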
Conclusions :
Thus far, circle OCT b-scans alone are not sufficient for CNNs to optimally distinguish glaucomatous from healthy eyes when tested on a new dataset. More generally, it is important to evaluate the transferability of CNN models on new datasets, as performance is likely to be lower than on the original dataset.
1. Thakoor et al., EMBC 2019
2. He et al., arXiv:1512.03385, 2015
3. Zagoruyko et al., ICLR 2017
This is a 2020 ARVO Annual Meeting abstract.