Abstract
Purpose:
Identification of biomarkers in optical coherence tomography (OCT) scans is fundamental for the detection, classification and monitoring of retinal pathologies. For human experts this is a difficult and time-consuming task, resulting in a high degree of variability. To cope with the growing number of diagnostic scans, one viable solution is to automate this process. We hypothesize that an automated deep learning classification method is not inferior to a human annotator when grading scans at the b-scan level for a large number of biomarkers.
Methods:
OCT b-scans were collected from patients (N=429) with diabetic retinopathy (DR), diabetic macular edema (DME) and age-related macular degeneration (AMD). We annotated 11 biomarkers (Healthy, Subretinal Fluid, Intraretinal Fluid, Hyperreflective Foci, Drusen, Reticular Pseudodrusen, Epiretinal Membrane (ERM), Geographic Atrophy (GA), Outer Retinal Atrophy, Intraretinal Cysts, and Fibrovascular Pigment Epithelial Detachment (PED)) in a training set of N=21511 b-scans and a multi-grader test set of N=1029 b-scans. Grader performance on the test set was evaluated using Cohen's Kappa coefficient, with the ground truth computed using a majority voting scheme. A multi-label classifier was cross-validated using the training set and an ensemble of the resulting models was constructed for testing. Our classifier is based on a dilated residual convolutional neural network structure that retains a large spatial size throughout the network. Evaluation metrics for the automated method include Cohen's Kappa, mean average precision and F1 score.
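To make the evaluation scheme concrete, the following is a minimal sketch (not the study's actual code) of how a majority-vote ground truth and per-grader Kappa values could be computed; the array layout, function names, and tie handling are assumptions:

```python
# Minimal sketch (assumed data layout, not the study's code): build the
# majority-vote ground truth and score each grader against it with
# Cohen's Kappa, per biomarker.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def majority_vote(labels: np.ndarray) -> np.ndarray:
    """labels: binary array of shape (n_graders, n_scans, n_biomarkers)."""
    # A biomarker counts as present when more than half of the graders
    # marked it. With an even grader count, 50/50 ties resolve to "absent"
    # here; the abstract does not specify how ties were handled.
    return (labels.mean(axis=0) > 0.5).astype(int)

def kappa_vs_majority(labels: np.ndarray) -> np.ndarray:
    """Mean Kappa of each grader against the majority vote, averaged over biomarkers."""
    truth = majority_vote(labels)
    n_graders, _, n_biomarkers = labels.shape
    kappas = np.empty((n_graders, n_biomarkers))
    for g in range(n_graders):
        for b in range(n_biomarkers):
            kappas[g, b] = cohen_kappa_score(truth[:, b], labels[g, :, b])
    return kappas.mean(axis=1)
```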
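The abstract does not give the exact network, so the block below is only an illustrative sketch (in PyTorch) of the kind of dilated residual structure it describes: dilated convolutions enlarge the receptive field while matching padding keeps the spatial size unchanged, and a skip connection makes the block residual. All class and parameter names are hypothetical.

```python
# Illustrative sketch of a dilated residual block; not the authors' architecture.
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # padding=dilation gives "same" output size for a 3x3 kernel with the
        # given dilation, so the spatial size is retained through the block.
        self.conv1 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual (skip) connection
```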
Results:
Multi-grader evaluation showed high variability in inter-grader performance, with Kappa in the range [0.371, 0.845] +/- [0.125, 0.065]. Kappa values from the graders to the majority vote are higher than the inter-grader values (Kappa=[0.596, 0.903] +/- [0.125, 0.066]). The mean Kappa of scans annotated by all 8 graders (N=182) against the majority vote is 0.744 +/- 0.032. Our ensemble of 10-fold cross-validated classifiers achieves a mean Kappa of 0.753. Our method's mean average precision is 0.8733 and its mean F1 score is 0.801, while it runs at 35 b-scans/s, in contrast to 24.2 s/b-scan for an expert annotator.
Conclusions:
We showed that our automated biomarker classification method is capable of outperforming human expert annotators in both accuracy and speed. This opens the door to many possible clinical applications, such as diagnostic recommendations for mass screening and retrospective study analysis.
This abstract was presented at the 2019 ARVO Annual Meeting, held in Vancouver, Canada, April 28 - May 2, 2019.