Abstract
Purpose :
The ability of clinicians to distinguish true glaucoma progression on spectral domain optical coherence tomography (SDOCT) from changes due to test-retest variability depends on precise measurements of retinal nerve fiber layer (RNFL) thickness. However, commercially available automated segmentation software often fails to deliver reproducible and precise retinal layer boundaries, which can increase test-retest variability. This study compared the long-term test-retest variability of RNFL estimates from a deep learning (DL) algorithm with that of measurements from conventional automated segmentation software in SDOCT.
Methods :
A convolutional neural network (CNN) was trained to predict RNFL thickness from 22,715 SDOCT B-scan images acquired in 1,331 eyes of 701 subjects. Human graders had previously reviewed these images and labeled them as free of automated segmentation artifacts. Only the raw B-scan images, without the automated segmentation layer delineation, were used to develop the DL algorithm: 80% for training and validation and 20% for testing. A separate longitudinal test dataset, consisting of B-scans from 409 eyes followed over time, was then presented to the trained CNN, which output global RNFL thickness estimates. To assess long-term test-retest variability, ordinary least squares (OLS) linear regression was used to fit, over time, both the global RNFL thickness estimates from the CNN and the RNFL thickness measurements from the automated segmentation software. The residuals of each OLS model were obtained by subtracting the predicted from the observed values, and the standard deviation (SD) of the residuals was used as a measure of the variability of the RNFL estimates from the DL and conventional algorithms.
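The variability measure described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: a separate OLS line of global RNFL thickness versus follow-up time is fitted for each eye, time is in years and thickness in micrometers, the residual SD uses the sample (n - 1) divisor of Python's `statistics.stdev`, and each eye has at least three visits (with two visits the line fits exactly and the residuals are zero). The function names, eye IDs, and toy numbers are hypothetical.

```python
from statistics import mean, stdev


def ols_residuals(times, values):
    """Fit values = intercept + slope * time by ordinary least squares
    and return the residuals (observed minus predicted)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    t_bar = mean(times)
    y_bar = mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct visit times")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [y - (intercept + slope * t) for t, y in zip(times, values)]


def residual_sd(times, values):
    """SD of the OLS residuals for one eye (assumed n - 1 divisor).
    Two visits give an exact fit, so require at least three."""
    if len(times) < 3:
        raise ValueError("need at least three visits per eye")
    return stdev(ols_residuals(times, values))


def summarize(series_by_eye):
    """Mean and SD, across eyes, of the per-eye residual SDs.
    series_by_eye maps a hypothetical eye ID to (times_in_years, rnfl_um)."""
    sds = [residual_sd(t, y) for t, y in series_by_eye.values()]
    return mean(sds), stdev(sds)


if __name__ == "__main__":
    # Hypothetical toy data (not study data): the same eyes and visits,
    # measured once by each algorithm.
    dl = {
        "eye1": ([0.0, 0.5, 1.0, 1.5, 2.0], [95.1, 94.6, 94.4, 93.7, 93.5]),
        "eye2": ([0.0, 1.0, 2.0, 3.0], [88.0, 87.2, 86.9, 85.8]),
    }
    conventional = {
        "eye1": ([0.0, 0.5, 1.0, 1.5, 2.0], [95.3, 93.9, 95.0, 92.8, 94.1]),
        "eye2": ([0.0, 1.0, 2.0, 3.0], [88.4, 86.5, 87.9, 85.1]),
    }
    for name, data in (("DL", dl), ("Conventional", conventional)):
        m, s = summarize(data)
        print(f"{name}: mean residual SD {m:.2f} +/- {s:.2f} um")
```

Summarizing the per-eye residual SDs by their mean and SD mirrors the "mean SD of the residuals" reported in the Results; how the two algorithms' per-eye SDs were then compared statistically is not specified here.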
Results :
In the longitudinal test sample, the RNFL estimates of the DL algorithm had significantly lower long-term test-retest variability than the measurements from the conventional automated segmentation software (mean SD of the residuals: 1.69±1.54 μm vs. 2.33±3.15 μm, respectively; P<0.01).
Conclusions :
The DL model’s estimates of the RNFL showed significantly less variability than the measurements provided by commercially available automated segmentation software. Due to its lower test-retest variability, the DL model may improve detection of glaucoma progression.
This abstract was presented at the 2019 ARVO Annual Meeting, held in Vancouver, Canada, April 28 - May 2, 2019.