Abstract
Purpose:
Deep convolutional neural networks (CNNs) are widely used for glaucoma detection based on spatial features extracted from static fundus images (e.g., optic nerve cupping). However, glaucomatous damage is also associated with several dynamic markers expressed as temporal features within fundus videos (e.g., spontaneous venous pulsatility). In this study, we compared the performance of a combined CNN and recurrent neural network (RNN) trained on fundus videos (i.e., containing both spatial and temporal features) with that of a CNN trained on fundus images only (i.e., containing spatial features only) in separating glaucomatous from healthy eyes.
Methods:
Two deep neural network architectures, a pre-trained VGG16 (i.e., CNN) and a combined VGG16 and Long Short-Term Memory (LSTM) network (i.e., CNN+RNN), were further trained on fundus images and videos collected from 695 participants (379 with glaucoma [65.1 ± 13 yrs, 221 male] and 316 controls [47 ± 15 yrs, 165 male]). All participants underwent bilateral dilated funduscopy with a minimum 5-second recording of the retinal vasculature (30 fps) centred on the optic disc, as well as standard automated perimetry (Humphrey 24-2 SITA-Standard) and optical coherence tomography assessments. In all cases, a glaucoma specialist was responsible for diagnosing glaucoma or classifying participants as ophthalmically healthy. Network training and evaluation were performed using an 85% (training), 10% (test), and 5% (validation) split. The F-measure, sensitivity, and specificity of both models were evaluated.
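For illustration, a minimal sketch of such a combined CNN+RNN video classifier is given below. This is not the authors' implementation; the frame count, input size, LSTM width, and Keras API usage are assumptions.

    # Minimal sketch (assumed, not the authors' code): a frozen pre-trained
    # VGG16 applied per frame, followed by an LSTM over the frame embeddings.
    import tensorflow as tf
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    NUM_FRAMES = 150              # e.g., 5 s at 30 fps, per the Methods
    FRAME_SHAPE = (224, 224, 3)   # assumed VGG16 input size

    # Pre-trained VGG16 used as a per-frame spatial feature extractor.
    backbone = VGG16(weights="imagenet", include_top=False, pooling="avg",
                     input_shape=FRAME_SHAPE)
    backbone.trainable = False

    model = models.Sequential([
        layers.Input(shape=(NUM_FRAMES, *FRAME_SHAPE)),
        layers.TimeDistributed(backbone),      # one 512-d embedding per frame
        layers.LSTM(128),                      # temporal features across frames
        layers.Dense(1, activation="sigmoid"), # glaucoma vs. healthy
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

Dropping the TimeDistributed and LSTM layers and feeding single frames to the VGG16 backbone would yield the image-only CNN baseline described above.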
Results:
The combined VGG16 and LSTM reached an F-measure of 96.2±1.7%, a sensitivity of 0.89±0.04, and a specificity of 0.97±0.02; the corresponding values for the VGG16 model were 70.3±5%, 0.59±0.1, and 0.65±0.2, respectively. We observed a significant difference between the two models in F-measure (p<0.0001), sensitivity (p<0.0001), and specificity (p<0.0001).
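For reference, the reported measures follow the standard confusion-matrix definitions, with glaucoma as the positive class; this is a generic sketch, not the authors' evaluation code.

    # Standard definitions of the reported metrics from confusion-matrix counts.
    def evaluation_metrics(tp: int, fp: int, tn: int, fn: int):
        sensitivity = tp / (tp + fn)   # true positive rate (recall)
        specificity = tn / (tn + fp)   # true negative rate
        precision = tp / (tp + fp)
        # F-measure: harmonic mean of precision and sensitivity.
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
        return f_measure, sensitivity, specificity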
Conclusions:
This study demonstrates that a combined VGG16 and LSTM trained on fundus videos outperforms a VGG16 trained on fundus images in separating glaucomatous from healthy eyes. This suggests that glaucoma detection should be treated as a video classification task, as the combined model takes into account not only the spatial features in a fundus image but also the temporal features embedded in a fundus video. Further evaluation on a larger, heterogeneous population is required to validate this approach.
This is a 2020 Imaging in the Eye Conference abstract.