Mohammad Eslami, Miao Zhang, Julia Kim, Dolly Chang, Yangjiani Li, Saber Kazeminasab, Mojtaba Fazli, Vishal Sharma, Michael Boland, Nazlee Zebardast, Mengyu Wang, Tobias Elze; Evaluation of Deep Learning Visual Field Prediction Models for Clinical Relevance. Invest. Ophthalmol. Vis. Sci. 2022;63(7):2012 – A0453.
© ARVO (1962-2015); The Authors (2016-present)
Deep learning methods have recently been used to predict future visual fields (VFs) from baseline or longitudinal VFs. In clinical practice, glaucomatous VF progression is a comparatively rare event. It is therefore of particular clinical relevance whether these prediction models can accurately identify patients whose disease progresses, so that clinicians can act to prevent vision loss. Here, we evaluate two previously described models for potential biases toward over- or underestimating VF change over time.
We consider two recent studies: Wen et al. (MWen), which predicts VF sensitivity, and Park et al. (MPark), which predicts total deviation (Fig. 1). All reliable Humphrey 24-2 VFs (false negatives and false positives ≤30%, fixation losses ≤30%) from the Mass. Eye and Ear glaucoma service between 1999 and 2020 were included. We re-implemented both methods and made them available to other investigators. As in the original studies, pointwise mean absolute error (PMAE) was used to measure prediction accuracy. A 5-fold cross-validation scheme was used, and both models were additionally compared against a no-change model, i.e., one that uses the baseline VF (for MWen) or the last observed VF (for MPark) as the predicted VF.
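The evaluation ingredients named above (reliability filter, PMAE, no-change model, 5-fold cross-validation) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released re-implementation: all function names are hypothetical, VFs are assumed to be arrays of shape (samples, test points) or (samples, visits, test points), fixation losses are assumed to be expressed as a percentage, and folds are assumed to be split by patient, which the abstract does not specify.

```python
import numpy as np

# Reliability thresholds quoted in the abstract, in percent.
# Assumption: fixation losses are supplied as a percentage, not a ratio.
MAX_FALSE_NEG = 30.0
MAX_FALSE_POS = 30.0
MAX_FIXATION_LOSS = 30.0


def is_reliable(false_neg_pct, false_pos_pct, fixation_loss_pct):
    """Hypothetical helper: True if a VF test meets all three reliability criteria."""
    return (false_neg_pct <= MAX_FALSE_NEG
            and false_pos_pct <= MAX_FALSE_POS
            and fixation_loss_pct <= MAX_FIXATION_LOSS)


def pmae(predicted, actual):
    """Pointwise mean absolute error, averaged over all test points and samples.

    Both arrays have shape (n_samples, n_points).
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def no_change_prediction(input_vfs):
    """No-change model: predict the most recent input VF unchanged.

    input_vfs has shape (n_samples, n_visits, n_points). With a single input
    visit (baseline only, as for MWen) this returns the baseline VF; with
    several visits (as for MPark) it returns the last observed VF.
    """
    return np.asarray(input_vfs, dtype=float)[:, -1, :]


def patient_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: all samples from one patient go to the same fold, so no
    person appears in both training and test data.
    """
    patient_ids = np.asarray(patient_ids)
    unique_ids = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique_ids)  # shuffles in place
    for fold_ids in np.array_split(unique_ids, n_folds):
        test_mask = np.isin(patient_ids, fold_ids)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Splitting by patient rather than by sample is a design choice here: with many samples per person, a per-sample split could let a model see one of a patient's VF series during training and be tested on another, which would tend to flatter PMAE.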
Depending on each method's input requirements, the evaluation dataset included 54,373 samples from 7,472 people for MWen and 24,430 samples from 1,809 people for MPark. The PMAE obtained by each method is shown in Fig. 1, and the results (95% CI, MWen: 2.21-2.24; MPark: 2.56-2.61) are close to those reported in the original papers. Fig. 2 shows four scatterplots based on mean sensitivity for MWen and mean total deviation for MPark: panels A and B plot predicted vs. true values, and panels C and D plot prediction error against the actual change. While both approaches perform satisfactorily in Figs. 2A and 2B, both show large errors when projecting worsening cases, as seen in Figs. 2C and 2D (the green dashed line shows a hypothetical unbiased model). The results also suggest that MPark, which uses longitudinal VFs, is superior to MWen, which uses only the baseline VF.
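One way to quantify the bias seen in Figs. 2C and 2D is sketched below in Python with NumPy. It rests on assumptions not stated in the abstract: actual change is taken as the true mean minus the mean of the last input VF, error as the predicted mean minus the true mean, and the bias is summarized by a least-squares slope of error on actual change plus the mean error on worsening samples. The -1 dB worsening cut-off and the function names are hypothetical. Under these definitions an unbiased model has a slope near 0, while the no-change model has a slope of exactly -1, because its error is the negated actual change.

```python
import numpy as np


def change_and_error(last_input_vf, predicted_vf, true_vf):
    """Per-sample actual change and prediction error on the mean over test points.

    Inputs have shape (n_samples, n_points); the mean is mean sensitivity for
    an MWen-style model or mean total deviation for an MPark-style model.
    Assumed definitions (hypothetical): change = true - last input,
    error = predicted - true.
    """
    base = np.mean(np.asarray(last_input_vf, dtype=float), axis=1)
    pred = np.mean(np.asarray(predicted_vf, dtype=float), axis=1)
    true = np.mean(np.asarray(true_vf, dtype=float), axis=1)
    actual_change = true - base  # negative values mean the VF worsened
    error = pred - true          # positive values mean less loss was predicted than occurred
    return actual_change, error


def bias_summary(actual_change, error, worsening_db=-1.0):
    """Return (slope, mean error on worsening samples).

    slope: least-squares slope of error on actual change (about 0 for an
    unbiased model, -1 for a no-change model).
    worsening_db: hypothetical cut-off; samples whose change is below it
    count as worsening.
    """
    actual_change = np.asarray(actual_change, dtype=float)
    error = np.asarray(error, dtype=float)
    slope, _intercept = np.polyfit(actual_change, error, 1)
    worse = actual_change < worsening_db
    mean_err_worse = float(np.mean(error[worse])) if worse.any() else float("nan")
    return float(slope), mean_err_worse
```

A positive mean error on the worsening subset means the model predicts less loss than actually occurred, i.e., it underpredicts progression; a strongly negative slope indicates that predictions stay close to the input VF regardless of how much the field really changed.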
Our evaluation of the two VF prediction models confirms the low PMAEs reported in the original studies. However, both models underpredicted the worsening of VF loss over time. As detecting VF progression is a major motivation for obtaining clinical VFs, we suggest that future model evaluations explicitly consider this aspect, as well as the characteristics of the data.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.