September 2015
Volume 56, Issue 10
More Accurate Modeling of Visual Field Progression in Glaucoma: ANSWERS
Author Affiliations & Notes
  • Haogang Zhu
    School of Health Sciences City University London, London, United Kingdom
    National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital National Health Service Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom
  • David P. Crabb
    School of Health Sciences City University London, London, United Kingdom
  • Tuan Ho
    National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital National Health Service Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom
  • David F. Garway-Heath
    National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital National Health Service Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom
  • Correspondence: Haogang Zhu, School of Health Sciences, City University London, Northampton Square, London, EC1V 0HB UK; haogangzhu@gmail.com
Investigative Ophthalmology & Visual Science September 2015, Vol.56, 6077-6083. doi:10.1167/iovs.15-16957
      Haogang Zhu, David P. Crabb, Tuan Ho, David F. Garway-Heath; More Accurate Modeling of Visual Field Progression in Glaucoma: ANSWERS. Invest. Ophthalmol. Vis. Sci. 2015;56(10):6077-6083. doi: 10.1167/iovs.15-16957.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose: To validate a method for visual field (VF) progression analysis, called ANSWERS (Analysis with Non-Stationary Weibull Error Regression and Spatial Enhancement), which takes into account increasing measurement variability as glaucoma progresses and spatial correlation among test locations.

Methods: ANSWERS outputs both a global index of progression and a pointwise estimate of rate of change at each VF location. ANSWERS was compared with linear regression of mean deviation (MD) and permutation of pointwise linear regression (PoPLR). Visual field series of up to 2 years from the United Kingdom Glaucoma Treatment Study were used, comprising 9104 Swedish Interactive Thresholding Algorithm Standard 24-2 VFs. ANSWERS and PoPLR rates of change were used to predict the VF at the next visit using subseries that were within 7, 13, 18, or 22 months from the baseline. The methods were compared on statistical sensitivity, specificity, and accuracy of predicting future VFs.

Results: Across all subseries, statistical sensitivity of ANSWERS in detecting VF deterioration was significantly better than the linear regression of MD and PoPLR, especially in short time series. Prediction accuracy of ANSWERS was better than PoPLR at all series lengths, and the improvement was particularly marked in shorter series. Seventy-five percent of VF series were better predicted by ANSWERS compared with PoPLR. The average prediction error of ANSWERS was 15% lower than that of PoPLR.

Conclusions: ANSWERS is more sensitive to detect VF progression and predicts future VF loss better than linear regression of MD and PoPLR, especially over short observation periods. (http://www.isrctn.com number, ISRCTN96423140.)

Management of glaucoma relies on visual field (VF) measurement using standard automated perimetry (SAP), which assesses differential light sensitivity (DLS) across a subject's field of view.1 Accurate and precise assessment of VF change over time is essential for appropriate clinical management of glaucoma so that patients whose condition is worsening receive prompt treatment intervention while those with a stable condition are not overtreated. Currently, however, VF measurement is highly imprecise and has complex statistical properties, which make monitoring changes in VF challenging. 
Clinical evaluation of glaucomatous change in VF series can be supported by analytical algorithms. Probability of change and rate (velocity) of change can be derived from these algorithms. These two parameters can be estimated with methods known as trend analyses. Pointwise linear regression (PLR), the most widely used trend analysis, fits an ordinary linear regression model for each location in the VF and assesses the significance and slope of the fit.2 Summary measures, such as mean deviation (MD) from the average DLS of healthy eyes, are also often used in trend analysis; but, since glaucoma tends not to affect all locations to the same extent, global indices often have inadequate statistical sensitivity to detect worsening compared with methods assessing deterioration at individual locations.3 Permutation analyses of pointwise linear regression (PoPLR),4 a recent advance in PLR trend analysis, involves a random permutation of the order of VFs in a series. It has been reported to provide a better estimate of overall statistical significance of change compared with PLR.5 The significance of change in PoPLR is estimated in the context of permutated series of VFs, which assumes no change in reordered series. As there is a need for permutation in VF series, this technique cannot estimate reliably the significance of change in series with fewer than six VFs because of the limited number of possible permutations in such series. Moreover, despite a different method to estimate the overall statistical significance by PoPLR, the underlying regression model is still that of ordinary linear regression, and therefore the estimate of rate of change and the statistical significance of change at individual locations are identical to those of PLR. 
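As a rough illustration of the pointwise regression described above, the following sketch fits an ordinary least-squares slope at each location of a toy VF series and counts the orderings available to a permutation test. The function names (`ols_slope`, `pointwise_slopes`) and all numeric values are hypothetical; this is not the published PLR or PoPLR implementation, and the significance test on each slope is omitted.

```python
from math import factorial

def ols_slope(times, values):
    """Least-squares slope of values against times (dB per year here)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx

def pointwise_slopes(times, series):
    """series[k][i] is the sensitivity (dB) at location i in the k-th VF."""
    n_locations = len(series[0])
    return [ols_slope(times, [vf[i] for vf in series]) for i in range(n_locations)]

# Toy series: 4 visits, 3 locations (hypothetical values, not study data).
times = [0.0, 0.5, 1.0, 1.5]  # years from baseline
series = [
    [30.0, 28.0, 25.0],
    [30.0, 27.0, 25.5],
    [30.0, 26.0, 24.5],
    [30.0, 25.0, 25.0],
]
slopes = pointwise_slopes(times, series)

# Number of distinct orderings of a series of n VFs, for n = 3..7.
orderings = {n: factorial(n) for n in range(3, 8)}
```

In this toy series the first location is stable, the second declines steadily, and the third fluctuates around a nearly flat trend. The `orderings` table shows how few reorderings a short series offers: a series of five VFs has only 120.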
Two important properties in VF measurement are not accounted for in the current methods for detecting change in VF series: nonstationary variability (increasing variability as DLS declines) and spatial correlation among test locations. Visual field measurements are subject to considerable variability, which increases as DLS deteriorates with disease progression, and eventually decreases in blind regions.6–8 For instance, the repeat measurement range (90% confidence interval [CI]) is 7 dB (26–33 dB) when DLS is healthy at 32 dB, while this range increases to 18 dB (5–27 dB) when DLS deteriorates to 20 dB.9 This changing variability over time is referred to as nonstationary measurement variability. Furthermore, the traversing of the VF test grid by retinal nerve fibers results in correlation between spatially related locations.10 The most widely used SAP VF measurements, such as those taken by Humphrey Visual Field Analyzer (HFA; Carl Zeiss Meditec, Dublin, CA, USA), are made in a regular grid across a patient's field of view. Aside from the neighborhood of test locations, the spatial correlation is also governed by the anatomical arrangement of the retinal nerve fibers.11 Betz-Stablein et al.12 incorporated such spatial correlation in six regions of VF corresponding to the six sectors of the optic disc and demonstrated improved performance in detecting VF progression. Therefore, without taking into account these statistical properties, the detection of change in VF with current methods is potentially delayed or requires more clinic visits than necessary.13 
A new trend analysis method, Analysis with Non-Stationary Weibull Error Regression and Spatial Enhancement (ANSWERS), was proposed and validated on a large dataset acquired from electronic health records.9 In contrast to commonly used ordinary linear regression models, which assume fixed and normally distributed errors, ANSWERS incorporates the nonstationary variability at different levels of DLS modeled as mixtures of Weibull distributions. Spatial correlation of measurements was also included in the model using a Bayesian framework. Despite its optimized statistical attributes, ANSWERS still acts as a linear regression model that outputs both the probability of no deterioration and rate of change at individual locations in the VF and can be interpreted in the same way as PLR. It also produces a global deterioration index summarizing the overall probability of change in the series. The details about derivation and implementation of ANSWERS can be found elsewhere.9 
This study compared ANSWERS with PoPLR and the linear regression of MD on a dataset from a clinical trial. The assumption was that a more accurate method for modeling VF series should (1) be more sensitive in detecting change under the same specificity and (2) predict future VFs more accurately. 
Methods
The methods under comparison included ANSWERS, PoPLR, and linear regression of MD. Additionally, in order to investigate the effect of incorporating spatial correlation in ANSWERS, the spatial enhancement can be switched off; the resulting method is referred to as ANSWER.9 
Datasets
All VFs were measured with the HFA (Carl Zeiss Meditec) using the 24-2 test pattern and the SITA (Swedish Interactive Thresholding Algorithm) Standard testing algorithm. The test measures retinal DLS at 52 test locations excluding two points in the blind spot region. 
Two datasets collected at different centers were used in this study. The first dataset contains VF series from the United Kingdom Glaucoma Treatment Study (UKGTS),14,15 a randomized, double-masked placebo-controlled clinical trial testing the hypothesis that treatment with a topical prostaglandin analogue, compared with placebo, reduces the frequency of VF deterioration events. Patients were followed up for 2 years or until reaching the endpoint criteria. During the 2-year period, patients were tested at 2, 4, 7, 10, 13, 16, 18, 20, 22, and 24 months from baseline, and two repeated VF tests were taken at baseline and at 2, 16, 18, and 24 months from baseline. Details of the dataset have been described elsewhere.14 Visual field tests with false-positive reliability responses over 15% were discarded. Only series that were obtained over at least 4 months (three visits) were included in the analysis. Note that the length of series is purely for evaluation purposes and is not necessitated by ANSWERS. The resulting dataset consisted of 9104 VF tests from 659 series of 437 patients. The median (interquartile range [IQR]) time of follow-up was 22 (15–24) months, and the median (IQR) number of VFs in the series was 11 (6–12). 
The second dataset was from a study examining the test–retest variability of VF test conducted at Moorfields Eye Hospital, London, United Kingdom, in a cohort of glaucoma patients. As changes in retinal function are slow in glaucoma, it is possible to estimate test measurement variability by taking repeat measurements in a short period of time under the assumption that no measurable deterioration can occur over the observation period.6 Fifty-two eyes of 27 patients were tested 10 times over a short period (maximum 10 weeks). The variance among VFs in these repeat measures indicates the inherent measurement variability. Furthermore, the VF series for each eye, and the same series with arbitrary reordering, represent a stable series with no underlying change. The use of randomly reordered series for the estimate of measurement variability is an established method in various studies.16,17 
Patients' data were anonymized prior to investigation and did not contain personal or sensitive information. The data were held in a secure database at City University London. As such, patients' written consent for their data to be used in the study was not required. The study adhered to the tenets of the Declaration of Helsinki and was approved by the research governance committee of City University London, United Kingdom. The anonymized dataset can be accessed upon request. 
False-Positive Rate and Statistical Sensitivity for Change Detection
The false-positive rate and statistical sensitivity were compared for the four trend analysis methods. A false positive is a type I error when change is detected in a series with no true progression. The false-positive rate can be estimated in the series of repeated measurements acquired in a short period of time. Moreover, randomly reordering these repeated measurements produces additional pseudo-series where there is also no true deterioration. 
The series of 10 VFs from each eye in the test–retest dataset were randomly reordered 300 times. With 52 eyes in the test–retest data, 124,800 (52 × 300 × 8) pseudo-series with eight different series lengths of between 3 and 10 VFs were generated. It was assumed that five VFs per year were taken in these pseudo-series (the median test frequency in the UKGTS dataset). The false-positive rate was then estimated as the proportion of series identified as progressing. In a clinical situation, false positives may lead to overtreatment and unnecessary cost, so methods with high false-positive rates are considered not to be clinically useful. 
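The pseudo-series count can be checked with a small sketch. It assumes, purely for illustration, that each shorter pseudo-series is the first few VFs of a random reordering; the text does not specify how the shorter series were drawn. The VF identifiers are placeholders and `make_pseudo_series` is a hypothetical helper.

```python
import random

N_EYES = 52             # eyes in the test-retest dataset
N_REORDERINGS = 300     # random reorderings per eye
LENGTHS = range(3, 11)  # series lengths of 3 to 10 VFs (eight lengths)
TESTS_PER_YEAR = 5      # assumed test frequency for the pseudo-series

def make_pseudo_series(vf_ids, rng):
    """Yield (ordered VF ids, test times in years) for one eye.

    Assumption: a pseudo-series of a given length is the first `length`
    VFs of a random reordering of the eye's repeat measurements.
    """
    for _ in range(N_REORDERINGS):
        order = vf_ids[:]
        rng.shuffle(order)
        for length in LENGTHS:
            times = [k / TESTS_PER_YEAR for k in range(length)]
            yield order[:length], times

rng = random.Random(0)  # fixed seed so the sketch is reproducible
total = 0
for eye in range(N_EYES):
    vf_ids = list(range(10))  # placeholder ids for the 10 repeat VFs of one eye
    total += sum(1 for _ in make_pseudo_series(vf_ids, rng))
```

Under these assumptions `total` matches the 52 × 300 × 8 pseudo-series stated above, and the false-positive rate of a method is the fraction of these series it flags as progressing.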
Comparison of the different methods should be made at equivalent false-positive rates, which is dependent on the chosen change criterion and the length of the series. For PoPLR, the deterioration criterion was an overall statistical significance of change4 smaller than a given threshold. For ANSWERS and ANSWER, the criterion was a deterioration index9 higher than a given threshold. For linear regression of MD, the criteria were a negative slope and slope P value lower than a set threshold. For each method, a set of thresholds was chosen to achieve specified false-positive rates, and the statistical sensitivity of each method was then compared at equivalent false-positive rates. 
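One simple way to pick a threshold that yields a specified false-positive rate is to take an empirical quantile of the scores obtained on stable pseudo-series. The sketch below assumes a score for which larger values indicate stronger evidence of deterioration (as with a deterioration index; a P value criterion would be negated first). The helper names and all scores are hypothetical, and this is not necessarily how the thresholds in the study were computed.

```python
def threshold_for_fpr(stable_scores, target_fpr):
    """Smallest observed score t such that the fraction of stable-series
    scores strictly above t does not exceed target_fpr."""
    ranked = sorted(stable_scores)
    n = len(ranked)
    for t in ranked:
        exceed = sum(1 for s in ranked if s > t)
        if exceed / n <= target_fpr:
            return t
    return ranked[-1]

def flagged_fraction(scores, threshold):
    """Proportion of series whose score exceeds the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)

# Hypothetical scores: 100 stable pseudo-series and 6 trial series.
stable_scores = list(range(100))
trial_scores = [10, 50, 96, 97, 99, 120]

threshold = threshold_for_fpr(stable_scores, 0.05)
achieved_fpr = flagged_fraction(stable_scores, threshold)
trial_positive_rate = flagged_fraction(trial_scores, threshold)
```

Applying the same procedure to each method, separately for each series length, puts the methods on an equal false-positive footing before their positive rates are compared.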
Statistical sensitivity is a measure of identifying true change. Ideally, the sensitivity should be evaluated as the proportion of detected progression in VF series with true underlying deterioration. However, due to the lack of a gold standard and ground truth classification for glaucomatous deterioration,18 the underlying progression status of each VF series was unknown. Therefore, the methods were compared using the positive rate, which is the proportion of series flagged as progressing in the UKGTS dataset. Given an unknown proportion (p%) of truly progressing series in the dataset, the positive rate was linked to statistical sensitivity as positive rate = (p% × sensitivity) + [(1 − p%) × false-positive rate]. Note that if the false-positive rate is controlled to be equivalent for all the methods, a higher positive rate implies better sensitivity of a method. Therefore, the positive rates of all the methods were compared as a surrogate comparison for statistical sensitivity. Moreover, when the false-positive rate is low, the positive rate is dominated by the sensitivity. Therefore when comparing two methods at lower false-positive rate, the ratio of positive rate between the methods is closer to the ratio of sensitivity. The comparison was made with series of 7, 13, 18, and 22 months from baseline. 
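The relationship between positive rate and sensitivity can be made concrete with a short numerical sketch. The proportion of progressing series and the sensitivities below are invented for illustration, and the sensitivities are held fixed across false-positive rates, which would not hold for a real method.

```python
def expected_positive_rate(p, sensitivity, false_positive_rate):
    """positive rate = p * sensitivity + (1 - p) * false-positive rate."""
    return p * sensitivity + (1 - p) * false_positive_rate

p = 0.30                  # hypothetical proportion of truly progressing series
sens_a, sens_b = 0.60, 0.40  # hypothetical sensitivities of two methods
sensitivity_ratio = sens_a / sens_b

# Ratio of positive rates at a 5% and at a 1% false-positive rate.
ratio_at_5 = (expected_positive_rate(p, sens_a, 0.05)
              / expected_positive_rate(p, sens_b, 0.05))
ratio_at_1 = (expected_positive_rate(p, sens_a, 0.01)
              / expected_positive_rate(p, sens_b, 0.01))
```

With these numbers the ratio of positive rates moves toward the ratio of sensitivities (1.5) as the false-positive rate falls, which is the behavior the surrogate comparison relies on.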
Prediction of Future Visual Field
One important clinical question regarding modeling of VF progression is the projection of future VF loss, which is closely related to the rate of change in VF series. It was assumed that better modeling of progression would lead to a more accurate estimate for the rate of change and hence a better prediction of future VF. The comparison was carried out for ANSWERS, ANSWER, and PoPLR using the raw DLS measurements at all locations. Note that the rates of change and predictions from PoPLR are exactly those from ordinary linear regression at individual VF locations. 
Subseries of the UKGTS data were used to predict the VF at the next visit, with the shortest subseries including VFs at the first three visits (4 months) from baseline. The subseries increased in length to include more visits in a chronologically ascending order. For each subseries of the same length, the three methods were used to estimate the rates of change at individual locations, which were then used to predict the VF in the subsequent visit. The prediction performance was evaluated as mean normalized squared error (MNSE) between the predicted and measured VFs across 52 locations:
$$\mathrm{MNSE} = \frac{100\%}{52} \sum_{i=1}^{52} \frac{(\hat{y}_i - y_i)^2}{\sigma_i^2}$$
where $\hat{y}_i$ and $y_i$ are the predicted and measured DLS at location $i$. The MNSE is the average squared prediction error $(\hat{y}_i - y_i)^2$ in percentage with regard to the measurement variance $\sigma_i^2$ at corresponding locations in the UKGTS dataset. It quantifies the importance of prediction error in the context of population measurement variability, avoiding an overestimation of error in more variable measurements or an underestimation of error in less variable measurements. For instance, 1-dB error in central areas of the VF is more significant than 1-dB error in peripheral vision because the measurement variance in central vision is smaller than that in peripheral vision. The comparison of the average rate of change (mean slope across VF locations) and prediction performance was made with all subseries and with subseries of 7, 13, 18, and 22 months, respectively. 
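The MNSE described above, the squared prediction error at each location divided by that location's measurement variance, averaged over locations and expressed as a percentage, can be sketched as follows. The function name and the two-location example values are hypothetical; the real computation runs over 52 locations with variances estimated from the UKGTS data.

```python
def mnse(predicted, measured, variances):
    """Mean normalized squared error, in percent.

    Each location's squared prediction error is divided by the
    measurement variance at that location before averaging.
    """
    terms = [(p - m) ** 2 / v for p, m, v in zip(predicted, measured, variances)]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical two-location example: a 1-dB error at a low-variance
# (central) location and a 1-dB error at a high-variance (peripheral) one.
predicted = [30.0, 25.0]
measured = [29.0, 26.0]
variances = [1.0, 4.0]  # dB^2, invented for illustration

error_percent = mnse(predicted, measured, variances)
```

The same 1-dB error contributes four times as much at the low-variance location, which mirrors the central versus peripheral example in the text.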
The performance in VF prediction of the three methods was also investigated with respect to the amount of change that actually occurred, estimated as mean difference between VFs being predicted and the baseline VF. All analysis was carried out using MATLAB R2013a (MathWorks, Inc., Natick, MA, USA). 
Results
The median (IQR) of MD in the VF data from UKGTS and test–retest dataset are −1.88 (−3.83 to −0.57) and −3.08 (−8.36 to −0.84) dB, respectively. 
False-Positive Rate and Statistical Sensitivity for Change Detection
The false-positive rate for any particular change criterion depends on the length of the VF series (each additional test in a series is an opportunity for a false positive). For instance, the ANSWERS criterion for various false-positive rates and various lengths of series (number of VFs in the series) is plotted in Figure 1. The threshold was fitted as a modulated sigmoidal function9 of the false-positive rate and the series length, with a goodness of fit of r2 > 0.99. This fit allows the threshold to be calculated for series longer than 10 VFs, the maximum series length in the test–retest data. 
Figure 1
 
ANSWERS change criterion at various false-positive rates and lengths of series (number of fields in the series). The ANSWERS threshold was estimated with false-positive rates between 2% and 10% and length of series between 4 and 10. Each curve represents the ANSWERS threshold for the false-positive rate indicated at the end of the curve. The thresholds with series longer than 10 (the part on the right side of the vertical dashed line) are extrapolated.
Figure 2 shows the positive rates for series of 7, 13, 18, and 22 months from baseline. Only positive rates estimated for false-positive rates between 0% and 15% (specificity 85%–100%) are displayed. Table 1 summarizes the areas under the partial positive rate curves for the different methods (Fig. 2). Because the total area with false-positive rates between 0% and 15% was 0.15, the areas under the partial positive rate curves were normalized by being divided by 0.15. 
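The normalization of the partial area can be illustrated with a trapezoidal sketch. It assumes the curve is sampled at ascending false-positive rates from 0 up to the 15% limit; the function name and the sample curve are hypothetical, and the integration rule used in the study is not stated in the text.

```python
def normalized_partial_area(fprs, positive_rates, max_fpr=0.15):
    """Trapezoidal area under the positive-rate curve on [0, max_fpr],
    divided by max_fpr so that a method flagging every series scores 1.

    Assumes fprs is sorted ascending and spans 0 to max_fpr.
    """
    area = 0.0
    points = list(zip(fprs, positive_rates))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr

# Hypothetical curve sampled at four false-positive rates.
fprs = [0.0, 0.05, 0.10, 0.15]
rates = [0.0, 0.2, 0.3, 0.4]
score = normalized_partial_area(fprs, rates)
```

Dividing by 0.15 rescales the area to the range 0 to 1, so values from different methods can be read on a common scale.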
Figure 2
 
Positive rates of ANSWERS, ANSWER, PoPLR, and linear regression of MD in VF subseries at 7, 13, 18, and 22 months from baseline. The positive rates are estimated at false-positive rates between 0% and 15%.
Table 1
 
Normalized Areas Under Partial Positive Rate Curves for ANSWERS, ANSWER, PoPLR, and Linear Regression of MD
The trend analysis methods were also compared at the 5% false-positive rate. The ratios of positive rates between pairs of methods are shown in Table 2, where a ratio > 1 indicates a better positive rate. For instance, with subseries of 7 months, the ratio of ANSWERS over PoPLR was 1.71, indicating that the positive rate of ANSWERS is 1.71 times that of PoPLR. 
Table 2
 
Ratio of the Positive Rates (at 5% False-Positive Rate) for ANSWERS and ANSWER Over Those of ANSWER, PoPLR, and Linear Regression of MD
In all subseries, the positive rates of ANSWERS and ANSWER were higher than those of PoPLR and linear regression of MD. Improvement was even greater in short subseries. The spatial enhancement included in ANSWERS also increased the positive rate compared with ANSWER, especially in short subseries. However, this improvement became marginal as the length of the subseries increased. 
Estimate of Rate of Change
In all subseries, the average rate of change (median [IQR]) across all VF locations estimated by ANSWERS, ANSWER, and PoPLR was 0.12 (−0.44 to 0.67), 0.12 (−0.41 to 0.65), and 0.16 (−0.73 to 0.98) dB/year, respectively. The (unsigned) magnitude (median [IQR]) of average rate of change was 0.55 (0.26–1.07), 0.53 (0.25–0.99), and 0.87 (0.40–1.70) dB/year for ANSWERS, ANSWER, and PoPLR, respectively. Both ANSWERS and ANSWER made significantly (P < 0.01, Wilcoxon signed rank test) smaller estimates of the magnitude of the rate of change compared with PoPLR. ANSWERS provided a significantly (P < 0.01%, Wilcoxon signed rank test) greater rate magnitude compared with ANSWER. The comparison of rate of change between ANSWERS, ANSWER, and PoPLR in subseries of 7, 13, 18, and 22 months is presented in Figures 3 and 4, in which the relative relationship between the magnitude of the rate of change from the three methods (PoPLR > ANSWERS > ANSWER, P < 0.01% in Wilcoxon signed rank test) was consistent in all subseries, except that for 22 months, where the rate magnitude did not differ between ANSWERS and ANSWER (P = 20%, Wilcoxon signed rank test). The results can be seen in Figure 3, where the points scatter around a line with a slope of less than 1, and in Figure 4, where the points scatter around a line with a slope of more than 1, except for those at 22 months. For 13 and 18 months, although the difference between ANSWERS and ANSWER is statistically significant, the size of the difference is minimal, so the points scatter close to the diagonal line in Figure 4.
Figure 3
 
Scatterplot of average rate of change from ANSWERS against that from PoPLR in VF subseries at 7, 13, 18, and 22 months from baseline. Semitransparent points are used to relieve overlapping in the plot. The straight dashed line represents a reference line with a slope of 1.
Figure 4
 
Scatterplot of average rate of change from ANSWERS against that from ANSWER in VF subseries at 7, 13, 18, and 22 months from baseline. Semitransparent points are used to relieve overlapping in the plot. The straight dashed line represents a reference line with a slope of 1.
Prediction of Future Visual Field
In all subseries of VFs, the MNSE (median [IQR]) for ANSWERS, ANSWER, and PoPLR was 54% (33%–113%), 60% (36%–122%), and 76% (43%–146%), respectively. ANSWERS provided better prediction (lower MNSE) than PoPLR and ANSWER in 75% and 71% of VFs, respectively. The MNSE from ANSWERS was significantly smaller (P = 0.01%, Wilcoxon signed rank test) than those from PoPLR and ANSWER (median [95% CI] difference: 15% [10%–19%] and 2% [1%–4%], respectively). The comparison between the three methods for prediction of VFs at 10, 16, 20, and 24 months using subseries of 7, 13, 18 and 22 months is summarized in Table 3. ANSWERS outperformed PoPLR in all subseries. The improvement was greater in shorter subseries. The spatial enhancement in ANSWERS made it a better predictor than ANSWER for all VF predictions except for those at the 24th month. 
Table 3
 
Median (Interquartile Range) Mean Normalized Squared Error (MNSE) of ANSWERS, ANSWER, and PoPLR for Prediction of VFs at 10, 16, 20, and 24 Months From Baseline
Figure 5a shows the improvement in VF prediction by ANSWERS compared with PoPLR, where PoPLR MNSE minus ANSWERS MNSE was plotted against the amount of change (average difference between the VF being predicted and baseline VF). Seventy-five percent of VFs (above dashed line) were better predicted by ANSWERS than by PoPLR. Table 4 shows the median (95% CI) improvement of ANSWERS, compared with PoPLR and ANSWER, when predicted VFs were between −5 and 5 dB different from baseline. ANSWERS provided better prediction irrespective of the amount of change from baseline. Therefore, compared with PoPLR, although ANSWERS produced smaller magnitude of slopes (slower rate of change), it did not make larger prediction errors in faster-progressing eyes by generally flattening the slope. Moreover, the spatial correlation used in ANSWERS made it a better predictor for future VF compared with ANSWER, regardless of the difference from baseline (Fig. 5b). 
Figure 5
 
Improvement of ANSWERS over (a) PoPLR and (b) ANSWER, stratified by the amount of change from baseline determined as the average difference between the VF being predicted and baseline VF. Semitransparent points are used to relieve overlapping. Positive values (above dashed lines) on the y-axis indicate a better performance by ANSWERS.
Table 4
 
Median (95% Confidence Interval) Improvement of ANSWERS Over PoPLR and ANSWER in Prediction at Various Amounts of Change in VF From Baseline
Discussion
ANSWERS is a more sensitive method to detect VF progression, and is more accurate in predicting future VFs, compared to the other trend-based methods evaluated. At equivalent false-positive rates, it detected a greater number of eyes with change compared with PoPLR and linear regression of MD. In addition, the results indicate that future VFs can be better predicted by ANSWERS than the other methods. The Weibull mixture retest distribution, compared with a normally distributed error in ordinary regression models, captures the changing variability in the measurement and leads to the significant improvement. In addition, the spatial enhancement gathers information from spatially related locations in the VF, adding or reducing weight to the observed progression at a location, thus improving further this method, especially for short time series. In clinical situations, where follow-up testing is infrequent,19 often due to limited resources, the usefulness of ANSWERS in short series is of particular interest. 
The MD of UKGTS data (median of −1.88 dB) is better than that of the test–retest data (median of −3.08 dB), so the test–retest data likely have higher measurement variability than the UKGTS data due to the increasing variability with decreasing DLS. Therefore, the false-positive rate for the test–retest data may be higher than that of the UKGTS data. This potential overestimation of false-positive rate affects all methods under evaluation equivalently, so the comparison among them is fair. 
In this study, test–retest data were used to estimate measurement variability and false-positive rates. These data were acquired within a very short period of time (10 VFs in less than 10 weeks), and it is highly unlikely that measurable damage occurred in this period. However, the patients from whom this dataset was derived may have gained psychophysical experience more quickly than those in the clinic who undertake perimetric tests much less frequently, and therefore measurements obtained from these patients could have lower variability compared with those obtained in clinical practice. However, the methods evaluated in this study were compared using the common ground of the same test–retest data. If the test–retest dataset does have lower variability than the trial dataset, with specificity overestimated, the positive rate may be slightly overestimated. Notably, the prediction of future VF acted as a separate validation of the statistical methods, independent of the test–retest data. 
All the trend analysis methods compared in this study assumed a linear change in the VF subseries.20 This is because there are insufficient data to identify nonlinear change, should it exist, owing to the relatively short VF series acquired in clinical practice.13,19 A recent study indicated that change in VF series may follow a nonlinear trend such as an exponential function.21 It is, however, simple to configure ANSWERS to model nonlinear change9 in long VF series. Moreover, PoPLR was used to determine criteria for progression in PLR; however, other criteria defined on the combinations of slope and statistical significance are possible.22 
It is important to note that a perfect prediction of future VF cannot be achieved currently, owing to the variability in measurements; the VF being predicted itself includes measurement error. The performance of statistical methods is thus limited by data acquisition techniques. 
In conclusion, ANSWERS provides an analytical tool in a “landscape of uncertainty” in modeling VF progression. This new technique has the potential to help improve clinical management decisions. In addition, it can be used to help define better and more relevant endpoints for clinical trials, which could help increase the efficiency of trials and decrease their duration and cost. 
Acknowledgments
Supported by the National Institute for Health Research, National Health Service, United Kingdom, which gave rise to a Research Fellow Award and Health Technology Assessment Grant supporting this independent research. The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research, or the Department of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. 
Disclosure: H. Zhu, P; D.P. Crabb, P; T. Ho, None; D.F. Garway-Heath, P 
References
Henson DB. Visual Fields. 2nd ed. Oxford: Butterworth-Heinemann; 2000.
Viswanathan AC, Fitzke FW, Hitchings RA. Early detection of visual field progression in glaucoma: a comparison of PROGRESSOR and STATPAC 2. Br J Ophthalmol. 1997; 81: 1037–1042.
Vesti E, Johnson CA, Chauhan BC. Comparison of different methods for detecting glaucomatous visual field. Invest Ophthalmol Vis Sci. 2003; 44: 3873–3879.
O'Leary N, Chauhan BC, Artes PH. Visual field progression in glaucoma: estimating the overall significance of deterioration with permutation analyses of pointwise linear regression (PoPLR). Invest Ophthalmol Vis Sci. 2012; 53: 6776–6784.
Redmond T, O'Leary N, Hutchison DM, Nicolela MT, Artes PH, Chauhan BC. Visual field progression with frequency-doubling matrix perimetry and standard. JAMA Ophthalmol. 2013; 131: 1565–1572.
Artes PH, Iwase A, Ohno Y, Kitazawa Y, Chauhan BC. Properties of perimetric threshold estimates from Full Threshold, SITA Standard, and SITA Fast strategies. Invest Ophthalmol Vis Sci. 2002; 43: 2654–2659.
Henson DB, Chaudry S, Artes PH, Faragher EB, Ansons A. Response variability in the visual field: comparison of optic neuritis, glaucoma, ocular hypertension, and normal eyes. Invest Ophthalmol Vis Sci. 2000; 41: 417–421.
Russell RA, Crabb DP, Malik R, Garway-Heath DF. The relationship between variability and sensitivity in large-scale longitudinal visual field data. Invest Ophthalmol Vis Sci. 2012; 53: 5985–5990.
Zhu H, Russell RA, Saunders LJ, Ceccon S, Garway-Heath DF, Crabb DP. Detecting changes in retinal function: analysis with non-stationary Weibull error regression and spatial enhancement (ANSWERS). PLoS One. 2014; 9: e85654.
Pascual J, Schiefer U, Paetzold J, et al. Spatial characteristics of visual field progression determined by Monte Carlo. Invest Ophthalmol Vis Sci. 2007; 48: 1642–1650.
Garway-Heath DF, Poinoosawmy D, Fitzke FW, Hitchings RA. Mapping the visual field to the optic disc in normal tension glaucoma eyes. Ophthalmology. 2000; 107: 1809–1815.
Betz-Stablein BD, Morgan WH, House PH, Hazelton ML. Spatial modeling of visual field data for assessing glaucoma progression. Invest Ophthalmol Vis Sci. 2013; 54: 1544–1553.
Chauhan BC, Garway-Heath DF, Goñi FJ, et al. Practical recommendations for measuring rates of visual field change in glaucoma. Br J Ophthalmol. 2008; 92: 569–573.
Garway-Heath DF, Lascaratos G, Bunce C, Crabb DP, Russell RA, Shah A. The United Kingdom Glaucoma Treatment Study: a multicenter randomized, placebo-controlled clinical trial: design and methodology. Ophthalmology. 2013; 120: 68–76.
Garway-Heath DF, Crabb DP, Bunce C, et al. Latanoprost for open-angle glaucoma (UKGTS): a randomised, multicentre, placebo-controlled trial. Lancet. 2014; 385: 1295–1304.
Patterson AJ, Garway-Heath DF, Strouthidis NG, Crabb DP. A new statistical approach for quantifying change in series of retinal and optic nerve head topography images. Invest Ophthalmol Vis Sci. 2005; 46: 1659–1667.
Frackowiak RSJ. Human Brain Function. San Diego: Academic Press; 1997.
Gardiner SK, Crabb DP. Examination of different pointwise linear regression methods for determining visual field progression. Invest Ophthalmol Vis Sci. 2002; 43: 1400–1407.
Fung SSM, Lemer C, Russell RA, Malik R, Crabb DP. Are practical recommendations practiced? A national multi-centre cross-sectional study on frequency of visual field testing in glaucoma. Br J Ophthalmol. 2013; 97: 843–847.
Bryan SR, Vermeer KA, Eilers PHC, Lemij HG, Lesaffre EM. Robust and censored modeling and prediction of progression in glaucomatous visual fields. Invest Ophthalmol Vis Sci. 2013; 54: 6694–6700.
Pathak M, Demirel S, Gardiner SK. Nonlinear multilevel mixed-effects approach for modeling longitudinal standard automated perimetry data in glaucoma. Invest Ophthalmol Vis Sci. 2013; 54: 5505–5513.
Kummet C, Zamba K, Doyle C, Johnson C, Wall M. Refinement of pointwise linear regression criteria for determining glaucoma. Invest Ophthalmol Vis Sci. 2013; 54: 6234–6241.
Figure 1
 
ANSWERS change criterion at various false-positive rates and lengths of series (number of fields in the series). The ANSWERS threshold was estimated with false-positive rates between 2% and 10% and length of series between 4 and 10. Each curve represents the ANSWERS threshold for the false-positive rate indicated at the end of the curve. The thresholds with series longer than 10 (the part on the right side of the vertical dashed line) are extrapolated.
Figure 2
 
Positive rates of ANSWERS, ANSWER, PoPLR, and linear regression of MD in VF subseries at 7, 13, 18, and 22 months from baseline. The positive rates are estimated at false-positive rates between 0% and 15%.
Figure 3
 
Scatterplot of average rate of change from ANSWERS against that from PoPLR in VF subseries at 7, 13, 18, and 22 months from baseline. Semitransparent points are used to relieve overlapping in the plot. The straight dashed line represents a reference line with a slope of 1.
Figure 4
 
Scatterplot of average rate of change from ANSWERS against that from ANSWER in VF subseries at 7, 13, 18, and 22 months from baseline. Semitransparent points are used to relieve overlapping in the plot. The straight dashed line represents a reference line with a slope of 1.
Figure 5
 
Improvement of ANSWERS over (a) PoPLR and (b) ANSWER, stratified by the amount of change from baseline determined as the average difference between the VF being predicted and baseline VF. Semitransparent points are used to relieve overlapping. Positive values (above dashed lines) on the y-axis indicate a better performance by ANSWERS.
Table 1
 
Normalized Areas Under Partial Positive Rate Curves for ANSWERS, ANSWER, PoPLR, and Linear Regression of MD
Table 2
 
Ratio of the Positive Rates (at 5% False-Positive Rate) for ANSWERS and ANSWER Over Those of ANSWER, PoPLR, and Linear Regression of MD
Table 3
 
Median (Interquartile Range) Mean Normalized Squared Error (MNSE) of ANSWERS, ANSWER, and PoPLR for Prediction of VFs at 10, 16, 20, and 24 Months From Baseline
Table 4
 
Median (95% Confidence Interval) Improvement of ANSWERS Over PoPLR and ANSWER in Prediction at Various Amounts of Change in VF From Baseline