October 2013
Volume 54, Issue 10
Free
Glaucoma  |   October 2013
Robust and Censored Modeling and Prediction of Progression in Glaucomatous Visual Fields
Author Affiliations & Notes
  • Susan R. Bryan
    Rotterdam Ophthalmic Institute, Rotterdam, The Netherlands
    Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands
  • Koenraad A. Vermeer
    Rotterdam Ophthalmic Institute, Rotterdam, The Netherlands
  • Paul H. C. Eilers
    Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands
  • Hans G. Lemij
    Glaucoma Service, Rotterdam Eye Hospital, Rotterdam, The Netherlands
  • Emmanuel M. E. H. Lesaffre
    Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands
    KU Leuven, L-Biostat, Leuven, Belgium
  • Correspondence: Susan R. Bryan, Schiedamse Vest 160d, 3011 BH Rotterdam, The Netherlands; s.bryan@eyehospital.nl
Investigative Ophthalmology & Visual Science October 2013, Vol.54, 6694-6700. doi:10.1167/iovs.12-11185
  • Views
  • PDF
  • Share
  • Tools
    • Alerts
      ×
      This feature is available to authenticated users only.
      Sign In or Create an Account ×
    • Get Citation

      Susan R. Bryan, Koenraad A. Vermeer, Paul H. C. Eilers, Hans G. Lemij, Emmanuel M. E. H. Lesaffre; Robust and Censored Modeling and Prediction of Progression in Glaucomatous Visual Fields. Invest. Ophthalmol. Vis. Sci. 2013;54(10):6694-6700. doi: 10.1167/iovs.12-11185.

      Download citation file:


      © ARVO (1962-2015); The Authors (2016-present)

      ×
  • Supplements
Abstract

Purpose.: Classic regression is based on certain assumptions that conflict with visual field (VF) data. We investigate and evaluate different regression models and their assumptions in order to determine point-wise VF progression in glaucoma and to better predict future field loss for personalised clinical glaucoma management.

Methods.: Standard automated visual fields of 130 patients with primary glaucoma with a minimum of 6 years of follow-up were included. Sensitivity estimates at each VF location were regressed on time with classical linear and exponential regression models, as well as different variants of these models that take into account censoring and allow for robust fits. These models were compared for the best fit and for their predictive ability. The prediction was evaluated at six measurements (approximately 3 years) ahead using varying numbers of measurements.

Results.: For fitting the data, the classical uncensored linear regression model had the lowest root mean square error and 95th percentile of the absolute errors. These errors were reduced in all models when increasing the number of measurements used for the prediction of future measurements, with the classical uncensored linear regression model having the lowest values for these errors irrespective of how many measurements were included.

Conclusions.: All models performed similarly. Despite violation of its assumptions, the classical uncensored linear regression model appeared to provide the best fit for our data. In addition, this model appeared to perform the best when predicting future VFs. However, more advanced regression models exploring any temporal–spatial relationships of glaucomatous progression are needed to reduce prediction errors to clinically meaningful levels.

Introduction
According to the World Health Organization (WHO), glaucoma is one of the leading causes of blindness in the world. Adequate treatment may slow down the disease, possibly even halting its progression. 1 Evaluation of a longitudinal series of visual fields (VF), as measured by standard automated perimetry, provides a method to detect early evidence of glaucoma and to determine functional deterioration. 2 Hence, treatment strategies can then be optimized to prevent further progression of VF loss. Modeling the series of VFs is beneficial for this evaluation. It tries to extract the underlying VF progression from observed measurements and to distinguish it from noise, and can be used in the prediction of future VFs. 
Longitudinal modeling of VF summary parameters, such as the mean deviation (MD) and the visual field index (VFI), has been done before. 3 5 Modeling of individual VF points is potentially of greater interest, because it provides additional information such as the spatial nature of the fields, that is otherwise lost in global parameters. 6 Modeling the progression of glaucoma in individual test locations may be of significant interest to best determine patterns of progression. For example, more centrally located progression would arguably be of greater significance to the patient than more peripherally located progression. In addition, if progression took place in a particular location of the VF that was also affected in the fellow eye, this local progression would be more significant to the patient's quality of life than when the local progression took place elsewhere in the field, where no such overlapping defects existed. Determining and predicting the pattern and location of progression would therefore help clinicians to best tailor the management of glaucoma in their patients. 
Caprioli et al. 7 discussed measuring the rate of VF progression in glaucoma for each VF location independently. They explored regression analysis of the sensitivity estimates against time in three models: linear, quadratic, and nonlinear exponential. 7 There were some aspects of VF data that were not taken into account in these models. Firstly, the maximum luminance of the Humphrey Field Analyzer (Carl Zeiss Meditec, Dublin, CA) perimeter's stimuli is 10,000 apostilbs, which is defined as 0 dB retinal sensitivity. The lowest sensitivity that can be detected by this perimeter is therefore 0 dB, although negative values could in fact occur if it were not for the limitations of this device. 8 It is therefore of interest to use a model that takes censoring at 0 dB into account, as suggested by Russell and Crabb. 9 We think that this is especially true for locations showing more advanced disease progression, with high sensitivity estimates in the early follow-up period, but a relatively large number of 0 dB sensitivity estimates toward the end of the follow-up period. The Tobit model is used to estimate the relationship between variables when there is either left- or right-censoring (or above- and below-censoring) in the dependent variable. 10 Another consideration of regression analysis that deserves attention is that classical least-squares fitting is based on certain assumptions about the measurement error, which in fact may be violated. For example, measurement error in VFs increases with damage, and, hence, low sensitivity estimates have high variability. However, values lower than zero cannot be measured. This inherent censoring introduces a positive bias at low sensitivity estimates, which is made worse by the increased variability for low sensitivity estimates. 11 Another problem in regression analysis is that of outliers or points that do not follow the general trend. As in most practical problems, VF data may have several outliers. Single outliers may be easily identified in diagnostic tests, but it is generally more difficult to identify multiple outliers. 12 Therefore, regression analyses that are effective, even in the presence of multiple outliers, need to be considered, 13 such as robust methods. 14,15  
In past research, the main aim has been to compare different statistical models to determine the optimal fit for the VF data. However, the model with the best fit to the data and the best model for prediction may not be the same. More complicated models may fit the data better, but, in general, simpler models predict better since they are less likely to over fit the data. This was illustrated by McNaught et al., 16 who concluded that the best compromise between fitting and predicting sensitivity estimates is the linear model. Since glaucoma is a progressive disease and the purpose of modeling the progression is to be able to evaluate future field loss in a clinical setting, the predictive ability of these models is of importance. Our aim therefore was to introduce these censored and robust regression models for the point-specific evolutions of the VFs and to evaluate these models, as well as their assumptions, for modeling and predicting point-wise glaucomatous VF progression. 
Methods
The analysis was performed on the VF data of both eyes of 130 individuals from an ongoing study conducted at the Rotterdam Eye Hospital, The Netherlands. The inclusion criteria were glaucoma and a minimum of 15 VFs resulting in a minimum of 6 years of follow-up; the first 15 VFs were used in our analyses. This group consisted of 72 (55%) men and 58 (45%) women. All patients gave their written informed consent for participation. All research procedures followed the tenets set forth in the Declaration of Helsinki. The patients were followed up with approximately twice per year. The data set contained 3900 VFs, resulting in time series of 13,520 VF points. Descriptive statistics can be found in Table 1. All of the data that was used in this study has been made available in the public domain online at http://orgids.com
Table 1
 
Descriptive Statistics of the Study Sample
Table 1
 
Descriptive Statistics of the Study Sample
Mean Median Interquartile Range
Baseline age, y 59.8 61.4 53.3 to 67.6
Baseline MD, dB −8.3 −5.8 −12.8 to −2.3
Average change in MD, dB/y −0.07 −0.04 −0.2 to 0.1
Follow-up time, y 7.7 7.6 7.2 to 8.2
The VFs were tested by using the Humphrey Field Analyzer (Carl Zeiss Meditec) with the 24-2, white-on-white test strategy using the Full Threshold algorithm. The response variables of interest were the sensitivity estimates from the 52 VF points (excluding the 2 points that correspond to the blind spot). These locations were analyzed as independent samples. The distribution of the observed sensitivity estimates is shown in Figure 1. Note that the frequency of points with 0 dB sensitivity was relatively large, which emphasizes the need to specifically address these measurements. 
Figure 1
 
Distribution of point-wise sensitivity estimates in the study dataset.
Figure 1
 
Distribution of point-wise sensitivity estimates in the study dataset.
An example of the series of VFs for one eye of a patient can be seen in Figure 2, which shows the sensitivity estimates for each location in the VF over a period of 9 years. This example clearly shows little or no progressive damage with little measurement variability in the superior hemifield and markedly progressive changes with relatively large measurement variability in most points of the inferior hemifield. There is also significant measurement variability in the upper, most nasal point of the upper hemifield. 
Figure 2
 
Retinal sensitivity estimates over time for each location in the VF in the left eye of a single glaucoma patient.
Figure 2
 
Retinal sensitivity estimates over time for each location in the VF in the left eye of a single glaucoma patient.
Models
Regression models estimate the relationship between variables. The most straightforward approach is the linear regression model, which minimizes the sum of squared deviations from the regression line and passes through the mean in a straight line. A nonlinear alternative is the exponential model. The two corresponding statistical models are: 
1. First-order linear:Display Formula Image not available  
2. First-order exponential:Display Formula Image not available  
where i denotes in our current study an index that is unique for every VF location in each eye (Display FormulaImage not available ),Display FormulaImage not available is the measured sensitivity estimate expressed in dB,Display FormulaImage not available denotes the time the VF was acquired,Display FormulaImage not available andDisplay FormulaImage not available are the model parameters to be estimated andDisplay FormulaImage not available is the error term. Here, we assume that the error term is independently and identically normally distributed with a mean of zero and varianceDisplay FormulaImage not available , that is,Display FormulaImage not available .  
A censored (Tobit) model in our current analysis takes into account that unseen sensitivity estimates that are indicated on the VF print out as ‘< 0’ do not correspond to true zero values: they are smaller than zero but the instrument was unable to determine such sensitivities. The model, thus, tries to estimate the relationship between time and the latent, true sensitivity value. The relationship between the observedDisplay FormulaImage not available and the latent, true sensitivity valueDisplay FormulaImage not available is given by  Censoring was here taken into account for the linear model only. The response variable of the exponential model cannot go below 0 dB. Therefore, taking censoring at 0 dB into account in the model will not gain anything.  
Linear and exponential models for the median, so called median regression models were also included. Median regression provides a line that minimizes the sum of absolute values of the deviations from the regression line. Quantile regression, which includes median regression, provides a method that is less influenced by extreme values. Median regression models can estimate the true rate of change even in cases where a significant fraction of the measurements has large errors. Other robust methods might be considered such as Least Absolute Deviations Regression, 17 M-estimators, 18 and Robust Regression with Ranked Residuals, 14 however including all these models would be beyond the scope of this paper. 19  
We evaluated the following six models: the classical uncensored linear model, the classical exponential model, the classical censored linear model, the uncensored linear on the median model, the exponential on the median model, and the censored linear on the median model. 
Examples of some of the fits are shown in Figure 3. The points represent the sensitivity estimates from two locations belonging to the same eye from our data set, and the lines show the fits of the models based on this data. Figure 3A shows a case in which it is useful to take the censored nature of the data into account. The classical censored linear model takes into account that the measurements are censored at 0 dB, and, hence, shows the best model fit. This is only beneficial when there is progression, resulting in many 0 dB measurements. Figure 3B is an example that shows the need for more robust methods such as median regression. Here, it is clear that the uncensored linear on the median fit is less influenced by the outlying points, notably during the first 2 years of follow-up, than the other models. These two examples show that it is not a priori obvious that model best fits the observed point-wise VF data. 
Figure 3
 
Two real examples of sensitivity estimates over time at two single locations in the VF. The lines indicate corresponding fits of four models.
Figure 3
 
Two real examples of sensitivity estimates over time at two single locations in the VF. The lines indicate corresponding fits of four models.
Truncating the Predicted Response
Conventional models are designed to fit observed data, while censored models are designed to fit the true but unobserved data. To be able to compare predicted values from all models, the predicted values calculated to be less than 0 dB were set to 0 dB for the evaluation of both the fitting and the prediction of the data. Likewise, since the measured sensitivity estimates cannot realistically be above 40 dB, this limit was also set on the predicted values. 8  
Evaluation of Model Fit
We employed two classical approaches for the overall predictive ability of the models. The first aimed to measure how close the predicted values were to the observed responses in the current data (i.e., model fit). This may be done using the Akaike information criterion (AIC). Akaike information criterion is defined asDisplay FormulaImage not available , whereDisplay FormulaImage not available is the number of model parameters andDisplay FormulaImage not available is the likelihood function evaluated at the estimated parameter values.20 −2ln(L) measures the model fit to the current data. By adding the model complexity indicated by 2k, complex models are penalized. Generally, AIC is used to compare nonnested models with the same response, such as the classical uncensored linear and the classical exponential model. However, if the responses differ, one should compare AIC values with caution. In the case when comparing uncensored and censored models, it is not clear whether AIC can be used. An alternative to the AIC is leave-one-out cross-validation. We combined the errors (given by the predicted minus the observed sensitivity estimate) from each sensitivity estimate into summary error measures and compared the root mean square error (RMSE) and the mean absolute error (MAE) for each of the models. We compared both the RMSE and MAE since, by definition, the classical regression models minimize the RMSE, while the median regression models minimize the MAE. We evaluated only the classical uncensored linear and classical exponential models using AIC, ranking them by percentage of best fits in order to compare our results with previous work,7 but used leave-one-out cross-validation to calculate the RMSE and MAE for the comparison of all models.  
In clinical practice, one is primarily interested in VFs of individual eyes. Hence, interest lies in having a large fraction of patients within an estimated range, rather than looking at the average error over all eyes. Therefore, we also computed the 95th percentile of the absolute errors, which is the value below which 95% of the absolute prediction errors may be found, to compare all models. 
Since the RMSE and MAE were evaluated as summary measures, these statistics conceal how the models performed at the different levels of the sensitivities. Hence, to evaluate the models in greater detail, we also stratified the MAE over the range of sensitivity estimates to determine how the models perform at different sensitivity levels. In order to further investigate whether the assumptions that are made by the classical regression models hold, we evaluated the whole error distribution over the range of sensitivity estimates. 
Evaluation of Predictive Ability
The second approach aimed to measure how each model performed with respect to predicting future measurements. This involves using the first sequence of data to estimate the model parameters and calculate a future measurement value for that point in the VF. This was done by varying the number of measurements used for the estimation, each time predicting the sensitivity estimates six measurements (∼3 years) ahead. All the models were evaluated by using the RMSE, MAE, and 95th percentile of the absolute errors. As done for the comparison of the models' fitting abilities, the MAEs were stratified over the range of sensitivity estimates to compare the models predictive abilities at the different sensitivity levels. 
Results
Evaluation of Model Fit
When ranking the classical uncensored linear and the classical exponential models using AIC, the classical uncensored linear model performed best in 57% of the fits, while the classical exponential model had 43% of the best fits. The RMSE, MAE, and 95th percentile of the absolute errors for all models, calculated by cross-validation, are listed in Table 2. On our dataset, the classical censored linear model had the lowest MAE, although the MAE was almost the same over all the models. The classical uncensored linear model had the lowest RMSE and 95th percentile of the absolute error. All the models performed similarly. Overall, however, the classical uncensored linear model appeared to be the best model for our data. 
Table 2
 
Comparison of the Fitting Ability of the Models for the Predicted Values
Table 2
 
Comparison of the Fitting Ability of the Models for the Predicted Values
Model RMSE, dB MAE, dB 95th Percentile, dB
Classical uncensored linear 3.65 2.37 7.93
Classical exponential 3.77 2.39 8.02
Classical censored linear 3.72 2.33 8.09
Uncensored linear on the median 3.81 2.34 8.26
Exponential on the median 3.93 2.37 8.41
Censored linear on the median 3.83 2.34 8.31
Figure 4 shows the fitting ability of the models over different sensitivity levels. This shows a clear trend throughout the models, with low MAEs at 0 dB and in the range from 20 dB to 35 dB. This corresponds to the most frequently observed sensitivity estimates, as shown in Figure 1. The models perform the worst between 0 dB and 25 dB and above 35 dB, where there are very few measurements and, hence, the models are not being optimized very well at these high sensitivity levels. 
Figure 4
 
Mean absolute error using cross validation to evaluate the fitting ability of each model divided over the range of observed sensitivity estimates.
Figure 4
 
Mean absolute error using cross validation to evaluate the fitting ability of each model divided over the range of observed sensitivity estimates.
As a further step, we investigated the error distribution across the entire sensitivity range. Since the models all performed similarly, this was only done for the classical uncensored linear model, which had been found to be the best model. Curves of the 10th, 25th, 50th, 75th, and 90th percentiles of the error are plotted for this model in Figure 5, superimposed over a smoothed scatter plot showing the observed sensitivity estimates versus the corresponding errors. The median of the errors approximated to zero for sensitivities around 0 dB and 25 dB. From 30 dB down to 10 dB, the median remained close to zero, but the whole distribution became wider. However, from 10 dB down to 0 dB the magnitude of the negative errors decreased due to the truncation at the range, while it continued increasing for the positive errors. 
Figure 5
 
Smoothed scatter plot of the observed sensitivity estimates versus the corresponding error using cross validation including percentile curves (10th, 25th, 50th, 75th, and 90th) across the range of sensitivity levels for classical uncensored linear regression. The dark blue areas correspond to a high density of observed sensitivity estimates.
Figure 5
 
Smoothed scatter plot of the observed sensitivity estimates versus the corresponding error using cross validation including percentile curves (10th, 25th, 50th, 75th, and 90th) across the range of sensitivity levels for classical uncensored linear regression. The dark blue areas correspond to a high density of observed sensitivity estimates.
Evaluation of Predictive Ability
For 3-year predictions, RMSE, MAE, and 95th percentiles of the absolute errors were estimated for all models. The results are presented in Table 3. The classical uncensored linear model had the lowest RMSE, MAE, and 95th percentile of the absolute errors. Since the variations from the classical uncensored models all performed worse than classical uncensored models, we have included only the results for the classical uncensored linear and classical exponential regression models in Figure 6. Here the MAE and 95th percentiles of the absolute errors for 3-year predictions have been shown graphically for a varying numbers of time points. When including more measurements, the error was reduced; however, the ordering of the models did not change. The classical uncensored linear regression model performed the best when comparing all the models irrespective of how many time points were used. 
Figure 6
 
(A) Mean absolute and (B) 95th percentiles of 3-year prediction errors for the classical uncensored linear (black) and classical exponential (red) regression models.
Figure 6
 
(A) Mean absolute and (B) 95th percentiles of 3-year prediction errors for the classical uncensored linear (black) and classical exponential (red) regression models.
Table 3
 
Comparison of the Predictive Ability of the Models by Using Nine Measurements to Predict Three Years (6 Measurements) Ahead
Table 3
 
Comparison of the Predictive Ability of the Models by Using Nine Measurements to Predict Three Years (6 Measurements) Ahead
Model RMSE, dB MAE, dB 95th Percentile, dB
Classical uncensored linear 6.02 3.95 13.13
Classical exponential 7.29 4.39 15.00
Classical censored linear 6.34 4.05 14.00
Uncensored linear on the median 6.34 4.06 14.00
Exponential on the median 7.58 4.49 16.20
Censored linear on the median 6.51 4.12 14.22
Figure 7 shows the MAE over the range of sensitivity estimates using nine measurements to predict 3 years ahead. A similar trend could be seen throughout all the models as found when comparing the model fits, with the lowest MAEs occurring at 0 dB and between 25 dB and 35 dB. The difference between the models was however more pronounced between 0 dB and 20 dB when predicting than it was when fitting the data. When predicting, the exponential models were more likely to diverge and, hence, perform the worst within this range. 
Figure 7
 
Mean absolute error to evaluate the predictive ability of each model divided over the range of observed sensitivity estimates. *MAE ranges from 32.8 dB to 37.0 dB for the different models.
Figure 7
 
Mean absolute error to evaluate the predictive ability of each model divided over the range of observed sensitivity estimates. *MAE ranges from 32.8 dB to 37.0 dB for the different models.
Discussion
When ranking the classical uncensored linear and classical exponential models, the classical uncensored linear regression model was found to have the best fit most frequently (57% vs. 43%). These results differ from those obtained by Caprioli et al., 7 who concluded that the classical exponential model provided the best fit for their data in a very large majority of cases. This could be due to different implementation of the models. Caprioli et al. 7 specified a mathematical model and not a statistical model. In the latter model it is also specified how the error term enters the model (i.e., as an additive term to the exponential or behind the exponential sign). Without knowing the statistical model which they used, we are unable to determine whether our implementations were the same. 
When including all the models, the MAEs were similar, and the classical uncensored linear model again performed the best when comparing the RMSE and 95th percentile of the absolute error. The classical uncensored linear regression model was also found to be the best model for prediction irrespective of how many measurements were used to fit the model. Although the models performed similarly, we conclude that for both fitting purposes (i.e., for determining the rate of progression per test location and for prediction of future measurements) the classical uncensored linear regression model is recommended. However, even for this model the mean absolute errors were still on the order of 2 dB for fitting, and 4 dB for prediction, which is arguably too high to be of direct clinical use. 
A possible reason for the poor discrimination between the goodness of fit of the various fitting and predictive models is the low progression rates found in this study. This might be due to having a younger cohort than other studies, different inclusion and exclusion criteria, or the fairly large proportion of data points with 0 dB retinal sensitivity. 
The lowest error for fitting the data was found to be at 0 dB and around 30 dB for all models. This corresponds to the sensitivity estimates, which were most frequently seen in this dataset. In the same way, only a few measurements were seen between 0 dB and 25 dB and above 35 dB, where the models all performed the worst. This is also the case when comparing the predictive abilities of the models. It has also been shown that the test–retest variability is highest in this range and, hence, the values that we assume to be true also have a high error. 11 However, the difference between the models was more evident between 0 dB and 20 dB when comparing the models for best predictive ability than for best fitting ability. Hence, none of the models were being optimized well in this range, and the result of this problem is even more evident when predicting future measurements. As before, the real value of the sensitivity estimate is unknown. Therefore, the measurements being predicted also have error, so an accurate prediction of this measurement is not possible. 
The distribution of the residuals was not symmetric and differed over the range of sensitivity estimates, which is a violation of the assumption made by the classical regression models. The bias toward negative errors was, however, emphasized at the higher end of the range due to there being only a few measurements above 35 dB. The truncation of the negative errors at the lower end of the range was due to the device not being able to measure sensitivity estimates below zero, as well as the truncation of the predicted response, which we applied in our analysis. This could be an interesting aspect of the data to investigate further, by using a model, which assumes that the distribution of the residuals differ across the range of sensitivity estimates. 
In previous research, a quadratic model was often considered in addition to linear and exponential models. Compared with linear and exponential models, these studies did not identify the quadratic model as an improvement. 7,16 In addition, the model is not monotonic and there are no theoretical reasons (e.g., from anatomy or functional models) for using such a quadratic model. We also evaluated the quadratic model but found no indication of its suitability in our application. We therefore argue that this model can safely be excluded in future research on modeling time series of threshold sensitivities. 
In this study, censoring was done at 0 dB as suggested by Russell and Crabb. 9 However, recent research found that ganglion saturation can also account for high test–retest variability between 0 dB and 25 dB and, hence, 0 dB is outside the dynamic range of a ganglion cell, 21 and that sensitivities below 10 dB are mainly noise. 22 For this reason, we also considered censoring and truncating of the predicted response at 10 dB. When censoring and truncation were done at 10 dB rather than 0 dB, the MAE was higher for all the models; however, the classical uncensored linear model was still found to be the best model for fitting and for prediction. 
For future work, certain aspects of the data need to be further investigated. One of the difficulties in modeling this data is the amount and type of measurement error and variability in the sensitivity estimates. This may be due to blinking, eye movements, flagging attention by the patient, fatigue, inexperience, a delayed reaction time, and the patient forgetting to press the button. Factors such as test reliability, technical experience, time of day, and season have been shown to influence standard automated perimetry test results. 23 In order to measure the true progression of the disease, the high, sensitivity-dependent, variability would need to be taken into account by incorporating it into the statistical model, or new, more reproducible methods for determining the VFs' need to be developed. The device should flag measurements if they fall outside a clinically acceptable range determined from previous observations. This would indicate that either the measurement is wrong and the test should be redone, or that there is a real change in the rate of progression and the treatment strategy may need to be adjusted. 
In addition, one could exploit the spatial nature of the data and capitalize on the specific spatial organization of the nerve fibers in the eye. 24,25 A problem with using regression models is that each location is treated as an independent sample and we are not able to link locations belonging to the same eye together. This problem can be addressed by using spatial models or mixed effects models, 26 which we propose for future work. 
Finally, sensitivity estimates at different test locations in the same VF do not all progress at the same speed, and directly modeling summary parameters will not be able to handle this well. Hence, it may also be interesting to compare the summary predictions by modeling the MD directly and by predicting each point and consequently calculating the MD. This would elucidate whether modeling the individual points is more informative than modeling global indices. 6  
In conclusion, we have shown that the classical uncensored linear model was found to be the best model for fitting VF data, as well as for predicting future VF measurements. However, despite the various theoretical pros and cons of each of the explored models, none of them performed satisfactorily well, neither for fitting, nor for predicting future damage, which is why we need to further explore models that take spatiotemporal relationships more into consideration. Modeling point-wise threshold sensitivity over time is relevant, but more advanced models are needed to further reduce prediction errors and produce clinically meaningful results. 
Acknowledgments
Disclosure: S.R. Bryan, None; K.A. Vermeer, None; P.H.C. Eilers, None; H.G. Lemij, Carl Zeiss Meditec (C); E.M.E.H. Lesaffre, None 
References
Kingman S. Glaucoma is the second leading cause of blindness globally. Bull World Health Organ . 2004; 82: 887– 888. [PubMed]
Flammer J Meier E. Glaucoma: A Guide For Patients: An Introduction for Care-Providers: A Quick Reference. 2nd ed. Cambridge, MA: Hogrefe & Huber Publishers; 2003.
Bengtsson B Heijl A. A visual field index for calculation of glaucoma rate of progression. Am J Ophthalmol . 2008; 145: 343– 353. [CrossRef] [PubMed]
Kymes SM Lambert DL Lee PP The development of a decision analytic model of changes in mean deviation in people with glaucoma: the COA model. Ophthalmology . 2012; 119: 1367– 1374. [CrossRef] [PubMed]
Artes PH O'Leary N Hutchison DM Properties of the statpac visual field index. Invest Ophthalmol Vis Sci . 2011; 52: 4030– 4038. [CrossRef] [PubMed]
Azarbod P Mock D Bitrian E Validation of pointwise exponential regression to measure the decay rates of glaucomatous visual fields. Invest Ophthalmol Vis Sci . 2012; 53: 5403– 5409. [CrossRef] [PubMed]
Caprioli J Mock D Bitrian E A method to measure and predict rates of regional visual field decay in glaucoma. Invest Ophthalmol Vis Sci . 2011; 52: 4765– 4773. [CrossRef] [PubMed]
Anderson DR Patella VM. Automated static perimetry. 2nd ed. St. Louis, MO: C V Mosby; 1999.
Russell RA, Crabb DP. Response to Author. Invest Ophthalmol Vis Sci . 2011; 52: 9539– 9540. [CrossRef] [PubMed]
Tobin J. Estimation of relationships for limited dependent variables. Econometrica . 1958; 26: 24– 36. [CrossRef]
Russell RA Crabb DP Malik R Garway-Heath DF. The relationship between variability and sensitivity in large-scale longitudinal visual field data. Invest Ophthalmol Vis Sci . 2012; 53: 5985– 5990. [CrossRef] [PubMed]
Hawkings DM. Identification of Outliers . London: Chapman and Hall; 1980.
Rousseeuw PJ Leroy AM. Robust Regression and Outlier Detection . New York: Wiley; 1987.
Draper NR Smith H. Applied Regression Analysis. 3rd ed. New York: Wiley; 1998 .
Koenker R. Quantile Regression . New York: Cambridge University Press; 2005.
McNaught AI Crabb DP Fitzke FW Hitchings RA. Modelling series of visual fields to detect progression in normal-tension glaucoma. Graefes Arch Clin Exp Ophthalmol . 1995; 233: 750– 755. [CrossRef] [PubMed]
Edgeworth FY. On observations relating to several quantities. Hermathena . 1887; 6: 279– 285.
Huber PJ. Robust Statistics. New York: John Wiley and Sons; 1981.
Hoaglin DC Mosteller F Tukey JW. Understanding Robust and Exploratory Data Analysis . New York: Wiley; 1983.
Akaike H. Likelihood of a model and information criteria. J Econom . 1981; 16: 3– 14. [CrossRef]
Swanson WH Sun H Lee BB Cao D. Responses of primate retinal ganglion cells to perimetric stimuli. Invest Ophthalmol Vis Sci . 2011; 52: 764– 771. [CrossRef] [PubMed]
Junoy Montolio FG, Wesselink C, Jansonius NM. Persistence, spatial distribution and implications for progression detection of blind parts of the visual field in glaucoma: a clinical cohort study. PLoS One . 2012; 7: e41211. [CrossRef] [PubMed]
Junoy Montolio FG, Wesselink C, Gordijn M, Jansonius NM. Factors that influence standard automated perimetry test results in glaucoma: test reliability, technician experience, time of day, and season. Invest Ophthalmol Vis Sci . 2012; 53: 7010– 7017. [CrossRef] [PubMed]
Denniss J McKendrick AM Turpin A. An anatomically-customisable computational model relating the visual field to the optic nerve head in individual eyes. Invest Ophthalmol Vis Sci . 2012; 53: 6981– 6990. [CrossRef] [PubMed]
Airaksinen PJ Doro S Veijola J. Conformal geometry of the retinal nerve fiber layer. Proc Natl Acad Sci . 2008; 105: 19690– 19695. [CrossRef] [PubMed]
Verbeke G Molenberghs G. Linear Mixed Models for Longitudinal Data . New York: Springer; 2000.
Figure 1
 
Distribution of point-wise sensitivity estimates in the study dataset.
Figure 1
 
Distribution of point-wise sensitivity estimates in the study dataset.
Figure 2
 
Retinal sensitivity estimates over time for each location in the VF in the left eye of a single glaucoma patient.
Figure 2
 
Retinal sensitivity estimates over time for each location in the VF in the left eye of a single glaucoma patient.
Figure 3
 
Two real examples of sensitivity estimates over time at two single locations in the VF. The lines indicate corresponding fits of four models.
Figure 3
 
Two real examples of sensitivity estimates over time at two single locations in the VF. The lines indicate corresponding fits of four models.
Figure 4
 
Mean absolute error using cross validation to evaluate the fitting ability of each model divided over the range of observed sensitivity estimates.
Figure 4
 
Mean absolute error using cross validation to evaluate the fitting ability of each model divided over the range of observed sensitivity estimates.
Figure 5
 
Smoothed scatter plot of the observed sensitivity estimates versus the corresponding error using cross validation including percentile curves (10th, 25th, 50th, 75th, and 90th) across the range of sensitivity levels for classical uncensored linear regression. The dark blue areas correspond to a high density of observed sensitivity estimates.
Figure 5
 
Smoothed scatter plot of the observed sensitivity estimates versus the corresponding error using cross validation including percentile curves (10th, 25th, 50th, 75th, and 90th) across the range of sensitivity levels for classical uncensored linear regression. The dark blue areas correspond to a high density of observed sensitivity estimates.
Figure 6
 
(A) Mean absolute and (B) 95th percentiles of 3-year prediction errors for the classical uncensored linear (black) and classical exponential (red) regression models.
Figure 6
 
(A) Mean absolute and (B) 95th percentiles of 3-year prediction errors for the classical uncensored linear (black) and classical exponential (red) regression models.
Figure 7
 
Mean absolute error to evaluate the predictive ability of each model divided over the range of observed sensitivity estimates. *MAE ranges from 32.8 dB to 37.0 dB for the different models.
Figure 7
 
Mean absolute error to evaluate the predictive ability of each model divided over the range of observed sensitivity estimates. *MAE ranges from 32.8 dB to 37.0 dB for the different models.
Table 1
 
Descriptive Statistics of the Study Sample
Table 1
 
Descriptive Statistics of the Study Sample
Mean Median Interquartile Range
Baseline age, y 59.8 61.4 53.3 to 67.6
Baseline MD, dB −8.3 −5.8 −12.8 to −2.3
Average change in MD, dB/y −0.07 −0.04 −0.2 to 0.1
Follow-up time, y 7.7 7.6 7.2 to 8.2
Table 2
 
Comparison of the Fitting Ability of the Models for the Predicted Values
Table 2
 
Comparison of the Fitting Ability of the Models for the Predicted Values
Model RMSE, dB MAE, dB 95th Percentile, dB
Classical uncensored linear 3.65 2.37 7.93
Classical exponential 3.77 2.39 8.02
Classical censored linear 3.72 2.33 8.09
Uncensored linear on the median 3.81 2.34 8.26
Exponential on the median 3.93 2.37 8.41
Censored linear on the median 3.83 2.34 8.31
Table 3
 
Comparison of the Predictive Ability of the Models by Using Nine Measurements to Predict Three Years (6 Measurements) Ahead
Table 3
 
Comparison of the Predictive Ability of the Models by Using Nine Measurements to Predict Three Years (6 Measurements) Ahead
Model RMSE, dB MAE, dB 95th Percentile, dB
Classical uncensored linear 6.02 3.95 13.13
Classical exponential 7.29 4.39 15.00
Classical censored linear 6.34 4.05 14.00
Uncensored linear on the median 6.34 4.06 14.00
Exponential on the median 7.58 4.49 16.20
Censored linear on the median 6.51 4.12 14.22
×
×

This PDF is available to Subscribers Only

Sign in or purchase a subscription to access this content. ×

You must be signed into an individual account to use this feature.

×