**Purpose.**:
Evaluation of progressive visual field (VF) damage is often based on pointwise sensitivity data from standard automated perimetry; however, frequency-of-seeing (FOS) and test-retest studies demonstrate that these measurements can be highly variable, especially in areas of damage. The aim of this study was to characterize VF variability by level of sensitivity, using a statistical method to quantify heteroscedasticity.

**Methods.**:
A total of 14,887 Humphrey 24-2 SITA Standard VFs from 2736 patients (2736 eyes) attending Moorfields Eye Hospital from 1997 to 2009 were studied retrospectively. The VF series of each eye was analyzed using pointwise linear regression of sensitivity over time, with the residuals (differences between observed and fitted values) from each regression pooled according to both observed and fitted sensitivities.

**Results.**:
The median (interquartile range) patient age, follow-up, and series length were 64 (54–71) years, 5.5 (3.9–7.0) years, and 6 (5–7) VFs, respectively. The inferred variability as a function of fitted sensitivity was in good agreement with previous estimates. Variability was also described as a function of measured sensitivity, which confirmed that variability increased rapidly as the observed sensitivity decreased.

**Conclusions.**:
This study highlights a new approach for characterizing VF variability by the level of sensitivity. A considerable strength of the method is that inference is based on thousands of clinic patients rather than the tens of subjects in test-retest studies. The results can help distinguish real VF progression from measurement variability and will be used in models for glaucoma progression detection.

^{ 1–3 } and test-retest studies.^{ 4–11 } In glaucoma management, VF variability represents an enormous hindrance for clinicians interpreting VF test results and determining progression.

^{ 2 } collected FOS data, using SAP, from four VF locations in 71 subjects with a range of VF damage. The authors concluded that response variability (SD) increased with decreasing sensitivity, and summarized the relationship using the function $\log_e(\mathrm{SD}) = A \cdot \mathrm{sensitivity\ (dB)} + B$, where *A* = −0.081 and *B* = 3.27 (*R*² = 0.57). According to this function, as sensitivity decreases there is an exponential increase in response variability. However, the inferences from this study are somewhat limited in their relevance to clinical SAP test strategies, where measured thresholds are not based on FOS curves. Furthermore, there was a paucity of measurements with low sensitivity, and no data below 10 dB. Artes et al.^{ 9 } overcame these limitations using test-retest data, and studied the variability properties of SAP threshold estimates from the Full Threshold and Swedish Interactive Threshold Algorithm (SITA) strategies of the Humphrey Field Analyzer (Carl Zeiss Meditec, Dublin, CA). The authors examined one eye each of 49 glaucoma patients, with a large range of VF damage, four times with each test strategy. Variability was shown to increase as the "best available estimate of sensitivity" (the mean sensitivity from test-retesting) decreased, up until approximately 10 dB, after which variability reduced as sensitivity declined further. Remarkably, for glaucomatous locations with low differential light sensitivity, such as 10-dB sensitivity, the authors showed that variability was so large that the 5th and 95th percentiles of the retest values were 0 dB and 28 dB, respectively.
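As a quick numerical check, the function above can be evaluated directly. The coefficients are those quoted in the text; the code and the function name `henson_sd` are our own illustration, not part of the original study.

```python
import math

# Coefficients quoted above from the fitted relationship:
# log_e(SD) = A * sensitivity(dB) + B
A, B = -0.081, 3.27

def henson_sd(sensitivity_db):
    """Predicted response variability (SD, in dB) at a given sensitivity (dB)."""
    return math.exp(A * sensitivity_db + B)

# Variability grows roughly exponentially as sensitivity falls:
for s in (30, 20, 10):
    print(f"sensitivity {s} dB -> predicted SD {henson_sd(s):.1f} dB")
```

At 30 dB the predicted SD is about 2.3 dB, rising to roughly 11.7 dB at 10 dB, which is why the relationship is described as an exponential increase in variability with decreasing sensitivity.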

^{ 12 } Each VF series had to be at least five VFs long for inclusion in the present study, and the first VF was then discarded to reduce perimetric learning effects.

^{ 13,14 }

^{ 15 } The error term ($E$) represents the part of the response variable ($Y$) that is not explained by the predictor variable ($X$). For the fitted function, the error term is estimated from the residuals, which are the vertical deviations from the fitted line. The simplest and most common method for fitting a regression line is to minimize the sum of squares of these residuals; this approach is known as ordinary least squares linear regression (OLSLR). The method assumes that the predictor variable is error-free and that the error term is normally distributed with mean zero and constant variance across the range of the measurement. Constant variance is known as homoscedasticity ("equal scatter"); if, however, the residuals depend on the magnitude of the measurement, this is referred to as heteroscedasticity ("unequal scatter"), which indicates that the variance of the error term is not uniform across observations. Importantly, heteroscedasticity does not cause OLSLR coefficient estimates (i.e., the slope and intercept terms) to be biased; however, estimates of the variance of the coefficients are biased.^{ 16 } Thus, OLSLR in the presence of heteroscedasticity provides an unbiased estimate of the relationship between the predictor variable and the response variable, but standard errors, and consequently inferences on statistical significance, are affected. The residuals from linear regression are informative when investigating heteroscedasticity: a scatterplot of the squared or absolute residuals against $X$, $Y$, or *Ŷ* (the fitted value of $Y$) may be used to assess for nonconstant variance of the error term.
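The residual diagnostic described here can be sketched as follows. This is a minimal illustration on simulated data: the study's own analyses were carried out in R, and all variable names and numbers below are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a declining VF series whose noise SD grows as sensitivity falls,
# i.e., a heteroscedastic error term (purely illustrative numbers).
t = np.arange(10, dtype=float)           # follow-up time (arbitrary units)
true = 30.0 - 1.5 * t                    # "true" sensitivity declining over time
noise_sd = 0.5 + 0.2 * (30.0 - true)     # scatter increases as sensitivity drops
y = true + rng.normal(0.0, noise_sd)

# Ordinary least squares: minimize the sum of squared residuals.
slope, intercept = np.polyfit(t, y, 1)
fitted = intercept + slope * t
residuals = y - fitted                   # vertical deviations from the line

# With an intercept in the model, OLS residuals sum to zero by construction;
# a trend in |residuals| against the fitted value flags nonconstant variance.
print(f"mean residual = {residuals.mean():.2e}")
print(f"corr(|residual|, fitted) = {np.corrcoef(fitted, np.abs(residuals))[0, 1]:.2f}")
```

In practice one would plot `np.abs(residuals)` against `fitted` rather than rely on a single correlation coefficient.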

^{ 17 } We have previously discussed the usefulness of tobit linear regression (TLR) for the analysis of VF sensitivity measurements.^{ 18 } If the outcome variable in a linear model is censored (as is the case for threshold sensitivity), the assumptions of the OLSLR model are not valid; TLR, however, provides a valid alternative. TLR uses a latent dependent variable, which respects left- and/or right-censoring and predicts the response only within the specified range. Thus, in contrast to OLSLR, TLR acknowledges that a VF threshold of 0 dB may be 0 dB, or a value less than 0 dB (were the perimeter to have a greater dynamic range).
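A tobit-style fit can be sketched by maximizing the censored likelihood directly. This is an illustrative implementation under our own simplifying assumptions (left-censoring at the 0-dB floor only, normal errors), not the authors' code; a real analysis would use an established censored-regression routine.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_fit(t, y, floor=0.0):
    """Tobit (censored) linear regression: y = max(y*, floor), y* = a + b*t + e."""
    def nll(params):
        a, b, log_s = params
        s = np.exp(log_s)                   # keep sigma positive
        mu = a + b * t
        cens = y <= floor
        ll = np.where(
            cens,
            norm.logcdf((floor - mu) / s),  # P(latent value <= floor)
            norm.logpdf((y - mu) / s) - np.log(s),
        )
        return -ll.sum()
    res = minimize(nll, x0=[y.mean(), 0.0, 0.0], method="Nelder-Mead")
    a, b, log_s = res.x
    return a, b, np.exp(log_s)

# A declining series that hits the 0-dB floor: OLS is flattened by the
# censored zeros, whereas tobit respects that 0 dB may mean "below 0 dB".
t = np.arange(8, dtype=float)
noise = np.array([0.5, -0.4, 0.3, -0.2, 0.1, 0.0, -0.3, 0.2])
y = np.maximum(12.0 - 3.0 * t + noise, 0.0)
a, b, s = tobit_fit(t, y)
print(f"tobit intercept = {a:.1f} dB, slope = {b:.1f} dB per unit time")
```

On this example, ordinary least squares on the floored values gives a slope of roughly −1.8 dB per unit time, while the tobit fit recovers a slope close to the underlying −3.0 dB.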

For each regression, the residuals (*Ê*) were extracted and grouped (binned) according to $Y$ (measured sensitivity) in the range [0, 36] dB. Residuals were also binned according to the fitted-sensitivity value (rounded to the nearest whole decibel), *Ŷ*, as this value estimates "true" sensitivity. These two binning methods ask subtly different questions, which are now outlined. Binning by measured sensitivity establishes variability conditional on the measured threshold, and asks the question, *given a measured threshold, what is the underlying range of values for the "true" value?* This approach is akin to the method used by Wall et al.,^{ 4 } where retest thresholds were compared with baseline thresholds; in this case, as the authors state, "the first test is not meant to be the true sensitivity, as it has its own variability." On the other hand, binning by fitted sensitivity investigates variability according to the estimated true value, and asks the question, *given a "true" threshold value, what is the range of measured thresholds expected for any given test?* This approach mirrors the one used by Artes et al.,^{ 9 } where thresholds were compared with the mean of several retest thresholds. The difference between the two binning strategies is shown in Figure 1. In this figure, the residuals (indicated by the dashed lines) are all associated with an OLSLR fitted-sensitivity bin of 31 dB; however, if the residuals are stratified by measured threshold, they are pooled into the following bins: 30, 31, 31, 30, 33, 31, 33, and 28 dB. All statistical analyses were carried out in the open-source programming language, R.^{ 19 }
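The two pooling strategies can be sketched with the eight measured thresholds quoted for the Figure 1 example. The function and variable names are ours, and we assume eight equally spaced tests for illustration; the study itself used R.

```python
import numpy as np
from collections import defaultdict

def bin_residuals(t, y):
    """Fit OLS to one series and pool residuals by measured and fitted sensitivity."""
    slope, intercept = np.polyfit(t, y, 1)
    fitted = intercept + slope * t
    resid = y - fitted
    by_measured = defaultdict(list)  # variability given the measured threshold
    by_fitted = defaultdict(list)    # variability given the estimated "true" value
    for yi, fi, ri in zip(y, fitted, resid):
        by_measured[int(round(yi))].append(ri)
        by_fitted[int(round(fi))].append(ri)  # rounded to the nearest whole dB
    return by_measured, by_fitted

# The eight measured thresholds from the Figure 1 example (dB):
t = np.arange(8, dtype=float)
y = np.array([30.0, 31.0, 31.0, 30.0, 33.0, 31.0, 33.0, 28.0])
by_measured, by_fitted = bin_residuals(t, y)

print(sorted(by_fitted))    # every residual lands in the 31-dB fitted bin
print(sorted(by_measured))  # stratified by measured threshold: 28, 30, 31, 33
```

With these values the fitted line sits near 30.9 dB throughout, so all eight residuals pool into the single 31-dB fitted-sensitivity bin, while binning by measured threshold scatters the same residuals across four bins.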

**Figure 1.**


**Table.**


| Measurement | Median (Interquartile Range) |
| --- | --- |
| Baseline age | 63.7 (54.0–71.2) y |
| Baseline pointwise sensitivity | 28 (24–30) dB |
| Follow-up period | 5.5 (3.9–7.0) y |
| Number of VF tests | 6 (5–7) |

^{ 9 } (red points) and Henson et al.^{ 2 } (yellow points).

**Figure 2.**


**Figure 3.**


^{ 1,4,6–9,20,21 } For example, 25 years ago, Heijl et al.^{ 21 } analyzed test-retest VF data and concluded that, in areas of moderate to advanced glaucomatous damage (8 to 18 dB loss), the 95% prediction interval associated with the VF sensitivity measurement spanned the entire dynamic range of the instrument. Very similar results have been shown for the SITA Standard testing algorithm.^{ 4,9 } The association between a decline in VF sensitivity and an increase in response variability could be caused by a loss of ganglion cells (due to glaucomatous damage), or by relocation of the stimulus to the peripheral visual field, where there are fewer ganglion cells.^{ 22 } Previous studies have suggested that a reduction in the number of stimulated ganglion cells, whether by a reduction in stimulus size^{ 23,24 } or by pathological damage,^{ 2,9 } may lead to an increase in response variability. Therefore, these results highlight the considerable difficulties in reliably identifying VF damage in areas of moderate to advanced glaucomatous VF loss.

^{ 25,26 } Moreover, standard pointwise linear regression remains the most popular method for investigating rates of VF progression.^{ 27–30 } A limitation of our study is that the method estimates two parameters, a slope and an intercept term, whereas test-retest studies estimate only the latter, as they assume that the measurement does not change over the period in which the data are captured. Accordingly, a limitation of test-retest data is that it assumes the absence of a perimetric learning effect, which can persist over tens of VF tests in some patients.^{ 13,14,31 } Nevertheless, our findings agree well with those of Artes et al.,^{ 9 } but are possibly more robust because of the large amount of data analyzed. Furthermore, our sample consists of patients from general ophthalmic practice rather than "perimetry athletes" who are very familiar with SAP and consequently may record less noisy measurements. Because our analysis was carried out on retrospective data, the method could easily be applied to other perimetry and imaging devices used in glaucoma assessment, saving time and money compared with prospective test-retest or FOS studies.

^{ 9 } and Wall et al.^{ 4 }: when VF thresholds are equal to 0 dB, the arithmetic mean is not robust to the truncated nature of VF measurements, and the derived variability (expressed as the SD of retest thresholds) is consequently reduced (see Fig. 3A). This truncation effect is very evident in the distributions shown in Figure 2 and in similar figures in other studies.^{ 4,9 } Our results emphasize that a reduction in VF variability for thresholds below 10 dB can largely be attributed to this truncation. Wall et al.^{ 4 } showed that threshold estimates below 20 dB have little value for predicting the value at retest; however, their results were based on only one baseline measurement and a single retest value. Our results are similar to the findings of Wall et al.^{ 4 } and support their suggestion that, for the purposes of detecting VF progression, examination of damaged test locations could be worthless.

^{ 16,32 } Furthermore, recent research from Caprioli et al.^{ 33 } suggests that pointwise exponential regression (PER) provides a robust estimate of rates of VF decay, and may predict future global indices more accurately than standard linear regression. An interesting extension to our research would be to use the same methodology to investigate the relationship between variability and sensitivity level under the PER model; the results from such an analysis would indicate whether the PER model overcomes the heteroscedasticity and non-normal distribution of residuals that are seen with OLSLR and TLR.

^{ 30,34–36 } have been based on Henson et al.'s^{ 2 } equation for VF variability, which may be less appropriate than the results shown here, as the results from Henson et al.^{ 2 } were based on FOS curves with no thresholds below 10 dB. Furthermore, our results are based on thousands of patients in general care and not research patients with a superior knowledge of perimetry testing. Simulating VFs using our results would also allow progression criteria to be established with known sensitivity and specificity characteristics, which would be important for application in clinical trials. Finally, as stated earlier, because our study was carried out on retrospective data, the analysis could easily be applied to other visual field and imaging devices, saving time and money compared with prospective test-retest or FOS studies.

1. *Invest Ophthalmol Vis Sci*. 1993;34:3534–3540.
2. *Invest Ophthalmol Vis Sci*. 2000;41:417–421.
3. *Invest Ophthalmol Vis Sci*. 2001;42:1404–1410.
4. *Invest Ophthalmol Vis Sci*. 2009;50:974–979.
5. *Am J Ophthalmol*. 1990;109:109–111.
6. *Arch Ophthalmol*. 1984;102:704–706.
7. *Am J Ophthalmol*. 1989;108:130–135.
8. *Arch Ophthalmol*. 1998;116:53–61.
9. *Invest Ophthalmol Vis Sci*. 2002;43:2654–2659.
10. *Prog Retin Eye Res*. 2005;24:333–354.
11. *Invest Ophthalmol Vis Sci*. 2005;46:2451–2457.
12. *Invest Ophthalmol Vis Sci*. 2000;41:2201–2204.
13. *Arch Ophthalmol*. 1996;114:19–22.
14. *Arch Ophthalmol*. 1989;107:81–86.
15. *An Introduction to Medical Statistics*. Oxford: Oxford University Press; 2000.
16. *Econometrica*. 1980;48:817–838.
17. *Econometrica*. 1958;26:24–36.
18. *Invest Ophthalmol Vis Sci*. 52:9539–9540.
19. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing; 2010.
20. *Albrecht Von Graefes Arch Klin Exp Ophthalmol*. 1979;210:235–250.
21. *Arch Ophthalmol*. 1987;105:1544–1549.
22. *Ger J Ophthalmol*. 1992;1:79–85.
23. *Invest Ophthalmol Vis Sci*. 1997;38:426–435.
24. *Invest Ophthalmol Vis Sci*. 1999;40:648–656.
25. *Graefes Arch Clin Exp Ophthalmol*. 1995;233:750–755.
26. *Arch Ophthalmol*. 2009;127:1610–1615.
27. *Ophthalmology*. 2004;111:1627–1635.
28. *Invest Ophthalmol Vis Sci*. 2010;51:1458–1463.
29. *Invest Ophthalmol Vis Sci*. 2006;47:2896–2903.
30. *Invest Ophthalmol Vis Sci*. 2002;43:1400–1407.
31. *Optom Vis Sci*. 2008;85:1043–1048.
32. *Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability*. 1967;1:221–233.
33. *Invest Ophthalmol Vis Sci*. 2011;52:4765–4773.
34. *Br J Ophthalmol*. 2002;86:560–564.
35. *Invest Ophthalmol Vis Sci*. 2003;44:4787–4795.
36. *Invest Ophthalmol Vis Sci*. 2007;48:1627–1634.