**Purpose.**
Variability in perimetry increases with the amount of damage, making it difficult for testing algorithms to efficiently converge to the true sensitivity. This study describes a variability-adjusted algorithm (VAA), in which step size increases with variability.

**Methods.**
Contrasts were transformed to a new scale wherein the SD of frequency-of-seeing curves remains 1 unit for any sensitivity. A Bayesian thresholding procedure based on the existing Zippy Estimation by Sequential Testing (ZEST) algorithm was simulated on this new scale, and results converted back to decibels. The root-mean-squared (RMS) error from true sensitivity based on these simulations was compared against that achieved by ZEST using the same number of presentations. The procedure was repeated after limiting sensitivities to 15 dB or higher, the lower limit of reliable sensitivities using standard white-on-white perimetry in glaucoma, for both algorithms.

**Results.**
When the true sensitivity was 35 dB, with starting estimate also 35 dB, RMS errors of the algorithms were similar, ranging from 1.39 dB to 1.60 dB. When true sensitivity was instead 20 dB, with starting estimate 35 dB, VAA reduced the RMS error from 7.43 dB to 3.66 dB. Limiting sensitivities at 15 dB or higher reduced RMS errors, except when true sensitivity was near 15 dB.

**Conclusions.**
VAA reduces perimetric variability without increasing test duration in cases in which the starting estimate of sensitivity is too high; for example, due to a small scotoma. Limiting the range of possible sensitivities at 15 dB or higher made algorithms more efficient, unless the true sensitivity was near this limit. This framework provides a new family of test algorithms that may benefit patients.

At more damaged locations, this variability worsens.^{ 2,3 } For example, when sensitivity is 20 dB, the 90% confidence interval for perimetric sensitivity (i.e., the 5th to the 95th percentile) is approximately 12 dB wide.^{ 1 } This makes clinical detection of true functional damage challenging, and necessitates a series of several visual fields to confidently assess the rate of visual field change.^{ 4 } Existing perimetric testing algorithms, such as the Swedish Interactive Testing Algorithm (SITA),^{ 5 } German Adaptive Thresholding Estimation,^{ 6 } and Zippy Estimation by Sequential Testing (ZEST),^{ 7,8 } aim to minimize this variability subject to various constraints, but they are all limited by the need to maintain a short test duration, so that the reliability of subject responses is not compromised by fatigue.^{ 9,10 }

For example, if the true sensitivity is 35 dB, the probability of responding to a 39-dB stimulus has been reported as being 5%, whereas if the true sensitivity is 20 dB, the response probability for a 24-dB stimulus is 25%.^{ 2 } If the subject responds to such a stimulus, testing algorithms assume that the sensitivity is probably greater than the contrast of that stimulus. The algorithm will therefore have difficulty converging to the correct sensitivity without requiring an exorbitantly large number of stimulus presentations. This is problematic because in clinical perimetry it is desirable to present only three or four stimuli per location to test the entire central visual field in close to 5 minutes. The "flattening" of the FOS curve that occurs in regions of glaucomatous damage therefore increases the variability about estimates of perimetric sensitivities.^{ 11 }

This means that the response probability asymptotes at some fixed contrast, and further increasing the stimulus contrast will not increase the response probability, hampering the ability of test algorithms to converge to the true sensitivity. We have recently shown that this causes perimetric sensitivities within the central visual field to be unreliable below 15 to 19 dB, with little relation to the true sensitivity as measured using FOS curves.^{ 12 } Testing algorithms can therefore be shortened by stopping testing once this contrast has been reached, rather than continuing testing with stimuli of 10 dB or 5 dB, for example, which might not provide further useful information about the true sensitivity. In this study, we assess the potential benefits on variability of using test algorithms that are designed to terminate once the sensitivity has been determined to be below 15 dB, allowing more accurate assessments to be made at locations with higher sensitivities within the same average test duration.

*Θ*(*X*) represents a cumulative Gaussian function, such that *Θ*(*−∞*) = 0, *Θ*(0) = 0.5, and *Θ*(*∞*) = 1. *FP* and *FN* represent the probability of a false-positive response (responding to a stimulus that was not detected) and a false-negative response (failing to respond to a stimulus that was detected), respectively. The primary results assumed *FP* = 5% false-positive responses and *FN* = 5% false-negative responses. The variability is defined by the SD of the FOS curve, *SD*, which is taken from the formula of Henson et al.^{ 2 }: *SD* = exp(−0.081 × *Sens* + 3.27).

On each simulated presentation, a uniform random number between 0 and 1 is generated. If this number is less than or equal to *Ψ*(*Stim*, *Sens*), the simulation continues as if there was a response to the stimulus; otherwise, the simulation continues as if there was no response. All simulations were performed using the statistical programming language R.^{ 13 }
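The simulated observer can be sketched in code. The study's simulations were written in R; below is a minimal Python sketch, assuming the standard psychometric form *Ψ*(*Stim*, *Sens*) = *FP* + (1 − *FP* − *FN*) × *Θ*([*Sens* − *Stim*]/*SD*), which is not written out explicitly in the text but reproduces the response probabilities quoted in the Introduction.

```python
import math
import random

def fos_sd(sens):
    # Henson et al. formula: SD of the frequency-of-seeing curve, in dB
    return math.exp(-0.081 * sens + 3.27)

def psi(stim, sens, fp=0.05, fn=0.05):
    # Probability of a response to a stimulus of contrast `stim` dB when the
    # true sensitivity is `sens` dB (assumed standard psychometric form).
    theta = 0.5 * (1.0 + math.erf((sens - stim) / (fos_sd(sens) * math.sqrt(2.0))))
    return fp + (1.0 - fp - fn) * theta

def simulate_response(stim, sens, rng=random.random):
    # A response occurs iff a uniform random draw falls at or below psi.
    return rng() <= psi(stim, sens)
```

With these defaults, `psi(39, 35)` is approximately 0.05 and `psi(24, 20)` approximately 0.25, matching the response probabilities quoted earlier.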

Not all of the details of the SITA implementation are publicly available.^{ 5 } Therefore, the current simulation is built on the ZEST algorithm,^{ 7 } which has similar variability for a given sensitivity,^{ 14 } and may be less variable when a reasonably accurate initial guess of the true sensitivity is made (e.g., based on information from other locations in the visual field).^{ 14 } ZEST is a Bayesian thresholding algorithm and requires an initial prior probability density function (pdf). In this simulation, the prior pdf was set as a uniform distribution spanning the range (−3 dB, 40 dB). The lower end of this range is below the nominal maximum contrast stimulus (0 dB) to aid in algorithmic convergence if the true sensitivity happens to be at or near 0 dB. At each step, a stimulus presentation is simulated as above, at the mean of the current distribution (i.e., the first stimulus is at 18.5 dB, which is the mean of the prior pdf). The posterior probability that the sensitivity is *ŝ* is then calculated by multiplying the prior probability that the sensitivity is *ŝ* by either *Ψ*(*Stim*, *ŝ*) if the simulated observer "responded" to the stimulus, or 1 − *Ψ*(*Stim*, *ŝ*) if the simulated observer failed to "respond" to the stimulus. The mean of the resulting posterior distribution is used as the new estimate of sensitivity at that location, and also gives the contrast of the next stimulus to be displayed in the sequence.

When an initial guess of the sensitivity, *Guess*, was available, the prior probability at each sensitivity *S* was weighted by *Φ*([*S − Guess*]/5)/*Φ*(0), where *Φ*(*x*) represents the pdf of a Gaussian distribution with mean 0 and SD 1; and then normalized such that the integral of the prior pdf over the available range (−3 dB, 40 dB) equaled one. Therefore the weighted prior pdf had a constant, nonzero probability of any sensitivity within the range, plus an increased probability around the initial guess according to a Gaussian distribution with an SD of 5 dB. Other values of this SD were tried in initial simulations (results not shown); 5 dB represented a compromise between being small enough to increase speed and accuracy when the initial guess was close to the true sensitivity, while being large enough that the deleterious effect of an inaccurate initial guess was acceptably small. Note that, due to the baseline nonzero probability for any sensitivity within the range, the mean of this weighted prior (which will be the first stimulus contrast presented) will typically not equal the initial guess, but will lie between the initial guess and the midpoint of the range. This reduces the effect that an incorrect initial guess has on the final estimated sensitivity.

The variance of the posterior pdf was calculated as the sum over all possible sensitivities *s* of *P*(*s*) × (*s − Mean*)^{2}, where *P*(*s*) represents the posterior probability that the true sensitivity is *s*, and *Mean* represents the mean of the posterior pdf. The SD of the posterior pdf was calculated as the square root of this value. The algorithm was said to terminate once the SD of the posterior pdf was less than a criterion value *SD*_{Crit}, and the estimate of sensitivity was given by the mean of the posterior pdf at that point. After 1000 simulations for each possible value of *SD*_{Crit}, the mean number of stimuli that had been presented in the sequence to achieve that criterion and the remaining root-mean-squared error (RMS error; the true sensitivity minus the estimate from the posterior pdf) were calculated. The RMS error was plotted against the mean number of presentations taken to reach this criterion, to show the variability that should be expected for sequences with a given number of stimulus presentations.
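The termination rule follows directly from these definitions. A minimal Python sketch (the study used R), applicable to any discretized posterior pdf:

```python
import math
import numpy as np

def posterior_mean_sd(s, p):
    # Mean, and SD = sqrt( sum over s of P(s) * (s - Mean)^2 ).
    mean = float(np.sum(p * s))
    var = float(np.sum(p * (s - mean) ** 2))
    return mean, math.sqrt(var)

def terminated(s, p, sd_crit):
    # Stop once the SD of the posterior pdf falls below SD_Crit;
    # the sensitivity estimate is then the posterior mean.
    return posterior_mean_sd(s, p)[1] < sd_crit
```

A broad (e.g., uniform) posterior has an SD far above any practical criterion, so testing continues; once responses concentrate the posterior, the SD drops below *SD*_{Crit} and the sequence stops.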

It has been shown^{ 2 } that this increase in variability with damage can be described by the equation *SD*(*Sens*) = exp(−0.081 × *Sens* + 3.27). The minimum detectable difference, the smallest difference between sensitivities that can confidently be ascribed to pathophysiologic difference rather than variability, is proportional to this SD. The variability-adjusted algorithm seeks to ensure that the step size between consecutive stimulus contrasts increases with this minimum detectable difference. This should ensure that the probability that the subject will respond changes substantially between consecutive stimuli.

Contrasts were therefore transformed to a new variability-adjusted scale (VAS), on which a step of 1 unit corresponds to *SD*(*Sens*) on the decibel scale. This means that a 1-unit difference on the VAS scale is always equivalent to a change on the decibel scale that equals the SD of a FOS curve with that sensitivity. Hence 40 dB ↔ 40 VAS (by definition); and below this point:

*VAS*(*Sens*) = 40 − [exp(0.081 × 40 − 3.27) − exp(0.081 × *Sens* − 3.27)] / 0.081,

which follows from integrating 1/*SD*(*Sens*) along the decibel axis down from 40 dB.

**Figure 1**


Therefore, there is little benefit to continuing testing with stimulus contrasts stronger than 15 dB, because the response probability will be the same as if a 15-dB stimulus were presented again. Responses to stimuli lower than 15 dB may be helpful in determining the likelihood that the true sensitivity is lower than 15 dB, but the same information could be gained by presenting a 15-dB stimulus again, which also would be less likely to have caveats due to light scatter. Algorithms can therefore be made more efficient if they stop testing at 15 dB, so that the time saved can be spent on deriving more accurate measures of sensitivity at this and other locations in the visual field.

**Figure 2**


**Figure 3**


**Figure 4**


**Figure 5**


**Table**


| True Sensitivity, dB | Initial Guess, dB | RMS Error Using ZEST, dB | RMS Error Using ZEST Limited at ≥15 dB | RMS Error Using VAA, dB | RMS Error Using VAA Limited at ≥15 dB |
|---|---|---|---|---|---|
| 35 | 35 | 1.53 | 1.39 | 1.39 | 1.60 |
| 35 | 30 | 1.47 | 1.52 | 1.57 | 1.66 |
| 35 | 25 | 2.87 | 2.92 | 1.99 | 1.66 |
| 35 | 20 | 4.84 | 4.60 | 2.23 | 1.94 |
| 20 | 35 | 7.43 | 6.20 | 3.66 | 3.71 |
| 20 | 30 | 5.24 | 5.00 | 3.99 | 2.19 |
| 20 | 25 | 3.61 | 3.37 | 4.35 | 2.21 |
| 20 | 20 | 2.88 | 2.61 | 4.23 | 2.38 |

**Figure 6**


The imperfect correspondence between structural and functional measures of glaucomatous damage,^{ 16,17 } especially cross-sectionally, as is important when assessing a new patient,^{ 18,19 } means that perimetry will remain an essential tool in glaucoma management for the foreseeable future. However, its limitations are substantial. The test–retest variability can be high, owing to the necessity of using short sequences of stimulus presentations per tested location, driven by the need to maintain acceptably short test durations. This makes detection and quantification of both damage and progression challenging, requiring repeated testing over a prolonged period and hence delaying assignment of appropriate management strategies. Any strategy that can reduce this variability without compromising either test duration or the ability to detect localized defects could be useful in both clinical and research settings. This paper provides proof-of-principle that a relatively simple alteration to existing test algorithms, namely performing the Bayesian calculations on a variability-adjusted scale instead of a decibel scale, could reduce variability in certain circumstances, with comparatively minor deleterious effects in other circumstances.

The Full Threshold^{ 20 } and SITA^{ 5 } algorithms implemented in the HFA perimeter both start by measuring the sensitivity at seed points, and use those measurements to derive the first contrast to be tested at neighboring locations. However, if this initial sensitivity guess is higher than the actual sensitivity at a location, because there is a localized glaucomatous defect at the location in question that does not extend to the seed points used, then the algorithm will tend to overestimate the sensitivity and underestimate the depth of that localized defect. The VAA gives more accurate sensitivity measurements (lower RMS error) when the initial guess is 30 dB or above and the true sensitivity is below 20 to 25 dB. As the discrepancy between the initial guess and the true sensitivity increases, so does the benefit of the new algorithm.

Presenting stimuli of contrast 5 dB, 2 dB, and so on does not help assess the sensitivity at the location being tested, because any change in response is likely due to light scattering to remaining healthier locations elsewhere in the visual field, and so is uninformative about the functional status at the location being tested.^{ 12 } Therefore, in this study, the algorithms also were tested after limiting the range of available stimuli to 15 dB or higher, and limiting all pdfs to 12 dB or higher. For the ZEST algorithm, this gave a small but consistent improvement in variability (as assessed by the RMS errors for a given true sensitivity). For the VAA, the benefits were larger but less consistent. When the true sensitivity is low, allowing the pdfs to extend down to 28.4 VAS (equivalent to −3 dB) improved efficiency, by allowing the algorithm to converge toward these low sensitivities more rapidly. However, in the rest of the range, which constitutes most pointwise sensitivities in a clinical situation, limiting the pdf at 29.3 VAS (equivalent to 12 dB) improved variability. In the peripheral visual field, a size III stimulus does not cover the entire receptive field of a retinal ganglion cell, and so response saturation will not occur until a greater contrast is reached. Therefore, the lower limits for the censored algorithm should be lowered at peripheral locations. However, both the actual sensitivity and the initial guess also would be lower, so the benefits of the VAA should be similar.

Performance also depends on other design choices, such as the termination criterion (e.g., as used in SITA^{ 5 }), the formula used to weight the prior pdf based on the initial guess, and assumptions made about the likelihood of false-positive and false-negative responses. Adjusting any of these is likely to cause the performance of an algorithm to improve in some circumstances and worsen in others. For example, introducing a higher probability of false-positive errors worsens the variability achieved when the true sensitivity is low for any of the current clinical testing algorithms, and the magnitude of that worsening may vary between algorithms. In this study, all of these other factors were kept constant among the four algorithms, allowing a fair assessment of the impact of the two main changes being considered, namely variability adjustment and limiting the range of available contrast levels. As such, this study should be viewed as a "proof of principle" rather than as prescribing an optimal algorithm.

The SITA algorithm was designed and validated for glaucoma,^{ 5 } and so it may not be optimal for patients with other pathologies, even though it is commonly used in such situations. The decibel scale also has the perceptual advantage of being based on Weber contrast:^{ 21–23 } subject to a few simplifying assumptions, a 100-cd/m^{2} stimulus presented on a 10-cd/m^{2} background will appear very similar to a 50-cd/m^{2} stimulus presented on a 5-cd/m^{2} background. Furthermore, the dB scale is anchored such that 0 dB represents the highest contrast that can be presented by the perimeter; although this varies between perimeters, it makes the results easier for practitioners and patients to understand. Finally, the dB scale is easily converted to a linear scale so that the resulting sensitivity values are linearly related to structural measures,^{ 18,24,25 } whereas values on the VAS scale would have to be converted to decibels first.

Disclosure: **S.K. Gardiner**, None

**References**

1. *Invest Ophthalmol Vis Sci*. 2005;46:2451–2457.
2. *Invest Ophthalmol Vis Sci*. 2000;41:417–421.
3. *Am J Ophthalmol*. 1989;108:130–135.
4. *Br J Ophthalmol*. 2008;92:569–573.
5. *Acta Ophthalmol*. 1997;75:368–375.
6. *Invest Ophthalmol Vis Sci*. 2009;50:488–494.
7. *Invest Ophthalmol Vis Sci*. 2002;43:322–331.
8. *Invest Ophthalmol Vis Sci*. 2002;43:709–715.
9. *Appl Opt*. 1988;27:1030–1037.
10. *Invest Ophthalmol Vis Sci*. 1994;35:268–280.
11. *Invest Ophthalmol Vis Sci*. 2011;52:764–771.
12. *Ophthalmology*. doi:10.1016/j.ophtha.2014.01.020.
13. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing; 2013.
14. *Invest Ophthalmol Vis Sci*. 2003;44:4787–4795.
15. *Invest Ophthalmol Vis Sci*. 2000;41:417–421.
16. *Invest Ophthalmol Vis Sci*. 2012;53:6939–6946.
17. *Graefes Arch Clin Exp Ophthalmol*. 2012;250:1851–1861.
18. *Prog Retin Eye Res*. 2007;26:688–710.
19. *Invest Ophthalmol Vis Sci*. 2012;53:2740–2748.
20. *Manual of Visual Fields*. New York: Churchill Livingstone; 1991.
21. *Elemente der Psychophysik*. Leipzig: Breitkopf und Härtel; 1860.
22. *Adler's Physiology of the Eye*. St. Louis, MO: C. V. Mosby; 1981:652–655.
23. *Seeing Perceiving*. 2010;23:155–171.
24. *Invest Ophthalmol Vis Sci*. 2000;41:1774–1782.
25. *Prog Retin Eye Res*. 2010;29:249–271.