November 2003
Volume 44, Issue 11
Glaucoma  |   November 2003
Properties of Perimetric Threshold Estimates from Full Threshold, ZEST, and SITA-like Strategies, as Determined by Computer Simulation
Author Affiliations
  • Andrew Turpin
    From the Department of Computing, Curtin University of Technology, Perth, Western Australia, Australia; the
  • Allison M. McKendrick
    School of Psychology, University of Western Australia, Crawley, Western Australia, Australia;
  • Chris A. Johnson
    Legacy Clinical Research and Technology Center, Discoveries in Sight, Devers Eye Institute, Portland, Oregon; and the
  • Algis J. Vingrys
    Department of Optometry and Vision Sciences, University of Melbourne, Victoria, Australia.
Investigative Ophthalmology & Visual Science November 2003, Vol.44, 4787-4795. doi:10.1167/iovs.03-0023
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose. To investigate the accuracy and precision of threshold estimates returned by two Bayesian perimetric strategies, staircase-QUEST or SQ (a Swedish interactive threshold algorithm [SITA]-like strategy) and ZEST (zippy estimation by sequential testing), and to compare these measures with those of the full-threshold (FT) algorithm.

Methods. A computerized visual field simulation model was developed to compare the performance (accuracy, precision, and number of presentations) of the three algorithms. SQ implemented aspects of the SITA algorithm that are in the public domain. The simulation was tested by using standard automated perimetry (SAP) visual field data from 265 normal subjects and 163 observers with glaucomatous visual field loss and by exploring the effect of response variability and response errors on algorithm performance.

Results. SQ was faster than FT or ZEST, with a comparable mean error when simulating field tests on patients. Point-wise analysis revealed similar error and standard deviation of error as a function of threshold for FT and SQ. If the initial estimate of threshold for either procedure was incorrect, the means and standard deviations of the error increased markedly. ZEST produced more accurate thresholds than did the other two strategies when the initial estimate was removed from the true threshold.

Conclusions. When simulated patients made errors, the accuracy and precision of sensitivity estimates were poor when the initial estimate of threshold either overestimated or underestimated the true threshold. This was particularly so for FT and SQ. ZEST demonstrated more consistent error properties than the other two measures.

The objective of standard automated perimetry (SAP) is to obtain accurate and precise threshold estimates from a large number of visual field locations within a reasonable test time. The ideal perimetric algorithm should also be robust to patient errors. Several approaches have been applied to perimetry in an attempt to strike an acceptable balance between test time and accuracy. Early versions of algorithms for automated perimetry were based on staircase threshold strategies. The full threshold (FT) strategy used by the Humphrey Field Analyzer (Carl Zeiss Meditec, Dublin, CA) became an accepted procedure for SAP and is used in most glaucoma-related clinical trials. Staircase strategies are computationally simple and have been studied in detail using both computer simulation and clinical studies. 1 2 3  
In recent years, a new generation of perimetric test algorithms based on maximum-likelihood principles has been developed. One such approach is the family of Swedish interactive threshold algorithms (SITAs) that are commercially available for the Humphrey Field Analyzer. The SITA strategy is a hybrid of both staircase and maximum-likelihood threshold procedures and was developed specifically for automated perimetry. 4 5 6 SITA Standard reduces the test time for assessment of the central 30° of the visual field by up to 50% compared with the test times required by the FT strategy. 5 6 The reduction in test duration is achieved in several ways 4 7 : (1) more efficient threshold estimation based on maximum-likelihood principles results in a reduced number of presentations; (2) false-positive responses are estimated without the use of catch trials; (3) the interstimulus interval is altered to match the patient’s speed of response; and (4) SITA repeats testing if the threshold returned is more than 12 dB from an initial estimate of threshold, whereas FT repeats if the threshold is more than 4 dB from the initial estimate. 
Another maximum-likelihood test procedure that has been applied successfully to perimetry is ZEST (zippy estimation by sequential testing). 8 9 10 11 ZEST has been shown to determine efficiently the thresholds for frequency-doubling technology (FDT) perimetry 10 11 and is available commercially for SAP in the Medmont perimeter (Medmont Pty. Ltd., Camberwell, Victoria, Australia) and in the Humphrey Matrix, a new FDT perimeter. As it is based on maximum likelihood principles, ZEST shares some features with SITA but is computationally simpler. 
Given the marked reduction in test times afforded by newer threshold strategies, there is strong motivation for them to replace the FT strategy as the standard procedure both in clinical practice and research. SITA Standard has been thoroughly evaluated in clinical populations and has been found to return thresholds that are qualitatively comparable to FT. 5 6 12 13 14 SITA Standard has also been shown to have lower global test-retest variability in comparison with FT estimates. 14 15 16 However, newer strategies are computationally more complicated than first-generation staircase strategies, and a full understanding of their performance may not be revealed by such global comparisons. This is evidenced by a recent study by Artes et al., 16 which provides a detailed examination of differences between FT and SITA strategies and reveals that the differences in threshold estimates returned by these procedures vary with threshold in a nonlinear manner. 
Although FT is used as a quasistandard, threshold estimates returned by this procedure are often highly variable, particularly with increasing deficit depth. 1 13 14 16 This lack of precision means that many repeated tests are required to obtain a reliable threshold estimate, which has practical limitations when testing patients. Furthermore, it is impossible to evaluate the accuracy of the mean threshold estimate obtained from repeated testing, because a patient’s true threshold is not known. Hence, thresholds returned by FT are not an adequate standard against which to measure the accuracy of other strategies. Computer simulation of visual field assessment is the ideal tool for evaluating test performance and has been successfully applied to the study of perimetric algorithms. 2 4 11  
This study was designed to investigate the accuracy, precision, and number of presentations required of two recent algorithms (ZEST and staircase-QUEST, a SITA-like approach). The FT algorithm was evaluated for comparison. Staircase-QUEST (SQ) implements those aspects of the SITA family of algorithms that are available within the public domain. We explored the performance of these algorithms, first by using a visual field approach, designed to be similar to clinical visual field assessment for both normal and glaucomatous visual fields. We also evaluated the performance of each of the test strategies as a function of true threshold for specific initial threshold estimates. This enabled evaluation of the performance of the algorithms in all situations, rather than simply in cases that commonly occur in practice. By focusing on all aspects of an algorithm’s performance, subtle but clinically relevant differences can be revealed. 
Methods
Overview of the Computer Simulation
We used the same computer simulation procedure used previously to develop efficient threshold strategies for FDT perimetry, 11 except that the present investigation was applied to SAP. The simulation reads an input threshold and then applies a test procedure. In the simplest mode, the simulation assumes an observer without response variability, such that any stimulus presented at a lower luminance (higher decibels) than the input threshold cannot be seen (“no” response). Likewise, any stimulus presented at a higher luminance (lower decibels) results in a “yes” response. If the stimulus is presented at a luminance equal to the input threshold, then a “yes” or “no” response is chosen with equal probability. The procedure is run to completion and a threshold estimate is output. The output threshold is compared with the input threshold to determine the error, and the number of presentations required is also recorded. 
Each test procedure was assessed using the observer without response variability described earlier and two additional simulated observer groups: low-variability and high-variability observers. For these observers both response variability and patient errors were incorporated in the simulation. Response variability was simulated by repeated sampling of a Gaussian distribution with a mean equal to the input threshold. The standard deviation of the Gaussian distribution was set to 1.0 dB for low-variability observers and 2.0 dB for high-variability observers. False-positive and false-negative rates were incorporated as the probability that the subject would respond yes or no, respectively, irrespective of the stimulus presented. False-positive and false-negative rates of 15% were used for low-variability observers and 30% for high-variability observers. 
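The observer model above can be sketched in a few lines of Python. This is an illustration only, not the authors' simulator: the function name `simulated_response`, its parameters, and the order in which false responses are checked before the Gaussian draw are all assumptions, since the text does not specify how the two error types and the response variability are combined.

```python
import random

def simulated_response(stimulus_db, true_threshold_db, sd_db=0.0,
                       fp_rate=0.0, fn_rate=0.0, rng=random):
    """Return True for a "yes" (seen) response, False for "no".

    Higher dB means a dimmer stimulus, so stimuli with dB below the
    observer's threshold are seen.
    """
    u = rng.random()
    if u < fp_rate:              # false positive: "yes" whatever was shown
        return True
    if u < fp_rate + fn_rate:    # false negative: "no" whatever was shown
        return False
    # Response variability: draw a momentary threshold around the input value.
    if sd_db > 0:
        momentary = rng.gauss(true_threshold_db, sd_db)
    else:
        momentary = true_threshold_db
    if stimulus_db < momentary:
        return True
    if stimulus_db > momentary:
        return False
    return rng.random() < 0.5    # stimulus exactly at threshold: coin flip

# Low-variability observer as described in the text (SD 1.0 dB, 15% errors).
low = dict(sd_db=1.0, fp_rate=0.15, fn_rate=0.15)
answer = simulated_response(28, 30, **low)
```

Setting `sd_db`, `fp_rate`, and `fn_rate` to zero recovers the observer without response variability.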
Visual Field Simulation
Test procedures were run on visual fields simulating patient testing. The input visual fields comprised 265 normal and 163 glaucomatous visual fields (24-2 FT strategy) supplied by one of the authors (CAJ). Written informed consent was obtained from all subjects, in accordance with the Declaration of Helsinki. The mean age of the normal patients was 47 ± 16 (SD) years, and the mean age of patients with glaucoma was 61 ± 13 years. The glaucomatous visual fields ranged from mild to severe visual field damage (median mean deviation [MD] = −1.81 dB, 5th percentile = +2.14 dB, 95th percentile = −22.55 dB). 
All three test procedures require an initial estimate of threshold at each location of the visual field. We followed the approach of the Humphrey Field Analyzer 24-2 “growth pattern” for determining these initial estimates. 17 With this approach, four seed locations have the threshold estimated by using the mean sensitivity of 541 normal patients as a starting value. These four locations are marked A in Figure 1 , which shows a 24-2 stimulus presentation pattern in the format for a left eye. Once these four locations have been tested, their threshold values are used as the initial estimate for their immediate neighbors—points labeled B in Figure 1 . Remaining points derive their initial estimates by averaging their immediate neighbors that have already been tested. The averaging process is restricted so that it does not cross the horizontal midline, but it may cross the vertical midline. The simulation assumed that all A locations were fully determined before beginning any B locations. Similarly, all B locations were determined before commencing C locations and all C’s completed before commencing any of the locations labeled D. 
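The neighbor-averaging rule for locations beyond the seed points might be sketched as follows. The coordinate representation (degrees of eccentricity on a 6° grid), the function name `initial_estimate`, and the treatment of diagonal points as immediate neighbors are assumptions made for illustration; the text only states that averaging uses already-tested immediate neighbors and does not cross the horizontal midline.

```python
def initial_estimate(location, tested, spacing=6):
    """Average the thresholds of already-tested immediate neighbors.

    location: (x, y) in degrees; tested: dict mapping (x, y) -> threshold in dB.
    Neighbors in the opposite hemifield (across the horizontal midline) are
    ignored; neighbors across the vertical midline are allowed.
    Returns None when no eligible neighbor has been tested yet.
    """
    x, y = location
    values = [
        db for (nx, ny), db in tested.items()
        if (nx, ny) != location
        and abs(nx - x) <= spacing and abs(ny - y) <= spacing
        and (ny > 0) == (y > 0)          # same side of the horizontal midline
    ]
    return sum(values) / len(values) if values else None

# Hypothetical tested points around the location (3, 3).
tested = {(9, 9): 30, (9, 3): 28, (9, -3): 10}
estimate = initial_estimate((3, 3), tested)   # (9, -3) is excluded
```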
Point-wise Simulation for a Specific Initial Estimate
In addition to the visual field approach we ran the test procedures 1000 times on single locations with input thresholds ranging from 0 to 40 dB in 1-dB steps. Procedures were run assuming the three patient variability models (variability here refers to both response variability and patient errors): no, low, and high variability. Test strategy performance was assessed across the range of possible true thresholds (0–40 dB) for initial estimates of 10, 20, and 30 dB. 
Test Procedures
Full-Threshold Algorithm.
The FT algorithm was based on that of the Humphrey Field Analyzer. 17 It consists of a staircase procedure that begins with 4-dB luminance changes until the first response reversal (seeing to nonseeing or vice versa). After the first response reversal, the step size is reduced to 2 dB. The procedure terminates after two reversals, and the threshold estimate is the last-seen intensity. If the difference between the measured threshold and the initial estimate is greater than 4 dB then a second staircase is initiated. 17 The current estimate is used to derive the starting value for the second staircase. In cases in which a second staircase was initiated, our simulation reported the threshold estimate as the mean of the two staircase results. 
The commercial instrument additionally doubly determines 10 locations (the four seed locations and six additional locations) to determine short-term fluctuation. 17 We did not implement these double determinations, because we are determining precision by replicating the simulation multiple times. Hence, FT assessment using the HFA requires, on average, 50 to 60 more presentations per visual field than reported herein. 
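A minimal Python sketch of the FT procedure as described here, without the double determinations. The helper names, the 0 to 40 dB stimulus range, the handling of a staircase pinned at a range limit, the value returned when nothing is seen, and the choice to start the second staircase exactly at the first estimate are assumptions for illustration.

```python
def staircase(respond, start_db, lo=0, hi=40):
    """One 4-2 dB staircase. Returns (last_seen_db, presentations).

    respond(stimulus_db) -> True if seen. Seen stimuli are followed by a
    dimmer one (higher dB), unseen stimuli by a brighter one (lower dB).
    """
    stim = min(max(start_db, lo), hi)
    step, reversals, last_seen, previous, n = 4, 0, None, None, 0
    while reversals < 2:
        seen = respond(stim)
        n += 1
        if seen:
            last_seen = stim
        if previous is not None and seen != previous:
            reversals += 1
            step = 2                      # 2 dB steps after the first reversal
        previous = seen
        nxt = min(max(stim + step if seen else stim - step, lo), hi)
        if nxt == stim:                   # pinned at a range limit (assumption)
            break
        stim = nxt
    # Assumption: report the floor of the range if nothing was ever seen.
    return (last_seen if last_seen is not None else lo), n

def full_threshold(respond, initial_db):
    """FT with the 4 dB retest rule; retested locations report the mean."""
    estimate, n = staircase(respond, initial_db)
    if abs(estimate - initial_db) > 4:
        second, n2 = staircase(respond, estimate)
        estimate, n = (estimate + second) / 2, n + n2
    return estimate, n

# Observer without variability who sees every stimulus dimmer than... no,
# brighter than 27 dB (dB values below 27); initial estimate 30 dB.
result = full_threshold(lambda s: s < 27, 30)
```

Because the estimate is the last-seen intensity, the sketch returns 26 dB for this observer, which illustrates the downward bias of FT discussed in the Results.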
Zippy Estimation by Sequential Testing.
Our ZEST implementation within the computer simulation was similar to the one we have described previously. 11 The ZEST procedure is based on a maximum-likelihood determination described elsewhere. 8 9 For each stimulus location, an initial probability density function (pdf) is defined that states, for each possible threshold, the probability that any patient will have that threshold (after adjusting for normal aging effects). We used the combined pdf approach recommended by Vingrys and Pianta, 9 where the pdf is a weighted combination of normal and abnormal thresholds. The normal pdf gives a probability for each possible patient threshold, assuming that the location is “normal,” whereas the abnormal pdf gives probabilities assuming the location is “abnormal.” Our normal and abnormal pdfs were derived from empiric data as shown in Figures 2A and 2B . The patient set used to determine these pdfs consisted of 541 normal and 315 glaucomatous visual fields and was different from the input to the simulation. For each location, the lower 95th percentile for normal performance was determined from the 541 normal visual fields. The abnormal pdf was derived from the 315 patients with glaucoma by including only those thresholds that were below the lower 95th percentile for normal subjects. For both normal and abnormal pdfs, threshold estimates were pooled across all locations. For each test location, the normal pdf was adjusted along the threshold axis so that its mode was at the initial estimate of threshold, and then the abnormal and normal pdfs were combined in a ratio of 1:4. A small nonzero pedestal was added to the normal pdf, to ensure that all thresholds were represented with nonzero probability in the combined pdf. This is shown in Figure 2C , for an initial estimate of 32 dB. 
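The construction of the combined pdf can be sketched as below. The toy input distributions, the pedestal value, the function name `combined_pdf`, and the decision to normalize each component before applying the 1:4 weights are assumptions; the empiric pdfs of Figures 2A and 2B are not reproduced here.

```python
def combined_pdf(normal, abnormal, initial_db, pedestal=1e-3,
                 w_abnormal=1.0, w_normal=4.0):
    """Shift the normal pdf so its mode sits at initial_db, add a pedestal,
    then mix abnormal:normal in the ratio w_abnormal:w_normal.

    normal and abnormal are lists indexed by threshold in dB (0, 1, 2, ...).
    """
    size = len(normal)
    mode = max(range(size), key=normal.__getitem__)
    shift = initial_db - mode
    shifted = [
        (normal[i - shift] if 0 <= i - shift < size else 0.0) + pedestal
        for i in range(size)
    ]
    sum_n, sum_a = sum(shifted), sum(abnormal)
    mixed = [w_abnormal * a / sum_a + w_normal * s / sum_n
             for a, s in zip(abnormal, shifted)]
    total = sum(mixed)
    return [m / total for m in mixed]     # area of one

# Toy inputs (not the paper's data): a narrow normal peak at 30 dB and a flat
# abnormal distribution over 0 to 10 dB.
normal = [0.0] * 41
normal[28:33] = [1.0, 2.0, 3.0, 2.0, 1.0]
abnormal = [1.0] * 11 + [0.0] * 30
prior = combined_pdf(normal, abnormal, initial_db=32)
```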
The ZEST procedure presents the first stimulus at a luminance equal to the mean of the initial pdf and then uses the subject’s response (seen or not seen) to modify the pdf. To generate the new pdf, the old pdf is multiplied by a likelihood function (similar to a frequency-of-seeing curve), which represents the likelihood that a subject will see a particular stimulus. An expanded description of this process is provided in Turpin et al. 11 The likelihood function used in our simulations is shown in Figure 2D . After the determination of the new pdf, the new mean is calculated and the stimulus intensity equal to that mean is presented. The process is repeated until a termination criterion is met (in this case, standard deviation of pdf <1.5 dB). The output threshold is the mean of the final pdf. 
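The ZEST loop described above can be sketched as follows, assuming a cumulative Gaussian likelihood with a standard deviation of 1.5 dB (the value given in the Discussion) and a flat prior in place of the empiric combined pdf. The function names, the cap on presentations, and presenting the unrounded mean are assumptions for illustration.

```python
import math

def p_seen(stim_db, threshold_db, sd=1.5):
    """Likelihood of a "seen" response for a given true threshold."""
    z = (threshold_db - stim_db) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def mean_and_sd(pdf, thresholds):
    mean = sum(t * p for t, p in zip(thresholds, pdf))
    var = sum(p * (t - mean) ** 2 for t, p in zip(thresholds, pdf))
    return mean, math.sqrt(var)

def zest(respond, prior, thresholds, sd_stop=1.5, max_presentations=100):
    """Return (threshold_estimate_db, presentations)."""
    total = sum(prior)
    pdf = [p / total for p in prior]
    mean, sd = mean_and_sd(pdf, thresholds)
    n = 0
    while sd >= sd_stop and n < max_presentations:
        seen = respond(mean)              # present at the mean of the pdf
        n += 1
        # Bayesian update: multiply by the likelihood of the response.
        pdf = [p * (p_seen(mean, t) if seen else 1.0 - p_seen(mean, t))
               for t, p in zip(thresholds, pdf)]
        total = sum(pdf)
        pdf = [p / total for p in pdf]
        mean, sd = mean_and_sd(pdf, thresholds)
    return mean, n                        # estimate is the final mean

thresholds = list(range(41))              # 0 to 40 dB in 1 dB steps
flat_prior = [1.0] * len(thresholds)      # placeholder, not the empiric pdf
estimate, presentations = zest(lambda s: s < 25, flat_prior, thresholds)
```

A flat prior carries no information from the initial estimate, so presentation counts from this sketch should not be compared with those reported in the Results.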
Staircase-QUEST.
The staircase-QUEST (SQ) algorithm was designed to mimic the primary functions of SITA. 4 The SITA approach to determining thresholds consists of four components:
  1. An algorithm for estimating an initial estimate of threshold at each location of the visual field based on a “growth-pattern.”
  2. An algorithm for determining the threshold at each location in the visual field based on a hybrid staircase-QUEST procedure.
  3. A false-positive estimation technique based on response time.
  4. A postprocessing phase, in which the information from component 3 is used to modify the results of component 2.
Our SQ algorithm outputs the results of components 1 and 2, before postprocessing. We did not implement components 3 and 4, because aspects of this postprocessing are not available in the literature. 
The SQ algorithm proceeds as follows. For each location, the stimulus is presented at an initial estimated threshold value. Subsequent stimulus intensities are determined as for the FT algorithm—that is, using a staircase procedure with initial step sizes of 4 dB followed by 2 dB after the first reversal. However SQ differs from FT in determining when to terminate the staircase and in the final threshold estimate. 
In conjunction with the staircase, two probability functions (pfs) are maintained. (We do not use the term pdf as for ZEST, because the area under the SITA probability functions appears not to be one. See Figure 1 in Ref. 4 .) One pf gives the probability for each possible patient threshold, assuming that the location is abnormal, whereas the other maintains probabilities for thresholds that are normal. We begin with the same normal and abnormal pfs as in the ZEST procedure (Figs. 2A 2B) . Before the sequence of stimulus presentations begins for each location, the normal pf is translated along the threshold axis so that its mode aligns with the initial estimate for that particular location. 
After each presentation, new pfs are determined based on the previous patient response (seen or not seen). Similar to ZEST, the rule for generating the new pf is to multiply the old pf by a likelihood function, but the 50% location of the likelihood function is aligned with the presented staircase value, not the mean or mode of the pf. Both pfs were maintained independently. The same likelihood function was used as for ZEST (Fig. 2D) . There are two termination rules for SQ, which are the same as those used for SITA. The staircase terminates when either one of the pfs has a sufficiently small variance, or if two reversals are achieved in the staircase (in this latter case, the termination rule is the same as FT). SQ reports the most likely mode of the two pfs as the threshold for the location, irrespective of the basis of staircase termination. 
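The pf bookkeeping that accompanies the SQ staircase might look like the sketch below. It assumes that pfs are stored as lists indexed by threshold in dB, that they are left unnormalized, and that the “most likely mode” is the mode of whichever pf has the higher peak; the last point in particular is an interpretation, not something the text specifies.

```python
import math

def seen_likelihood(stim_db, threshold_db, sd=1.5):
    """Cumulative Gaussian whose 50% point sits at the presented stimulus."""
    z = (threshold_db - stim_db) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def update_pf(pf, stim_db, seen):
    """Multiply a pf by the likelihood of the observed response.

    stim_db is the staircase value just presented, not the pf mean or mode.
    """
    updated = []
    for threshold_db, p in enumerate(pf):
        like = seen_likelihood(stim_db, threshold_db)
        updated.append(p * (like if seen else 1.0 - like))
    return updated

def most_likely_mode(pf_normal, pf_abnormal):
    """Mode of whichever pf has the higher peak (assumed interpretation)."""
    mode_n = max(range(len(pf_normal)), key=pf_normal.__getitem__)
    mode_a = max(range(len(pf_abnormal)), key=pf_abnormal.__getitem__)
    return mode_n if pf_normal[mode_n] >= pf_abnormal[mode_a] else mode_a

# Both pfs are updated independently after every staircase presentation.
pf_normal = update_pf([1.0] * 41, stim_db=20, seen=True)
pf_abnormal = update_pf([1.0] * 41, stim_db=20, seen=False)
```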
The SITA algorithm uses the error-related factor (ERF) 4 to determine whether the variance of either pf is sufficiently narrow to terminate the staircase procedure, where  
\[\mathrm{ERF} = 0.19 + \sqrt{\mathrm{variance}} - \frac{3}{70} \times \mathrm{mode}\]
Full details of the derivation of the formula appear at https://www.computing.edu.au/~andrew/barramundi/sap.html. 
This formulation of ERF allows for more error (increased variance) when thresholds are close to normal and requires smaller variances in pf when thresholds are abnormal. According to simulations performed by the developers of the SITA Standard algorithm, terminating the staircase when ERF reaches 0.69 works well in practice. 4 Similar to the SITA developers, we tuned ERF in our experiments to obtain the best performance from SQ, and report herein experiments using an ERF of 0.70. 
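The effect of the ERF criterion can be made concrete with a short Python sketch. The function names are hypothetical, and treating “reaches” as “falls to or below the limit” is an assumption about how the criterion is applied.

```python
import math

def error_related_factor(variance, mode_db):
    """ERF = 0.19 + sqrt(variance) - (3/70) * mode, as given in the text."""
    return 0.19 + math.sqrt(variance) - (3.0 / 70.0) * mode_db

def pf_is_narrow_enough(variance, mode_db, limit=0.70):
    """Assumed reading of the rule: stop once ERF is at or below the limit."""
    return error_related_factor(variance, mode_db) <= limit

def largest_sd_allowed(mode_db, limit=0.70):
    """Rearranged ERF: the widest pf (as an SD) that still terminates."""
    return limit - 0.19 + (3.0 / 70.0) * mode_db

# Same pf spread (variance 1.0 dB^2) at a near-normal and a deep-defect mode.
near_normal = pf_is_narrow_enough(1.0, mode_db=30)   # terminates
deep_defect = pf_is_narrow_enough(1.0, mode_db=0)    # keeps testing
```

With a limit of 0.70, a pf whose mode is 35 dB may terminate with a standard deviation of about 2.0 dB, whereas a pf whose mode is 0 dB must narrow to about 0.5 dB, which is the asymmetry described above.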
If the threshold estimate returned from SQ is more than 12 dB from the initial estimate a second staircase is initiated. This staircase is commenced at the current threshold estimate. The mode of the normal pf is also moved to the current estimate. This retest rule is based on that used by SITA. 4  
Results
Visual Field Performance
The results of the visual field procedures for patient groups with no, low, and high variability are compared in Figure 3 . The leftmost panel shows the mean number of presentations plotted against the mean error across the field. This figure demonstrates that, on average, SQ required fewer presentations than the other two procedures. The number of presentations for FT was approximately 1 presentation per location fewer than previously reported, 4 5 6 because double determinations to estimate short-term fluctuation were not included. When simulated patients had no variability, SQ and ZEST had similar mean errors and standard deviations of error. FT, however, underestimated threshold by approximately 1.5 dB, which probably resulted largely from FT’s reporting the last-seen stimulus as the estimate. In patients with low variability, both SQ and ZEST were slightly more accurate than FT (approximately 1 dB) but the precision of the procedures (standard deviation of the error) was approximately equivalent. Both the mean error across the field and its standard deviation increased for all three procedures when patients had high variability. 
Several clinical studies have reported a difference in the threshold estimates returned by SITA and FT, with SITA returning estimates that are, on average, approximately 1 dB higher than those of FT. 5 14 16 18 Because FT returns the last-seen stimulus as the threshold estimate, a difference of 1 dB should be expected, irrespective of threshold. Artes et al. 16 recently demonstrated that the differences between SITA and FT vary with threshold, being highest for intermediate sensitivities. It has also been argued that differences in threshold estimates between the strategies may arise in part due to reduced fatigue for the shorter examinations produced by SITA. 19 To explore this issue within our simulation model, the difference between SQ and FT is plotted as a function of threshold in Figure 4 . These data were extracted from the visual field simulations. It can be seen that for most of the range of thresholds, SQ returned estimates of higher sensitivity than FT and that the magnitude of the difference was approximately 1 dB. 
Performance as a Function of Threshold for Displaced Initial Estimates
We evaluated the accuracy and number of presentations required for each of the test strategies as a function of input threshold for specific initial estimates of threshold. Figures 5 6 and 7 show the performance of each of the test strategies as a function of true threshold, where the initial estimate for each of the algorithms is 10, 20, and 30 dB, respectively. This results in measures of mean presentations and error when the threshold is initially either underestimated or overestimated, such as may arise on the edge of a scotoma. 
The top panel of each figure shows the mean number of presentations required for each input threshold for simulated patients with no, low, and high variabilities. The middle panel shows the mean error and the bottom the standard deviation of the error. The dashed vertical line in each figure indicates the initial estimate for each of the procedures. 
Inspection of the upper panels of Figures 5 6 and 7 reveals that the number of presentations necessary to terminate the procedures increased with the level of inaccuracy of the initial estimate. This occurred more rapidly for FT than for SQ; hence, for any particular initial estimate, SQ is quick to terminate over a wider range of actual thresholds. When the true threshold was close to the initial estimate, ZEST was slower than the other two procedures; however, when the initial estimate was in error, ZEST used a number of presentations comparable to the number in SQ. 
Inspection of the middle panels of Figures 5 6 and 7 reveals that the error distributions for SQ and FT were similar and rather symmetrical about the initial estimate in patients with low or high variability. If the initial estimate either overestimated or underestimated true sensitivity, the mean error increased. Furthermore, the standard deviation of the error increased markedly. In contrast, the error performance of ZEST was more robust, with lower mean errors when the initial estimate was incorrect than in the other two strategies. For observers with low variability, the standard deviation of the error for ZEST was much lower and more consistent across the range of thresholds than were those of the other two procedures. 
Discussion
Computer simulation of perimetric strategies allows investigation of the accuracy of test procedures that is not possible in studies of human observers. In several studies, both normal and glaucomatous observers have been used to explore the differences in thresholds returned by recent algorithms compared with FT 5 6 12 13 14 16 ; however, the threshold estimate returned by FT can be inaccurate and imprecise. 1 13 14 16 Such comparisons are essential if patients or clinical trials are to be exchanged from one test procedure to another, but are of restricted utility in understanding the limitations of the procedures for accurately and precisely determining thresholds, which is essential knowledge for detection of visual field loss and its progression. 
In our simulation, SQ was based on the details of SITA that appear in the public domain. Our purpose was to demonstrate the underlying principles of the hybrid staircase-Bayesian approach incorporated in SITA. SQ is not the same as SITA. First, the pf used is not the same as in the commercial version, and second, SITA incorporates postprocessing analysis. The postprocessing aspects of SITA are likely to be equally applicable to any test strategy. SITA was developed to have error properties similar to those of FT, but to return thresholds using fewer stimulus presentations. 4 5 19 SQ meets these development goals, and so we assume that it is likely to be representative of the underlying principles of SITA. One further aspect of SITA that is not incorporated in SQ is that SITA alters pfs during the test based on the pfs of neighboring values. The details of these alterations are not published in the literature, and therefore we could not incorporate them in our SQ simulations. 
Inspection of Figure 4 shows that for simulated patients with low variability, the difference in the mean error across the field between SQ and FT was approximately 1 dB in normal observers and in those with glaucoma. This compares favorably with the difference of approximately 1 dB reported between SITA and FT in clinical studies. 5 14 16 18 It has been suggested that the difference between thresholds returned by SITA and FT may be caused in part by a reduction in fatigue in the shorter SITA examination. 19 However, several studies have argued that factors other than fatigue are more likely to explain the difference. 15 16 18 In addition, our simulation results suggest that the differences between SITA and FT estimates are unlikely to be due to differential effects of fatigue, but rather to the mechanics of the test algorithms. FT returned the last-seen presentation, whereas SQ/SITA returned the most likely mode of the two pfs used in the procedure. As ZEST returned the mean of the final pdf, which provided a less biased estimate than the mode, 8 a slightly different threshold again was returned by ZEST, because of this factor alone. Inspection of Figures 4 5 6 7 reveals that the differences in error between SQ and FT varied with threshold, a finding that is broadly compatible with that of Artes et al. 16  
The performance of both ZEST and SQ depends in part on the choice of pdfs, the choice of likelihood function, and the particular termination rules imposed. We used empiric pdfs based on normal and abnormal thresholds measured for SAP and chose to use a hybrid normal+abnormal pdf for ZEST, because results in previous studies suggest this approach works well. 9 Thresholds were pooled across locations to form the normal and abnormal pdfs resulting in a broader pdf than if locations were treated separately. Initial inspection of location-specific pdfs revealed that the shape of abnormal pdfs was highly aberrant in some locations because of sampling issues—hence, the decision to pool across locations. The broader pdfs produced by pooling create a more uniform combined pdf that increases the number of presentations required for ZEST to terminate with marginal improvements in accuracy and precision. 8 10 Although our pdfs were based on empiric thresholds, the specific derivation of pdfs for Bayesian test strategies is somewhat arbitrary. These pdfs may be different from those used in both the commercial application of ZEST on the Medmont perimeter and SITA in the Humphrey Field Analyzer; however, they were based on a large number of empiric thresholds and so may be assumed to represent reasonably the underlying population distribution of thresholds. 
The likelihood function used within the ZEST and SQ procedures affects both the spread of errors and the number of trials needed to reduce the errors to an acceptable level. 8 20 The likelihood function used in these experiments was the discrete version of a cumulative Gaussian with a standard deviation of 1.5 dB. This slope is similar to that found for empiric frequency-of-seeing curves measured for SAP in normal observers. 21 We also evaluated numerous other likelihood functions within the simulator and found that this function resulted in SQ’s terminating with similar average presentations and precision to that reported for SITA. 4 5 6 We maintained the same likelihood function for ZEST to facilitate comparison between the mechanics of the procedures. 
Termination rules for SQ were chosen to be the same as those for SITA: SQ ends by using a dynamic termination criterion based on whether the spread of the pf becomes sufficiently narrow, or if two reversals are achieved in the staircase. It is also possible to terminate adaptive procedures after a fixed number of presentations which has been shown to result in errors similar to those obtained using a dynamic criterion. 20 We chose a dynamic termination criterion for ZEST to keep it similar to SQ. The parameters chosen for each of pdf, likelihood function, and termination criterion may be suboptimal; however, optimizing SQ and ZEST falls beyond the scope of this study. 
A difference between the simulation and human performance is that our variability models (no, low, and high variability) were kept fixed across the visual field. These variability models incorporate both response variability and patients’ errors. Response variability is known to increase with deficit depth. 21 22 23 Hence, in a given patient, responses may range from having no variability to high variability at different locations within the visual field. We present three variability conditions chosen to represent the end points of the range of response variability and patient response errors: no errors and 30% false-positive and false-negative responses (a commonly used cutoff criterion for acceptable performance), as well as the middle of this range, and assess performance for all possible stimulus levels for each of these conditions (Figs. 5 6 7) . An alternate approach would have been to increase response variability with increasing deficit depth. Although this alternate approach may more closely represent average clinical performance, the approach taken provides far greater information regarding the underlying performance of the three algorithms and their tolerance to variability, enabling the assessment of the algorithms for situations that are uncommon but still occur at times (for example, locations in which threshold is normal but the subject’s responses have high variability). In practice, the results with any individual patient may be a hybrid of the three variability models presented and can be determined from the data shown in Figures 5 6 7 . It is also possible that our choice of having equivalent numbers of false positives and false negatives is not representative of typical performance. Indeed, typical patients may be likely to have either 15% false-positive or false-negative responses, but not both. 
It is to be expected that a significant response bias in one direction only (for example, false positives) will introduce a more severe systematic error than that shown in our low-variability group, but may reduce the standard deviation of the error. 
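To make the role of response errors concrete, the sketch below shows one simple way a simulated observer with fixed false-positive and false-negative rates could be written in Python. It is a simplification rather than the model used in the study: it assumes a step-shaped frequency-of-seeing curve with no additional response variability, and the names `make_observer`, `fp_rate`, and `fn_rate` are hypothetical.

```python
import random

def make_observer(threshold_db, fp_rate, fn_rate, rng):
    """Return a respond(stimulus_db) function for a simulated patient.

    Assumption: a stimulus is visible when its level (in dB, higher = dimmer)
    is at or below the true threshold. A visible stimulus is missed with
    probability fn_rate; an invisible one is reported as seen with
    probability fp_rate.
    """
    def respond(stimulus_db):
        visible = stimulus_db <= threshold_db
        u = rng.random()
        if visible:
            return u >= fn_rate  # False here is a false-negative response
        return u < fp_rate       # True here is a false-positive response
    return respond

# A patient at the 30% end point, and a one-sided 15% false-positive patient.
rng = random.Random(1)
high_variability = make_observer(25, fp_rate=0.30, fn_rate=0.30, rng=rng)
fp_only = make_observer(25, fp_rate=0.15, fn_rate=0.0, rng=rng)
```

Setting only one of the two rates to a nonzero value, as in `fp_only`, gives a one-sided response bias of the kind discussed above.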
For all the test strategies, if the initial estimate for the procedure is close to the true threshold, then the procedures are fast and accurate. This is likely to happen in most real cases, because of the preponderance of normal thresholds and the use of the growth pattern to determine the initial estimate. This is reflected in the visual field simulations shown in Figure 3, which demonstrate small absolute errors when averaged across the visual field for all test procedures. However, as the point-wise analysis shows, in locations in which the initial estimate is wrong (either an underestimate or an overestimate), the procedures can take a long time and have reduced accuracy. This is especially true of SQ and FT, despite the fact that these procedures incorporate an error-checking retest strategy. For retested locations, the HFA FT procedure provides the results of both determinations with no instructions for interpreting them. In these situations, we chose to take an average. For SQ, locations are retested only if the result is more than 12 dB from the initial estimate of threshold. This relaxed retest policy favors fewer presentations over improved accuracy and precision. 
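The two retest conventions described above can be written down in a few lines of Python. This is only a sketch of the rules as stated in the text; the function names are hypothetical, and the condition that triggers an FT retest is not modeled here.

```python
def sq_needs_retest(estimate_db, initial_db, limit_db=12.0):
    """SQ rule from the text: retest only when the estimate lies more than
    12 dB from the initial estimate of threshold."""
    return abs(estimate_db - initial_db) > limit_db

def ft_reported_threshold(first_db, second_db=None):
    """FT convention chosen in the text: when a location was retested,
    report the average of the two determinations."""
    if second_db is None:
        return first_db
    return (first_db + second_db) / 2.0
```

For example, an SQ estimate of 10 dB at a location with an initial estimate of 30 dB would be retested, whereas an estimate of 20 dB would not.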
ZEST clearly outperforms SQ and FT when the initial estimate is far from the true threshold. In practice, this occurs in a minority of locations (such as on the edge of a scotoma); however, determining accurate and repeatable thresholds in these locations is essential for monitoring progression of visual field loss. ZEST shows more consistent error properties, irrespective of initial estimate and deficit depth, than do the other two procedures, although it is slower to terminate. The test time for ZEST can be decreased by altering the termination rule (for example, terminating after four presentations makes it comparable to SQ); however, this is achieved at the expense of accuracy and precision. 
Both SQ and FT (and to a lesser extent ZEST) have similar limitations: when patients make response errors, both the mean error and the standard deviation of the error increase when the initial estimate is not close to the true threshold. A purely systematic error can be compensated for. Such compensation is not sufficient for SQ and FT, however, because not only does the mean error increase with greater disparity of the initial estimate, but the standard deviation of the error also increases. ZEST performs better than the other two strategies under these conditions; however, the errors still increase markedly when patients respond unreliably. Although SITA has provided welcome benefits over FT in reducing test time, further improvements in the accuracy and precision of visual field assessment should be possible. For better detection of visual field loss, and particularly for better monitoring of progression, test procedures that reduce both the mean error and the standard deviation of the error for locations with abnormal thresholds are needed. 
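The distinction between mean error (accuracy) and standard deviation of the error (precision) can be shown with a short Python sketch. The sign convention (estimate minus true threshold), the function names, and the example values are assumptions for illustration, not data from the study.

```python
import statistics

def error_summary(estimates_db, true_threshold_db):
    """Return (mean error, standard deviation of error) for repeated
    estimates of one location. Error is taken as estimate minus truth."""
    errors = [e - true_threshold_db for e in estimates_db]
    return statistics.mean(errors), statistics.pstdev(errors)

def remove_bias(estimates_db, bias_db):
    """Compensate for a known systematic error by subtracting it."""
    return [e - bias_db for e in estimates_db]

# Illustrative (made-up) repeated estimates at a location whose true
# threshold is 24 dB.
estimates = [22, 24, 26, 28]
bias, spread = error_summary(estimates, 24)
corrected_bias, corrected_spread = error_summary(remove_bias(estimates, bias), 24)
```

Subtracting the bias brings the mean error to zero but leaves the standard deviation of the error unchanged, which is why a correction for systematic error alone cannot restore precision.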
 
Figure 1.
 
A 24-2 growth pattern used in the simulations to determine the initial estimate for each location based on neighboring values.
Figure 2.
 
The pdfs and likelihood function used for ZEST and SQ. (A) The abnormal pdf, which gives a probability for each possible patient threshold assuming that the location is abnormal; (B) the normal pdf, which gives probabilities assuming the location is normal; (C) the combined pdf used for ZEST for an initial estimate of 32 dB; and (D) the likelihood function used in the simulations.
Figure 3.
 
Performance of the test procedures averaged across the visual field, for patients with (A) no, (B) low, and (C) high variability. Left: mean number of presentations plotted against the mean error. Filled symbols: normal patients; unfilled symbols: patients with glaucoma. Right: standard deviations for the number of presentations and error for each of the test strategies and patient groups.
Figure 4.
 
Difference between SQ and FT as a function of threshold extracted from the visual field data for patients with (A) no, (B) low, and (C) high variability. Left: mean difference; right: standard deviation of the difference.
Figure 5.
 
Comparison of the performance (number of presentations and error) of each of the threshold algorithms FT (•), ZEST (○), and SQ (▾) in patients with (A) no, (B) low, and (C) high variability, when the initial estimate is 10 dB. The mean number of presentations, mean error, and standard deviation of error are presented for each input threshold.
Figure 6.
 
Comparison of the performance (number of presentations and error) of each of the threshold algorithms FT (•), ZEST (○), and SQ (▾) in patients with (A) no, (B) low, and (C) high variability, when the initial estimate is 20 dB. The mean number of presentations, mean error, and standard deviation of error are presented for each input threshold.
Figure 7.
 
Comparison of the performance (number of presentations and error) of each of the threshold algorithms FT (•), ZEST (○), and SQ (▾) in patients with (A) no, (B) low, and (C) high variability, when the initial estimate is 30 dB. The mean number of presentations, mean error, and standard deviation of error are presented for each input threshold.
1. Heijl, A, Lindgren, A, Lindgren, G. (1989) Test-retest variability in glaucomatous visual fields Am J Ophthalmol 108,130-135
2. Johnson, CA, Chauhan, BC, Shapiro, LR. (1992) Properties of staircase procedures for estimating thresholds in automated perimetry Invest Ophthalmol Vis Sci 33,2966-2974
3. Spenceley, SE, Henson, DB. (1996) Visual field test simulation and error in threshold estimation Br J Ophthalmol 80,304-308
4. Bengtsson, B, Olsson, J, Heijl, A, Rootzen, H. (1997) A new generation of algorithms for computerized threshold perimetry Acta Ophthalmol Scand 75,368-375
5. Bengtsson, B, Heijl, A, Olsson, J. (1998) Evaluation of a new threshold visual field strategy, SITA, in normal subjects Acta Ophthalmol Scand 76,165-169
6. Bengtsson, B, Heijl, A. (1998) Evaluation of a new perimetric strategy, SITA, in patients with manifest and suspect glaucoma Acta Ophthalmol Scand 76,368-375
7. Olsson, J, Bengtsson, B, Heijl, A, Rootzen, H. (1997) An improved method to estimate frequency of false positive answers in computerized perimetry Acta Ophthalmol Scand 75,181-183
8. King-Smith, PE, Grigsby, SS, Vingrys, AJ, Benes, SC, Supowit, A. (1994) Efficient and unbiased modifications of the Quest threshold method: theory, simulations, experimental evaluation and practical implementation Vision Res 34,885-912
9. Vingrys, AJ, Pianta, M. (1999) A new look at threshold estimation algorithms for automated static perimetry Optom Vis Sci 76,588-595
10. Turpin, A, McKendrick, AM, Johnson, CA, Vingrys, AJ. (2002) Performance of efficient test procedures for frequency doubling technology in normal and glaucomatous eyes Invest Ophthalmol Vis Sci 43,709-715
11. Turpin, A, McKendrick, AM, Johnson, CA, Vingrys, AJ. (2002) Development of efficient threshold strategies for frequency doubling technology perimetry using computer simulation Invest Ophthalmol Vis Sci 43,322-331
12. Budenz, DL, Rhee, P, Feuer, WJ, McSoley, J, Johnson, CA, Anderson, DR. (2002) Comparison of glaucomatous visual field defects using standard full threshold and Swedish Interactive Threshold algorithms Arch Ophthalmol 120,1136-1141
13. Wild, JM, Pacey, IE, O’Neill, EC, Cunliffe, IA. (1999) The SITA perimetric threshold algorithms in glaucoma Invest Ophthalmol Vis Sci 40,1998-2009
14. Wild, JM, Pacey, IE, Hancock, SA, Cunliffe, IA. (1999) Between-algorithm, between-individual, differences in normal perimetric sensitivity: Full Threshold, FASTPAC and SITA Invest Ophthalmol Vis Sci 40,1152-1161
15. Shirato, S, Inoue, R, Fukushima, K, Suzuki, Y. (1999) Clinical evaluation of SITA: a new family of perimetric testing strategies Graefes Arch Clin Exp Ophthalmol 237,29-34
16. Artes, PH, Iwase, A, Ohno, Y, Kitazawa, Y, Chauhan, BC. (2002) Properties of perimetric threshold estimates from full threshold, SITA Standard, and SITA Fast strategies Invest Ophthalmol Vis Sci 43,2654-2659
17. Anderson, DR, Patella, VM. (1999) Automated Static Perimetry 2nd ed. Mosby-Year Book St. Louis.
18. Wall, M, Punke, SG, Stickney, TL, Brito, CF, Withrow, KR, Kardon, RH. (2001) SITA Standard in optic neuropathies and hemianopias: a comparison with full threshold testing Invest Ophthalmol Vis Sci 42,528-537
19. Bengtsson, B, Heijl, A. (1998) SITA Fast, a new rapid perimetric threshold test: description of methods and evaluation in patients with manifest and suspect glaucoma Acta Ophthalmol Scand 76,431-437
20. Anderson, AJ. (2003) Utility of a dynamic termination criterion in the ZEST adaptive threshold method Vision Res 43,165-170
21. Spry, PGD, Johnson, CA, McKendrick, AM, Turpin, A. (2001) Variability components of standard automated perimetry and frequency-doubling technology perimetry Invest Ophthalmol Vis Sci 42,1404-1410
22. Henson, DB, Chaudry, S, Artes, PH, Faragher, EB, Ansons, A. (2000) Response variability in the visual field: comparison of optic neuritis, glaucoma, ocular hypertension, and normal eyes Invest Ophthalmol Vis Sci 41,417-421
23. Chauhan, BC, Tompkins, JD, Le Blanc, RP, McCormick, TA. (1993) Characteristics of frequency-of-seeing curves in normal subjects, patients with suspected glaucoma, and patients with glaucoma Invest Ophthalmol Vis Sci 34,3534-3540