January 2002 | Volume 43, Issue 1
Glaucoma
Response Time as a Discriminator between True- and False-Positive Responses in Suprathreshold Perimetry
Author Affiliations
  • Paul H. Artes, David McLeod, and David B. Henson
    From the Academic Department of Ophthalmology, Manchester Royal Eye Hospital, University of Manchester, Manchester, United Kingdom.
Investigative Ophthalmology & Visual Science, January 2002, Vol. 43, 129–132.
Abstract

Purpose. To report on differences between the latency distributions of responses to stimuli and to false-positive catch trials in suprathreshold perimetry. To describe an algorithm for defining response time windows and to report on its performance in discriminating between true- and false-positive responses on the basis of response time (RT).

Methods. A sample of 435 largely inexperienced patients underwent suprathreshold visual field examination on a perimeter that was modified to record RTs. Data were analyzed from 60,500 responses to suprathreshold stimuli and from 523 false-positive responses to catch trials.

Results. False-positive responses had much more variable latencies than responses to suprathreshold stimuli. An algorithm defining RT windows on the basis of z-transformed individual latency samples correctly identified more than 70% of false-positive responses to catch trials, whereas fewer than 3% of responses to suprathreshold stimuli were classified as false-positive responses.

Conclusions. Latency analysis can be used to detect a substantial proportion of false-positive responses in suprathreshold perimetry. Rejection of such responses may increase the reliability of visual field screening by reducing variability and bias in a small but clinically important proportion of patients.

Automated visual field tests are widely used in routine clinical practice to detect and quantify abnormalities of peripheral vision that occur in glaucoma and in neuro-ophthalmic disorders. Most of the current generation of visual field tests present a series of stimuli to the patient, one at a time, and rely on the patient to press a button each time a stimulus is seen. If no response is elicited within the interstimulus interval, it is assumed that the patient did not detect the stimulus. In suprathreshold perimetry, stimuli are presented at intensities calculated to be above the patient’s threshold by a predetermined amount (suprathreshold increment). If the stimulus is detected, it is assumed that there is no significant visual field loss at that location. 
Automated suprathreshold perimetry is mainly used for screening. 1 2 3 4 Patients undergoing such tests are generally inexperienced and may find it difficult to establish and maintain optimal response criteria and to sustain attention. Erroneous responses to stimulus presentations increase the variability of the test result, degrading the ability to classify the status of the patient’s visual field correctly. The patient’s response behavior has traditionally been assessed by randomly interleaving a small proportion of catch trials (usually 3%–5% of presentations) with the test stimuli. The number of responses to false-positive catch trials (during which no stimuli are presented) reveals how likely a patient is to respond without having perceived a stimulus. Owing to the small number of catch trials, estimates of response error rates are notoriously imprecise. With 14 catch trials, for example, a true error rate of 33% is estimated as between 14% and 57% (95% confidence interval [CI]). 5 Estimates of patients’ reliability based on catch trials are poor predictors of test result variability, and their clinical usefulness has been questioned by other research groups. 6 7
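The imprecision of catch-trial estimates can be illustrated with the binomial sampling distribution. The sketch below is our illustration, not the authors' calculation, and the exact bounds depend on the interval method used; it shows how widely the observed rate scatters when only 14 catch trials are given.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k responses in n catch trials at true rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def central_range(n, p, alpha=0.05):
    """Smallest and largest counts within the central (1 - alpha) probability mass."""
    cum, lo, hi = 0.0, None, None
    for k in range(n + 1):
        cum += binom_pmf(k, n, p)
        if lo is None and cum >= alpha / 2:
            lo = k
        if hi is None and cum >= 1 - alpha / 2:
            hi = k
    return lo, hi

lo, hi = central_range(14, 0.33)  # 14 catch trials, true rate 33%
print(f"observed false-positive rate may range from {lo/14:.0%} to {hi/14:.0%}")
```

Even at a true rate of 33%, the rate observed on 14 trials spans several tens of percentage points, consistent with the order of magnitude of the interval quoted above.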
Olsson et al. 8 have proposed a more precise measure of false-positive rate based on analysis of response times (RTs)—the time between the onset of the stimulus and the patient’s response. (They used the term “reaction time,” although it is not clear whether patients were urged to respond as quickly as possible to the stimuli. In keeping with established practice, patients were not instructed to respond rapidly in our study. We therefore prefer to use “response time” rather than “reaction time,” because the latter has a different meaning in the literature. 9 ) By eliminating the need for false-positive catch trials, the technique of Olsson et al. also contributes to the reduction in test time achieved with the Swedish Interactive Threshold Algorithm (SITA) of the Humphrey Visual Field Analyzer (Humphrey Instruments, San Leandro, CA). 10 Although the assumptions underlying the algorithm of Olsson et al. appear plausible, they have not yet been validated, and the paper of Olsson et al. 8 did not report the proportion of false-positive responses detected by their analysis. 
This article reports on the RT distributions of true- and false-positive responses in a large sample of perimetrically inexperienced patients examined with a suprathreshold strategy. It describes an algorithm that estimates the typical time frame for a patient’s responses (RT window) and reports on the proportion of false-positive catch trial responses outside this interval. It proposes that the quality of visual field data can be improved by rejecting responses with latencies outside the RT window and by re-examining the respective locations. 
Methods
Instrument
Suprathreshold perimetry was performed with a Henson Pro 5000 perimeter (Tinsley Instruments, Croydon, UK), a computer-driven hemispherical bowl perimeter. The background luminance was 3.1 cd/m2. Luminance-increment stimuli were produced by light-emitting diodes subtending 0.5° of visual angle (approximately Goldmann size III). RTs (the time between the onset of the stimulus presentation and the click of the response button) were measured to an accuracy of 1 msec with a timer board (PC214E; Amplicon Ltd., Brighton, UK). The software of the perimeter was modified so that the data for each presentation (x- and y-coordinates, stimulus intensity, RT) were recorded to a file.
Patients and Data Collection
Data were collected from 435 patients (mean age, 45 years; range, 12–81) attending a Manchester city-center optometric practice for routine eye care. The only selection criterion was clinical need for visual field screening, based on risk factors for glaucoma or neurologic disease. Most patients had had no experience with automated perimetry. The tests were administered by seven optometric assistants, and patients were instructed using the conventional directions for automated perimetry. 11 No instructions were given regarding the speed of the response, and neither patients nor optometric assistants were aware that RTs were being recorded. Our study followed the tenets of the Declaration of Helsinki, in that the research was free of any risks and no additional burden was placed on the patients. The complete sample contained data from 976 visual field tests, yielding RTs from 60,500 responses to suprathreshold stimuli and from 523 false-positive responses to catch trials. A total of 403 patients completed the examination of both eyes. The mean duration of the test was 3.81 minutes per eye (range, 3.23–7.93).
Suprathreshold Visual Field Test
The stimulus matrix consisted of 68 locations distributed over the central 25° of the patient’s visual field. The interstimulus interval was 1490 msec, and approximately 25% false-positive and false-negative catch trials were randomly interleaved with the suprathreshold stimuli. The spatial sequence of presentations was randomized. No acoustic warning or feedback signals were provided. After a brief demonstration phase at the onset of the test, the instrument would estimate the general height (GH) of the patient’s visual field according to a previously described algorithm. 12 In brief, six stimuli were presented as a 1-dB up/1-dB down staircase at each of four “seed” locations (12.7° from fixation in each visual field quadrant), and the GH was estimated by averaging the staircase levels of those seed locations at which sensitivity was within normal limits. Subsequently, the 50% detection threshold of each test location was predicted from normative values, adjusted according to the GH estimate. During the suprathreshold phase of the test, each location was examined with a stimulus that was presented for 200 msec at an intensity 5 dB brighter than the predicted local threshold. If that stimulus was not detected, the presentation was repeated at a later stage of the test. Visual field locations at which both the initial and the repeat 5-dB suprathreshold stimuli had been missed were classified as defective and re-examined with suprathreshold increments of 8 and 12 dB. 
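The per-location pass/fail logic described above can be sketched as follows. This is a simplified illustration with our own names; the perimeter's actual software is not published.

```python
def classify_location(predicted_threshold_db, stimulus_seen):
    """Classify one visual field location in the suprathreshold phase.

    predicted_threshold_db: 50% detection threshold predicted from normative
        values and adjusted by the general-height (GH) estimate.
    stimulus_seen: callable taking a suprathreshold increment (dB) and
        returning True if the patient responded to that presentation.
    """
    # Initial presentation at 5 dB above the predicted local threshold;
    # a missed presentation is repeated later in the test.
    if stimulus_seen(5) or stimulus_seen(5):
        return "normal"
    # Both 5-dB presentations missed: classify as defective and grade
    # the defect with larger suprathreshold increments.
    for increment in (8, 12):
        if stimulus_seen(increment):
            return f"defective, seen at +{increment} dB"
    return "defective, missed at +12 dB"
```

In the real test the repeat presentation is interleaved at a later stage rather than given immediately; the short-circuit `or` above only stands in for that sequencing.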
Analysis
RT data were analyzed from responses to 5-dB suprathreshold stimuli, excluding those locations at which the stimulus was missed at two presentations and that were consequently flagged as defective. To derive the RT distributions for responses to suprathreshold stimuli and to false-positive catch trials, all available data were used. For the summary statistics and comparisons of right and left eye data, only results of tests of patients who completed the examination of both eyes were analyzed. If these patients had repeated a visual field test, only data of the last examination were included. 
Results
Across the population, the mean false-positive response rate to catch trials was 3.2%, ranging from 0% to 53%. In 65% of visual field tests no false-positive responses were made to catch trials. In 14% of tests, the false-positive response rate was higher than 10%, and in 1.5% of tests, the false-positive response rate was greater than 25%. False-positive response rates did not vary with the patient’s age (Spearman’s R = 0.326, P = 0.51). At locations classified as normal by the test, the probability of detecting the 5-dB suprathreshold stimulus (number of stimuli seen/presented) was 96%, independent of eccentricity out to the test limit of 25°. 
RT distributions varied greatly among patients. Across the sample of patients, individual median RTs ranged from 316 to 908 msec with a group mean of 451 msec (Fig. 1) . Individual interquartile ranges of RTs extended from 41 to 422 msec, with a group mean of 108 msec. Interquartile ranges were related to median RTs (Pearson r = 0.74, P < 0.001). Median RTs decreased slightly with age (9 msec per decade, 95% CI 4–13 msec per decade; P < 0.001), but the correlation was poor (Pearson r = 0.18). Median RTs from the right and left eyes of individual patients were highly related (Pearson r = 0.72, P < 0.001). Tests of the left eye, which were always performed last, yielded responses that were, on average, 12 msec faster (P < 0.001, paired t-test) and somewhat less variable (mean difference of individual interquartile range 9 msec, P < 0.001, paired t-test) than those of the right eyes. The mean false-positive rate of the left eye results (2.8%) was slightly lower than that of the right eye results (3.4%, P = 0.103, paired t-test). 
Figure 2 gives an example of RTs from a patient with a high false-positive rate. The RT distributions were usually positively skewed. A square-root transformation was applied to reduce this skew and to separate the main body of data from early or late outliers. To compensate for the large between-subject variability in average latency and dispersion, the square-rooted RTs (of stimulus as well as catch trial responses) were standardized to z-scores using the mean and SD of the stimulus RTs from each visual field test (Fig. 3) . The ratio between the proportions of false-positive and stimulus responses (i.e., the relative likelihood of a false-positive response) increased dramatically with positive or negative deviations from zero on the z-scale. Less than 5% of stimulus responses had latencies outside the interval of ±2 z, compared with more than 60% of false-positive responses. 
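The standardization just described can be written out as follows. This is a minimal re-implementation of the transformation, not the authors' code, and the data are hypothetical.

```python
import statistics
from math import sqrt

def standardized_latencies(stimulus_rts_ms, catch_rts_ms):
    """z-transform square-rooted RTs against the stimulus-response sample."""
    root_stim = [sqrt(rt) for rt in stimulus_rts_ms]
    mu = statistics.mean(root_stim)
    sd = statistics.stdev(root_stim)
    z = lambda rt_ms: (sqrt(rt_ms) - mu) / sd
    return [z(rt) for rt in stimulus_rts_ms], [z(rt) for rt in catch_rts_ms]

# Hypothetical data: tightly clustered stimulus RTs, erratic catch-trial RTs.
stim = [360, 380, 400, 420, 440, 400, 410, 390, 405, 395]
catch = [150, 1200]
z_stim, z_catch = standardized_latencies(stim, catch)
```

With these hypothetical values, all stimulus responses fall inside ±2 z while both catch-trial responses fall far outside it, mirroring the separation reported above.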
If responses were classified as suspect false-positives simply on the basis of z-scored RTs, the proportion of such responses would be approximately constant throughout the population of patients. With a cutoff value of ±2 z, for example, the proportion of responses classified as suspect would always be approximately 5%. This approach has poor specificity for the majority of patients who make no false-positive errors and poor sensitivity if the false-positive rate is greater than 5%. Better performance is achieved with an algorithm that determines, iteratively, which value deviates most from the mean of the remaining square-rooted sample. These values are successively removed as long as their exclusion reduces the sample variance by more than a predetermined criterion (V crit):
\[V_{\mathrm{pre}}-V_{\mathrm{post}}{>}V_{\mathrm{crit}}\]
where V pre is the sample variance before, and V post the sample variance after, removal of the most deviating square-rooted RT. The response window is then defined by the smallest and largest squared values of the trimmed sample.
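A minimal sketch of this iterative trimming, reconstructed from the description above (the authors' implementation is not published, so details such as tie-breaking between the two ends are our assumptions):

```python
import statistics
from math import sqrt

def rt_window(rts_ms, v_crit=0.5):
    """Return the (low, high) RT window in msec after trimming outliers.

    Square-rooted RTs are removed one at a time, starting with the value
    most deviating from the sample mean, while each removal reduces the
    sample variance by more than v_crit.
    """
    sample = sorted(sqrt(rt) for rt in rts_ms)
    while len(sample) > 2:
        mean = statistics.mean(sample)
        # the most deviating value lies at one end of the sorted sample
        if sample[-1] - mean > mean - sample[0]:
            trimmed = sample[:-1]
        else:
            trimmed = sample[1:]
        if statistics.variance(sample) - statistics.variance(trimmed) > v_crit:
            sample = trimmed
        else:
            break
    # the window limits are the squared extremes of the trimmed sample
    return sample[0] ** 2, sample[-1] ** 2
```

With hypothetical latencies clustered around 360–440 msec plus two outliers at 150 and 1200 msec, the two outliers are trimmed and the window spans only the clustered responses.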
Figure 4 shows the performance of this algorithm at different values of V crit. With a criterion value of 0.5, more than 70% of false-positive responses to catch trials, but less than 3% of stimulus responses, occur outside the RT window. In 35% of tests, no stimulus responses were detected outside the RT window defined by this criterion. In fewer than 5% of tests was the proportion of stimulus responses outside the RT window greater than 10%. 
Discussion
Patient response errors have long been recognized as a source of bias and variability in visual field tests. 13 14 Most current suprathreshold strategies repeat missed presentations to provide some resistance against the impact of false-negative response errors. However, false-positive responses can lead to underestimation of visual field loss. 15 The clinical effectiveness of screening techniques depends on how well they perform on the entire population of the patients subjected to them, including those patients who frequently make erroneous responses. If suprathreshold perimetry can be made more resistant to false-positive response errors, its sensitivity to visual field loss would improve in the small but important proportion of patients who make such responses. 
Latencies for false-alarm responses in detection tasks have been reported to exhibit characteristics different from latencies of true responses. 9 These studies used reaction time paradigms with highly trained observers, low rates of stimulus presentation, and randomized, exponentially distributed interstimulus intervals. Patients examined with suprathreshold perimetry tend to have little experience with demanding psychophysical tests and are not usually urged to respond rapidly. The interstimulus intervals are brief and regular (1000–1600 msec), 16 and patients with little or no visual field loss respond to most presentations. This leads to a high level of stimulus expectation that may, in turn, increase the likelihood of anticipatory false-positive errors. 17  
Olsson et al. 8 reported a new method of estimating the false-positive rate in threshold perimetry, based on the frequency of responses during intervals in which no true responses were expected (“listen time”). Estimates were derived by maximum-likelihood estimation using RT, change in RT, and stimulus intensity. Olsson et al. demonstrated that their estimates exhibited much lower between-test variability than the conventional catch-trial estimates. They did not, however, demonstrate the validity of their technique or present data to justify the assumptions on which it is based. The algorithm to determine the listen time was not described, and there have been no reports on what proportion of false-positive response errors is detected by this technique.
Better estimates of error rates do not, per se, improve the test result. The sole use of patient reliability indices is to classify as unreliable those test results that exceed arbitrarily defined cutoff criteria. Faced with such results, the only options open to the clinician are either to base decisions on unreliable evidence or to repeat the test. If suspect responses can be detected by their latencies, the clinical data can be improved at source by re-examining the respective visual field locations during the test. Our data highlight striking differences in the RT distribution between responses to suprathreshold stimuli and false-positive responses to catch trials. When corrected for between-subject variability in average latency and dispersion, the distribution of stimulus RTs is compact and highly peaked, whereas the RT distribution of false-positive responses is much broader. Owing to the high variability between the RT distributions of different patients, classification of suspect responses on the basis of population-based RT windows would be inefficient, disadvantaging patients with long or short mean RTs. Conventional statistical methods for the detection of outliers (e.g., the Grubbs test 18 ) could be used to detect suspect responses when the sample of responses is large and the false-positive rate is low. These methods fail when the sample size is small and there is a moderate or large number of false-positives with highly variable RTs.
The algorithm described herein is an ad hoc solution to this problem. It classifies responses as suspect false-positives if their removal reduces the sample variance by an amount greater than a predetermined criterion. The optimal criterion value for any particular application can be estimated from empiric data. Small criterion values lead to exclusion of a relatively larger number of responses (Fig. 4) . With a criterion value of 0.5, more than 70% of false-positive catch trial responses in our sample had RTs outside the RT window, compared with fewer than 3% of stimulus responses. As an example, Figure 2 shows the RTs and the calculated RT window of a patient with a high false-positive rate. 
The minimum visual reaction time has been estimated to be approximately 180 msec. 19 Responses occurring less than 180 msec after stimulus onset can therefore be assumed to be false-positives. The lower limit of the RT window, set at a criterion value of 0.5, was between 267 and 471 msec in 95% of tests (median, 344 msec). Wall et al. 20 investigated perimetric reaction times in normal and glaucomatous observers. Reaction time decreased exponentially with increasing suprathreshold increments, similar to the relationship first reported by Pieron in 1914. 9 In suprathreshold perimetry, stimuli are presented at a fixed suprathreshold increment (5 dB in this study) and are presumed to be of similar visibility across the visual field (consistent with the lack of evidence for a relationship between RT and stimulus eccentricity in our data). Responses with atypically long latencies are not necessarily false-positives. They may be manifestations of the occasional lapse of attention or may be due to threshold elevation if the visual field location is defective. Because latency analysis is used to reject suspect responses and to selectively re-examine the respective locations, such misclassifications occur at the expense of a minor increase in the number of stimulus presentations.
The derivation of the RT window could be performed toward the end of a suprathreshold visual field examination, when a sufficient number of responses (probably in excess of 50) have been obtained to allow robust estimation of the patient’s RT distribution. Locations with suspect responses can subsequently be re-examined without introducing a break in the examination. We do not suggest that RT analysis should replace false-positive catch trials. In suprathreshold perimetry, relatively high rates of false-positive catch trials may be required to keep stimulus expectation at moderate levels. 
Conclusion
The results of this study show that a substantial proportion of false-positive responses in suprathreshold perimetry can be detected by latency analysis. Removal of suspect false-positive responses may reduce perimetric variability and bias in a small but important proportion of patients. Research into behavioral aspects of automated perimetry (influence of stimulus rates, feedback, and warning signals) may open new avenues to increase the efficiency of these tests. 
 
Figure 1.
 
Distribution of median RTs from suprathreshold tests of the right eye.
Figure 2.
 
Example of RTs and calculated RT window (V crit = 0.5) in a patient with a high false-positive rate. There were 68 stimulus presentations (66 responses) and 9 false-positive catch trials (5 false-positive responses). The remaining intervals were false-negative catch trials (to all of which the patient responded correctly; data not shown). Some of the responses given during stimulus intervals were also likely to be false-positives; 13 stimulus responses were outside the calculated RT window. The true false-positive response rate is unknown.
Figure 3.
 
Distribution of transformed latencies from responses to false-positive catch trials and suprathreshold stimuli.
Figure 4.
 
Proportion of false-positive responses versus proportion of stimulus responses outside the RT window for different criterion values (V crit). Ordinate values are not necessarily misclassifications, because some responses occurring after stimulus presentations may be false-positives. Performance of z-based classification shown for comparison.
References
1. Sponsel WE, Ritch R, Stamper R, et al. Prevent Blindness America visual field screening study. The Prevent Blindness America Glaucoma Advisory Committee. Am J Ophthalmol. 1995;120:699–708.
2. Klein BE, Klein R, Sponsel WE, et al. Prevalence of glaucoma: the Beaver Dam Eye Study. Ophthalmology. 1992;99:1499–1504.
3. Katz J, Tielsch JM, Quigley HA, Javitt J, Witt K, Sommer A. Automated suprathreshold screening for glaucoma: the Baltimore Eye Survey. Invest Ophthalmol Vis Sci. 1993;34:3271–3277.
4. Siatkowski RM, Lam BL, Anderson DR, Feuer WJ, Halikman AM. Automated suprathreshold static perimetry screening for detecting neuro-ophthalmologic disease. Ophthalmology. 1996;103:907–917.
5. Vingrys AJ, Demirel S. False-response monitoring during automated perimetry. Optom Vis Sci. 1998;75:513–517.
6. Henson DB, Chaudry S, Artes PH, Faragher EB, Ansons A. Response variability in the visual field: comparison of optic neuritis, glaucoma, ocular hypertension, and normal eyes. Invest Ophthalmol Vis Sci. 2000;41:417–421.
7. Bengtsson B. Reliability of computerized perimetric threshold tests as assessed by reliability indices and threshold reproducibility in patients with suspect and manifest glaucoma. Acta Ophthalmol Scand. 2000;78:519–522.
8. Olsson J, Bengtsson B, Heijl A, Rootzen H. An improved method to estimate frequency of false positive answers in computerized perimetry. Acta Ophthalmol Scand. 1997;75:181–183.
9. Luce RD. Response Times: Their Role in Inferring Elementary Mental Organization. New York: Oxford University Press; 1986.
10. Bengtsson B, Olsson J, Heijl A, Rootzen H. A new generation of algorithms for computerized threshold perimetry, SITA. Acta Ophthalmol Scand. 1997;75:368–375.
11. Humphrey Field Analyzer Owner’s Manual. San Leandro, CA: Humphrey Instruments; 1992.
12. Henson DB, Artes PH. A new algorithm for setting the test intensity in suprathreshold perimetry [ARVO Abstract]. Invest Ophthalmol Vis Sci. 2000;41(4):S294. Abstract nr 1550.
13. Lee M, Zulauf M, Caprioli J. The influence of patient reliability on visual field outcome. Am J Ophthalmol. 1994;117:756–761.
14. Kutzko KE, Brito CF, Wall M. Effect of instructions on conventional automated perimetry. Invest Ophthalmol Vis Sci. 2000;41:2006–2013.
15. Artes PH, Henson DB, Chaudry SJ. Pointwise pass/fail criteria in suprathreshold perimetry. Wall M, Mills RP, eds. Perimetry Update 2000/2001. The Hague, Netherlands: Kugler Publications; 2001:283–291.
16. Flammer J, Gloor B, Glowazki A, Krieglstein GK. Automatische Perimetrie. Stuttgart: Ferdinand Enke Verlag; 1987.
17. Deese J. Some problems in the theory of vigilance. Psychol Rev. 1955;62:359–368.
18. Barnett V, Lewis T. Outliers in Statistical Data. 2nd ed. Chichester, UK: Wiley; 1994.
19. Woodworth RS, Schlosberg H. Experimental Psychology. New York: Holt; 1954.
20. Wall M, Maw RJ, Stanek KE, Chauhan BC. The psychometric function and reaction times of automated perimetry in normal and abnormal areas of the visual field in patients with glaucoma. Invest Ophthalmol Vis Sci. 1996;37:878–885.