Abstract
Purpose :
The luminance thresholds measured by standard automated perimetry (SAP) lack precision, especially at locations where visual sensitivity is low. What causes this imprecision? In clinical settings, test-takers show large inter-individual differences in precision due to human factors such as fixation stability, false negatives, false positives, cognitive ability, and stamina. However, clinical tests are necessarily of short duration. What is the relative impact of human factors, as compared to test brevity, as a source of imprecision? We calculated test-retest variability for an ideal observer as a function of staircase length to assess claims that new, low-fatigue tests that collect more data at home (Chia, ... Ou, 2023, Ophthalmology Glaucoma) can overcome human factors.
Methods :
Staircase procedures similar to full threshold and SITA were run in Python to simulate the performance of an ideal observer (for whom the human factors did not contribute to variance). One thousand replications were run for staircases of different lengths, while varying four additional parameters: slope of the psychometric function (frequency-of-seeing curve), false negative rate, false positive rate, and ratio of up:down step size in the staircase itself. Staircase data were fitted with cumulative normal ogives, and the threshold estimate for each staircase was the intensity at which 50% of stimuli were seen. Test-retest variability was estimated by the standard deviation (SD) across the resulting 1000 estimates of threshold.
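The simulation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, starting intensity, step sizes, intensity scale (higher means more visible), and grid range are all assumptions, and the fit is a simple grid-search maximum likelihood over the threshold alone, with sigma and the false-positive and false-negative rates held fixed rather than fitted.

```python
import math
import random
import statistics


def p_seen(intensity, mu, sigma, fp=0.0, fn=0.0):
    """Cumulative normal ogive with a false-positive floor and false-negative ceiling."""
    phi = 0.5 * (1.0 + math.erf((intensity - mu) / (sigma * math.sqrt(2.0))))
    return fp + (1.0 - fp - fn) * phi


def run_staircase(n_trials, mu, sigma, rng, start=25.0,
                  up_step=2.0, down_step=2.0, fp=0.0, fn=0.0):
    """Simulate one staircase; return a list of (intensity, seen) pairs.

    Assumed convention: higher intensity is easier to see, so a 'seen'
    response steps the intensity down and a 'not seen' response steps it up.
    """
    intensity = start
    trials = []
    for _ in range(n_trials):
        seen = rng.random() < p_seen(intensity, mu, sigma, fp, fn)
        trials.append((intensity, seen))
        intensity += -down_step if seen else up_step
    return trials


def fit_threshold(trials, sigma, mu_grid, fp=0.0, fn=0.0):
    """Grid-search maximum-likelihood estimate of the 50%-seen intensity (mu).

    Simplification: sigma, fp and fn are treated as known. Degenerate runs
    (all seen or all unseen) end up pinned at an edge of the grid.
    """
    best_mu, best_ll = mu_grid[0], -math.inf
    for mu in mu_grid:
        ll = 0.0
        for intensity, seen in trials:
            p = p_seen(intensity, mu, sigma, fp, fn)
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # avoid log(0)
            ll += math.log(p if seen else 1.0 - p)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu


def retest_sd(n_trials, n_reps=1000, mu=20.0, sigma=2.0, seed=0,
              up_step=2.0, down_step=2.0, fp=0.0, fn=0.0):
    """SD of threshold estimates across repeated simulated staircases."""
    rng = random.Random(seed)
    mu_grid = [10.0 + 0.25 * i for i in range(81)]  # hypothetical range 10..30
    estimates = []
    for _ in range(n_reps):
        trials = run_staircase(n_trials, mu, sigma, rng, up_step=up_step,
                               down_step=down_step, fp=fp, fn=fn)
        estimates.append(fit_threshold(trials, sigma, mu_grid, fp, fn))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Fewer replications than the 1000 in the text, to keep the demo quick.
    for n in (4, 10, 20, 40):
        print(n, "trials: SD =", round(retest_sd(n, n_reps=200), 2))
```

Changing `up_step` and `down_step` gives an unequal up:down ratio, and `sigma` sets the slope of the frequency-of-seeing curve. The numbers this sketch produces depend on the assumed parameters and are not expected to match the reported values.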
Results :
Compared to a staircase with 4 trials (containing 1 or 2 reversals), a staircase with 10, 20, or 40 trials reduced the sample SD by a factor of 1.7, 4.4, or 6.7, respectively. For long staircases, SD showed a robust linear relationship to sigma (the slope parameter), but for short staircases this relationship was highly dependent on details of the staircase procedure used. False negative and false positive rates, which were fitted in the model, had little effect on SD. At 10 and 40 trials, respectively, the 1:3 staircase produced an SD that was 4.4 and 1.4 times the SD of the 1:1 staircase.
Conclusions :
The primary cause of high test-retest variability in current tests is the insufficient number of stimuli presented at each location. In most test-takers, at any given sensitivity, the contribution of human factors to imprecision can be overcome by extending the duration of the test.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.