In the current study, a novel method was developed to evaluate GT results quantitatively and objectively in VF tests. The relationship between GT results and test–retest reproducibility of 24-2 and 10-2 VFs was then investigated. The TFF GT index appeared to be particularly important for VF reproducibility in 24-2 and 10-2 VFs. In contrast, FNs were a significant predictor of VF variability in 24-2 VFs, but not in 10-2 VFs. Age was not related to VF reproducibility, in agreement with a previous report.[17]
In the current study, PSD, which measures the unevenness of the VF, was selected as an important predictor of VF variability in both the 24-2 and 10-2 VF models, while MD was not selected. This is probably because the majority of patients included in the study were in a mild to moderate stage of the disease. Early glaucomatous damage is often reflected more sensitively by PSD than by MD, because early focal VF change can be masked by the averaging carried out in the calculation of MD.[18] Previous studies have reported that VF reproducibility is relatively good in early glaucoma, worsens as the disease progresses, and then improves again in advanced glaucoma because of the ‘floor effect’ of VF sensitivity.[19] Likewise, PSD is high in moderate glaucoma and lower in the early and advanced stages, which may explain why PSD, rather than MD, was selected in the best models.
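The masking of a focal defect by averaging can be illustrated with a small numerical sketch. The functions below are simplified, unweighted stand-ins for MD and PSD (the perimeter's own indices apply location-specific weighting that is not reproduced here), and the deviation values are invented purely for illustration.

```python
import math

def simplified_md_psd(deviations_db):
    """Return the unweighted mean and sample standard deviation (dB).

    Simplified stand-ins for MD and PSD; the weighting used by the
    perimeter is deliberately omitted (assumption for illustration).
    """
    n = len(deviations_db)
    md = sum(deviations_db) / n
    variance = sum((d - md) ** 2 for d in deviations_db) / (n - 1)
    return md, math.sqrt(variance)

# Hypothetical field of 52 test points: 48 points at the age-expected
# value and 4 points depressed by 15 dB (a small focal defect).
focal_defect = [0.0] * 48 + [-15.0] * 4

md, psd = simplified_md_psd(focal_defect)
print(f"simplified MD  = {md:.2f} dB")
print(f"simplified PSD = {psd:.2f} dB")
```

In this invented example the mean shifts by only about 1 dB, whereas the spread rises to roughly 4 dB, which is the sense in which an unevenness index can respond more strongly than an average to early focal change.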
In the best model for variability in 24-2 VFs, only TFF was selected among all GT parameters. This may be because test points in the 24-2 pattern are spaced at 6° intervals, so that eye movements smaller than 6° have only a limited effect. Indeed, move3-5 was selected as a significant predictor in the best model for 10-2 VFs, whereas move1-2 was not, which is noteworthy given that test points in the 10-2 pattern are spaced at 2° intervals. Nonetheless, move≥6 was not selected in the best model for either 24-2 or 10-2 VFs. This may be because the eye tracking system is unable to track eye movements of this size, making TFF the more useful predictor. Furthermore, as shown in Table 3, patients' eye movements may not often exceed 6°, in which case the influence of this predictor in the model will be diminished.
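The way a gaze record might be reduced to the amplitude-binned parameters discussed above can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the bin edges (1° to under 3°, 3° to under 6°, 6° or more), the use of `None` to mark a tracking failure, the reading of TFF as the rate of tracking failures, and the choice of the total number of samples as the denominator are all assumptions made for the example.

```python
def summarize_gaze(deviations_deg):
    """Return the rate of each gaze category among all recorded samples.

    deviations_deg: one entry per sample, giving the gaze deviation in
    degrees, or None where the tracker failed (hypothetical encoding).
    """
    if not deviations_deg:
        raise ValueError("no gaze samples")
    counts = {"move1-2": 0, "move3-5": 0, "move>=6": 0, "TFF": 0}
    for d in deviations_deg:
        if d is None:
            counts["TFF"] += 1          # tracking failure
        elif d >= 6:
            counts["move>=6"] += 1
        elif d >= 3:
            counts["move3-5"] += 1
        elif d >= 1:
            counts["move1-2"] += 1
        # deviations below 1 degree are treated as steady fixation
    n = len(deviations_deg)
    return {name: c / n for name, c in counts.items()}

# Invented record of ten samples
samples = [0, 0.5, 1, 2, 4, None, 7, 0, 3, None]
rates = summarize_gaze(samples)
print(rates)
```

Because large movements and tracking failures are counted separately here, a movement that the tracker loses entirely would contribute to TFF rather than to move≥6, which is consistent with the interpretation offered above.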
None of the traditional reliability indices were selected as significant predictors of variability in the best models for either 24-2 or 10-2 VFs, except for FNs in the 24-2 VF model. It has been reported that FNs increase with the progression of glaucoma, which itself is associated with lower reproducibility.[2] On the other hand, Bengtsson and colleagues[20] investigated the relationship between reproducibility and FLs, FPs and FNs, and found that only FNs were significantly associated with reproducibility. The results of the current study suggest that FNs are related to the reproducibility of VF sensitivity in addition to disease status, as represented by PSD. It is worth noting that FPs are calculated differently in the SITA algorithm than in the Full-Threshold test, in which classic catch trials are employed. In the SITA algorithm, any response made before the minimum response time (~180 ms), adjusted according to the patient's individual mean response time, is considered an FP error.[1] This may mean that actual FP responses occurring after the minimum response time are ignored in the FP calculation. In contrast, GT parameters directly reflect eye position during the actual threshold measurements. In addition, a previous report suggested that FPs are underestimated by the SITA algorithm compared with the Full-Threshold test, which uses classic catch trials.[21] Also, as shown in Table 4, the mean rates of FL, FP and FN were low compared with the GT indices. This may have contributed to the small effect of the traditional indices and to the selection of GT indices in the best models.
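The consequence of a latency-based FP rule can be shown with a toy calculation. The sketch below is not the SITA estimator: it simply flags responses that arrive before a fixed minimum response time, uses the approximate 180 ms figure quoted above as a default, and omits the per-patient adjustment because its form is not described here. The latencies and the choice of denominator are invented for illustration.

```python
def early_response_rate(latencies_ms, min_response_ms=180.0):
    """Fraction of responses arriving before the minimum response time.

    A toy rule for illustration only; the real algorithm and its
    per-patient adjustment are not reproduced.
    """
    if not latencies_ms:
        raise ValueError("no responses recorded")
    early = sum(1 for t in latencies_ms if t < min_response_ms)
    return early / len(latencies_ms)

# Invented latencies (ms). Suppose the press at 900 ms was a random
# button press unrelated to any stimulus: it is not flagged, because
# it falls after the minimum response time.
latencies = [120, 150, 320, 410, 380, 900]
rate = early_response_rate(latencies)
print(f"flagged as FP: {rate:.0%}")
```

Under this rule only the two early presses are flagged, so a false response that happens to arrive late goes uncounted, which is the kind of underestimation suggested above.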
Our results do not contradict the view that traditional indices are important when assessing the reliability of VF measurements. A possible interpretation is that FN and FP are good indices of measurement accuracy, in that they predict over- or underestimation of VF sensitivity, but are less informative about test–retest reproducibility, for which gaze tracking parameters may be more useful.
One possible limitation of the current investigation is the limited range of glaucomatous disease in the study population. Most patients were in an early to moderate stage of glaucoma, so the usefulness of the GT parameters should also be assessed in patients with advanced disease in a future study. One difficulty in performing such an analysis is that degradation of the VF is often accompanied by deterioration of visual acuity, which, in addition to eye movements during the VF measurement, can cause poor VF reproducibility. Nonetheless, reproducibility of VFs is equally important in this population, and hence further investigation is required. Furthermore, GT results should be investigated in a larger population, including healthy controls and patients with other ocular disorders; this research should therefore be considered a pilot study.
In the current study, GT data were exported as JPEG images from the Beeline data filing system, and the GT parameters were calculated simply by reading the JPEG image. GT parameters can therefore be obtained on a personal computer, which would allow clinicians to estimate the reliability of a patient's VF in a clinical setting.
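As a rough indication of what reading the image could involve, the sketch below measures bar heights in a gaze chart that has already been decoded and thresholded into a grid of 0 and 1 values. Everything about the layout is an assumption made for illustration: that dark pixels are coded as 1, that the chart has a single horizontal baseline at a known row, that marks above the baseline represent gaze deviation and marks below it represent tracking failures or blinks, and that each column corresponds to one sample. The decoding of the JPEG itself and the conversion from pixels to degrees are not shown.

```python
def bar_heights(binary_rows, baseline_row):
    """Count dark pixels above and below the baseline in each column.

    binary_rows: list of rows (row 0 is the top), each a list of 0/1.
    baseline_row: index of the chart baseline (hypothetical, supplied
    by the caller).
    Returns two lists (upward heights, downward heights) in pixels.
    """
    n_rows = len(binary_rows)
    n_cols = len(binary_rows[0])
    up, down = [], []
    for c in range(n_cols):
        up.append(sum(binary_rows[r][c] for r in range(baseline_row)))
        down.append(
            sum(binary_rows[r][c] for r in range(baseline_row + 1, n_rows))
        )
    return up, down

# Tiny invented chart: 5 rows by 4 columns, baseline at row 2
chart = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],  # baseline
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
up, down = bar_heights(chart, baseline_row=2)
print(up, down)
```

Once per-column heights are available, they could be scaled to degrees and summarized into rate-type parameters; the scaling factor would have to be calibrated against the exported image and is left open here.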
In conclusion, we have developed a method to quantitatively evaluate the GT record in HFA VF tests. Moreover, the GT parameters derived in this study were significant predictors of reproducibility in both 24-2 and 10-2 VF tests.