The results of this study demonstrate that the diagnostic performance of mfVEP was similar to that of SAP. The sensitivity of these two functional tests to detect GON was equal when matched for specificity, regardless of whether the diagnostic standard was based on masked experts’ grading of a stereo disc photo, the HRT MRA, or the combination of both structural assessments. These findings are consistent with those of Graham et al. [14], who used disc photographs as an independent diagnostic standard. In our study, the sensitivity to detect GON was higher for both functional tests (42%) when the HRT MRA was used as the diagnostic standard than when the disc photograph grade was used (29%). This result is explained by the fact that the HRT MRA was generally more conservative than the masked expert graders, classifying exactly half as many eyes as having GON. The specificity of both functional tests remained relatively high (85%); that is, the false alarm rate did not increase when the HRT MRA was used in place of disc photograph grades. This indicates that nearly all the eyes in which the two structural classifiers disagreed had normal results on both functional tests, and suggests that both functional tests agreed better with the HRT MRA than with the stereo disc photograph grades.
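To make the definitions behind these figures concrete, the sensitivity and specificity of a functional test against a structural diagnostic standard can be computed from paired binary classifications of the same eyes. The following is only a minimal sketch, not the analysis code used in this study; the function name and the ten example eyes are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    test_abnormal: Sequence[bool], standard_gon: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a functional test.

    test_abnormal[i] is True if the functional test (e.g., SAP or mfVEP)
    flagged eye i; standard_gon[i] is True if the diagnostic standard
    (e.g., disc photo grade or HRT MRA) classified eye i as GON.
    """
    if len(test_abnormal) != len(standard_gon):
        raise ValueError("inputs must describe the same eyes")

    pairs = list(zip(test_abnormal, standard_gon))
    tp = sum(1 for t, s in pairs if t and s)          # abnormal test, GON
    fn = sum(1 for t, s in pairs if not t and s)      # normal test, GON
    tn = sum(1 for t, s in pairs if not t and not s)  # normal test, no GON
    fp = sum(1 for t, s in pairs if t and not s)      # abnormal test, no GON

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("standard must contain both GON and non-GON eyes")

    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 4 eyes with GON by the standard, 6 without.
standard = [True, True, True, True, False, False, False, False, False, False]
test = [True, False, False, True, False, False, True, False, False, False]

sens, spec = sensitivity_specificity(test, standard)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because both quantities are conditioned on the standard, swapping one structural standard for another changes which eyes count as GON and can therefore shift sensitivity even when the functional test results themselves are unchanged.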
Agreement between the two functional tests was also generally good (∼80%). This is consistent with the findings of Bjerre et al. [12], who reported “fair” agreement between SAP and mfVEP in subjects diagnosed with (manifest) glaucoma. Perhaps the more interesting portion of the population is the 20% of eyes in which the two functional tests did not agree, notwithstanding that the criteria used to define abnormalities, as well as the normative databases, differed. Hood et al. [11, 13, 20] predicted that the performance of a monocular mfVEP test would be approximately equal to that of SAP, but slightly better than SAP when the interocular analysis was added to the mfVEP. We used an “either-or” combination of monocular and interocular tests to define an abnormality for the mfVEP, but restricted the definition of an abnormal SAP to a monocular analysis only. The proportion of eyes with an abnormal SAP, as well as the agreement with mfVEP, might have been even greater if point-wise interocular analyses had also been used for SAP [28]. Further studies are needed to establish normative ranges and evaluate potential specificity tradeoffs for point-wise interocular SAP threshold asymmetry analyses.
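The “either-or” rule and the agreement figure discussed above can be illustrated with a short sketch. It assumes that each eye has already been reduced to binary monocular and interocular outcomes; the function names and the five example eyes are hypothetical and do not reproduce the cluster criteria or data of this study.

```python
from typing import Sequence


def mfvep_abnormal(monocular_abnormal: bool, interocular_abnormal: bool) -> bool:
    """'Either-or' rule: abnormal if either analysis flags the eye."""
    return monocular_abnormal or interocular_abnormal


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Percentage of eyes on which two binary classifications coincide."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical outcomes for five eyes.
mono = [True, False, False, False, False]
inter = [False, True, False, False, False]
sap = [True, False, False, True, False]  # SAP uses a monocular analysis only

mfvep = [mfvep_abnormal(m, i) for m, i in zip(mono, inter)]
print(mfvep)                          # combined mfVEP classification per eye
print(percent_agreement(mfvep, sap))  # raw agreement between the two tests
```

Combining two analyses with a logical OR can only keep or increase the number of eyes flagged as abnormal, which is why the same rule applied to SAP would be expected to raise its proportion of abnormal results, at a possible cost in specificity.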
The fact that SAP and mfVEP performed equally well, but agreed in only ∼80% of cases, suggests that the mfVEP detects some real abnormalities that SAP is missing and vice versa. There are also differences between the spatial patterns and contiguity of test points. The current body of evidence (see, e.g., the work of Hood et al. [11, 13, 20]) indicates that in cases where the mfVEP SNR is high, it is more likely than SAP to detect functional abnormalities using the point-wise interocular comparison, whereas in cases where the mfVEP SNR is low, the interocular comparison becomes more variable and the mfVEP advantage is lost (i.e., SAP will be more sensitive for a given specificity). The mfVEP is more likely than SAP to miss defects in the superior periphery [13] because SNR is generally lower at those stimulus locations [24], whereas SAP is more likely to miss localized central defects [13] because of its lower spatial resolution at those locations. It is also likely that some portion of the disagreement is attributable to variability (noise). For example, in a population with similar characteristics, >85% of initial defects on SAP were not confirmed on retest [29]. Recent work suggests that mfVEP abnormalities (clusters) rarely repeat in healthy control eyes, especially in the same location [25]. Thus, confirmation of an mfVEP cluster abnormality in an early glaucoma or suspect eye is likely to represent a real (reliable) functional defect. Longitudinal evaluation will help determine which of the functional abnormalities in these high-risk patients with suspected or early glaucoma are repeatable and associated with progressive GON.
Agreement between the two structural classifiers was lower, 69% or 76%, depending on whether borderline MRA cases were assigned to the normal or GON category. We chose to assign the borderline MRA cases to the normal category, to maintain higher specificity [27]. In fact, the number of eyes classified as GON by the HRT MRA increased by 60% when the borderline cases were switched to the GON category. This finding underscores the fact that many of the subjects were referred to the study as high risk for suspected glaucoma because they had a suspicious (or borderline) optic disc appearance, yet had a normal or nearly normal visual field in one or both eyes. This may represent a bias in our study population that would lower the apparent sensitivity of both functional tests. Previous studies have shown that the proportion of eyes with functional abnormalities (and the positive predictive value of such tests for GON) is relatively low, even in eyes with cup-to-disc ratios ≥0.8 [30, 31]. It should also be noted that the definition of high-risk suspect eyes continues to evolve and may come to differ from that used during the recruitment phase of this study, although this should not have a major impact on the results of this cross-sectional comparison of SAP and mfVEP.
In summary, the sensitivity and specificity of mfVEP and SAP were similar when the diagnostic standard was based on structural characteristics of the optic disc. Agreement between the two functional tests was ∼80%, which was higher than the agreement between the two measures of optic disc structure. Although the specificity of both functional tests was reasonably high (∼85%), sensitivity to detect GON was relatively low (∼30%–45%). The low sensitivity may be partially attributable to the selection bias of the study population, but also suggests that structural and functional abnormalities are not highly coincident during early-stage glaucoma [32]. This notion is supported by the results from OHTS in which approximately 60% of all conversions occurred because of structural progression (optic disc change), 40% because of changes on SAP, but <15% by changes in both structure and function [21]. Thus, the absolute values of sensitivity and specificity derived from the present study should be interpreted with caution, as they depend greatly on the diagnostic standard applied, the composition of the study population, and the nature of structure–function relationships in early glaucoma.
The authors thank Cindy Blachly, Thie Smith, Judy Thompson, and Karin Novitsky for their care and diligence during data collection.