Purpose. To describe an approach for the evaluation of covariate effects on receiver operating characteristic (ROC) curves and to apply this methodology to the investigation of the effects of disease severity and age on the diagnostic performance of frequency doubling technology (FDT) and standard automated perimetry (SAP) visual function tests for glaucoma detection.

Methods. The study included 370 eyes of 211 participants, with 174 eyes of 110 patients having glaucomatous optic neuropathy and 196 eyes of 101 subjects being normal. All patients underwent visual function testing with FDT 24-2 Humphrey Matrix and SAP SITA (Carl Zeiss Meditec, Inc., Dublin, CA). Disease severity was evaluated by the amount of neuroretinal rim loss assessed by confocal scanning laser ophthalmoscopy. An ROC regression model was fitted to evaluate the influence of disease severity and age on the diagnostic performance of the pattern SD (PSD) index from FDT 24-2 and SAP SITA.

Results. After adjustment for age, the areas under the ROC curves (AUCs) for SAP SITA PSD for 10%, 30%, 50%, and 70% loss of neuroretinal rim area were 0.638, 0.756, 0.852, and 0.920, respectively. Corresponding values for FDT 24-2 PSD were 0.766, 0.857, 0.922, and 0.962. For 10% and 30% rim loss, FDT 24-2 PSD had a significantly larger AUC than did SAP SITA PSD.

Conclusions. A regression methodology to evaluate covariate effects on ROC curves can be useful for assessment of diagnostic tests in glaucoma. Using the proposed methodology, a significantly better performance of FDT 24-2 compared with SAP SITA for diagnosis of early glaucoma was demonstrated.

^{ 1 }

^{ 2 }Based on the notion of using a threshold to classify subjects as positive (diseased) or negative (nondiseased), an ROC curve is a plot of the true-positive rate (TPR) versus the false-positive rate (FPR) for all possible cutpoints. Thus, it describes the whole range of possible operating characteristics for the test and hence its inherent capacity for distinguishing between subjects with and those without glaucoma. In most studies, however, ROC curves for diagnostic tests have been reported without taking into account the possible effects of covariates on test results. For example, in most studies of diagnostic tests in glaucoma, a single ROC curve is reported to represent the performance of the test in all included patients. Patients in these studies, however, frequently have different degrees of disease severity or different values of other covariates, such as age. Although the single “pooled” ROC gives the average performance of the test in the population, we are frequently interested in knowing how the test performs in subgroups of patients—for example, patients with early disease or with a specific value on another covariate.
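The construction of a pooled ROC curve described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and it assumes that larger test values (for example, a larger PSD) indicate disease.

```python
import numpy as np

def empirical_roc(scores_diseased, scores_healthy):
    """Return (fpr, tpr) over all cutpoints; a higher score is assumed more abnormal."""
    d = np.asarray(scores_diseased, dtype=float)
    h = np.asarray(scores_healthy, dtype=float)
    # Every observed value is a candidate cutpoint; +inf yields the (0, 0) corner.
    cuts = np.concatenate(([np.inf], np.unique(np.concatenate((d, h)))[::-1]))
    tpr = np.array([(d >= c).mean() for c in cuts])  # sensitivity at each cutpoint
    fpr = np.array([(h >= c).mean() for c in cuts])  # 1 - specificity at each cutpoint
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Applied to all eyes at once, this gives the single pooled curve; applying it separately within severity or age strata is the crude alternative that the regression approach is meant to improve on.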

^{ 3 }

^{ 4 }Further, it is possible that the comparison of the diagnostic abilities of different tests is influenced by the severity of glaucomatous damage. For example, it is possible that a particular test is more sensitive at early stages of the disease, whereas another test may be more sensitive at moderate or advanced stages. Therefore, it is important to characterize the relationship between the performance of the diagnostic test and the severity of disease and to evaluate how this relationship affects the comparison between different tests. Regression methods have been proposed to analyze covariate effects on ROC curves.^{ 5,6 } These methods allow the evaluation of the influence of covariates such as disease severity on the diagnostic performance of the test, so that ROC curves for specific values of the covariates can be obtained. Another advantage of these methods is that they allow comparison of ROC curves for different tests after adjusting for the effects of covariates, so that the tests can be better compared.

^{ 7 }

^{ 8 }

^{ 9 }

^{ 10 }

^{ 11 }This test has recently undergone major modifications in its procedures, with the development of the 24-2 test pattern, but the ability of the FDT 24-2 test to diagnose different levels of glaucomatous damage in patients has not yet been determined.

^{ 12 }

^{ 13 }

^{ 14 }The proposed methodology allowed the comparison of the diagnostic performance of these two visual function tests adjusting for the severity of disease and age, so that their accuracies could be compared at specific levels of these covariates.

^{ 15 }Visual field results were not used to classify patients. The photographs were evaluated by two experienced graders, and each was masked to the subject’s identity and to the other test results. For inclusion, a photograph had to be deemed of adequate quality or better. Glaucomatous optic neuropathy (GON) was defined as the presence of neuroretinal rim thinning, excavation, notching, or characteristic retinal nerve fiber layer defects. Discrepancies between the two graders were resolved either by consensus or by adjudication of a third experienced grader.

^{ 16 }Visual function tests and optic disc imaging with HRT II were all obtained within an interval not greater than 6 months.

^{ 17 }It utilizes a small (0.47°) 200-ms flash of white light as the target presented on a dim background (10.5 cd/m^{2}). FDT perimetry (FDT 24-2) was performed with the commercially available Humphrey Matrix perimeter (Carl Zeiss Meditec, Inc.). The Humphrey Matrix presents 5° stimuli, with a spatial frequency of 0.5 cyc/deg and temporal frequency of 18 Hz, on a background with a luminance of 100 cd/m^{2}. Stimuli are presented for 500 ms, including ramped onsets and offsets of 100 ms. The principles and psychometric properties of the ZEST strategy used for threshold estimation have been described in detail elsewhere.^{ 18,19 } The test locations of the FDT 24-2 program are similar to those of the SAP SITA 24-2 test. For both SAP SITA and FDT 24-2, 54 locations were tested within the central 24° of visual field. The two locations just above and below the blind spot were not included in the analysis.

*T*_{i} is the measured age-adjusted threshold at point *i*, *N*_{i} is the normal age-adjusted reference threshold (obtained from our control subjects) at point *i*, *S*_{i}^{2} is the variance of normal field measurements at point *i*, *n* is the number of points in the test (*n* = 52 for both FDT 24-2 and SITA standard), and MD is the mean deviation. PSD was selected based on its comparable or superior performance compared with other indexes in previous studies involving SAP and FDT visual function tests.
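The symbols defined above match the conventional variance-weighted definitions of MD and PSD used for automated perimetry. The sketch below assumes those conventional definitions, since the formula itself is not reproduced in this text; the function names are hypothetical and the implementation is illustrative rather than the authors' own.

```python
import numpy as np

def mean_deviation(T, N, S2):
    """Variance-weighted mean deviation (assumed conventional definition)."""
    T, N, S2 = (np.asarray(a, dtype=float) for a in (T, N, S2))
    return float(np.sum((T - N) / S2) / np.sum(1.0 / S2))

def pattern_sd(T, N, S2):
    """Pattern standard deviation (assumed conventional definition).

    T  : measured age-adjusted thresholds, one per test point
    N  : normal age-adjusted reference thresholds at the same points
    S2 : variance of normal field measurements at the same points
    """
    T, N, S2 = (np.asarray(a, dtype=float) for a in (T, N, S2))
    n = T.size
    md = mean_deviation(T, N, S2)
    # Weighted squared departures from the uniformly shifted normal field.
    weighted = np.sum((T - N - md) ** 2 / S2) / (n - 1)
    return float(np.sqrt(S2.mean() * weighted))
```

Under this definition a uniform depression of the whole field changes MD but leaves PSD at zero, which is why PSD is read as an index of localized loss.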

^{ 20 }

^{ 21 }

*x* and *y* directions at multiple focal planes. According to confocal scanning principles, a three-dimensional topographic image is constructed from a series of optical image sections at consecutive focal planes.

^{ 22 }

^{ 23 }

^{ 24 }The topography image determined from the acquired three-dimensional image consists of 384 × 384 (147,456 total) pixels, each of which is a measurement of retinal height at its corresponding location. For each patient, three topographical images were obtained and were combined and automatically aligned to make a single mean topography used for analysis. Magnification errors were corrected using patients’ corneal curvature measurements. An experienced examiner outlined the optic disc margin on the mean topographic image while viewing stereoscopic photographs of the optic disc. Good-quality images required a focused reflectance image with a standard deviation not greater than 50 μm.

^{ 25 }

^{ 5 }

^{ 26 }and previously used to evaluate the influence of the degree of hearing loss on results of diagnostic tests in audiology, as well as in other applications.^{ 26,27 } As this modeling approach has not been previously applied to evaluation of diagnostic tests in ophthalmology, we will describe it in some detail. Further detail can be found in several publications.^{ 5,26,27 }

ROC_{X,X_{D}}(*q*) is the probability that a diseased individual with disease-specific covariates *X*_{D} and common covariates *X* has test results *Y*_{D} that are greater than or equal to the (1 − *q*)th quantile of the distribution of test results from nondiseased individuals. That is, when the specificity of the test is 1 − *q*, the sensitivity is ROC_{X,X_{D}}(*q*). An example of a disease-specific covariate is severity of the disease, as this covariate is not defined for healthy subjects. In contrast, age is an example of a common covariate, as it is defined for subjects both with and without disease. The effects of *X* and *X*_{D} on ROC_{X,X_{D}}(*q*) can be modeled by a generalized linear regression model (ROC-GLM model).^{ 27,28 } The general ROC regression model can be represented by

$$\mathrm{ROC}_{X,X_D}(q) = g\bigl(h(q) + \beta X + \beta_D X_D\bigr), \qquad h(q) = \sum_k \alpha_k h_k(q),$$

where *g*(·) is a link function and *h*(·) is a baseline function which defines the location and shape of the curve. This approach is referred to as parametric distribution-free, as it specifies a parametric model for the ROC curve but does not assume distributions for the test results, which makes it advantageous compared with other modeling procedures.^{ 5,28 } The functions *g*(·) and *h*(·) are chosen so that the ROC curve is monotone increasing on the unit square. In most applications, *g*(·) = Φ, the normal cumulative distribution function, *h*_{1}(*q*) = 1 (with coefficient α_{1}), and *h*_{2}(*q*) = Φ^{−1}(*q*) (with coefficient α_{2}) are used, which results in the binormal ROC model

$$\mathrm{ROC}_{X,X_D}(q) = \Phi\bigl(\alpha_1 + \alpha_2\,\Phi^{-1}(q) + \beta X + \beta_D X_D\bigr),$$

where α_{1} and α_{2} are the intercept and slope of the ROC curve, respectively. If the coefficient for a specific variable *X* (β) is greater than zero, then the discrimination between those with disease and those without increases with increasing values of this covariate. Similarly, if the coefficient for the disease-specific covariate *X*_{D} (β_{D}) is greater than zero, then diseased subjects with larger values of this covariate are more distinct from nondiseased subjects than are diseased subjects with smaller values of *X*_{D}.
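To make the binormal form concrete, the sketch below evaluates a covariate-specific curve once the covariate terms have been folded into an intercept *a* = α_{1} + β*X* + β_{D}*X*_{D} and a slope *b* = α_{2}. The function names are hypothetical; the area formula Φ(*a*/√(1 + *b*^{2})) is the standard closed form for a binormal ROC curve.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def binormal_roc(q, a, b):
    """Sensitivity at false-positive rate q (0 < q < 1): Phi(a + b * Phi^{-1}(q))."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(q))

def binormal_auc(a, b):
    """Area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))
```

For example, `binormal_roc(0.05, a, b)` is the sensitivity at 95% specificity for the chosen covariate values, and a positive β or β_{D} raises *a* and therefore both the curve and its area.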

*severity* is the variable indicating severity of glaucomatous damage as measured by percentage loss of rim area, and *age* is a variable indicating the patient’s age. Interaction terms between the variables and Φ^{−1}(*q*) were included to allow the effects of the covariates to differ depending on the FPR *q* (or specificity 1 − *q*), that is, to influence the shape of the curve. Interaction terms between FDT and severity and between FDT and age were included to assess whether the influence of disease severity and age was similar or different between FDT 24-2 and SAP SITA tests.
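Read term by term against the coefficients α_{1}, α_{2}, and β_{1} to β_{8} reported in the table of model estimates, the model described here would take the following form. This is a reconstruction under that assumption. The coding of *FDT* as an indicator (1 for FDT 24-2, 0 for SAP SITA) and any scaling or centering of *severity* and *age* are assumptions that are not stated in the surrounding text.

```latex
\begin{aligned}
\mathrm{ROC}(q) = \Phi\bigl(
  &\alpha_1 + \alpha_2\,\Phi^{-1}(q)
   + \beta_1\,\mathit{FDT} + \beta_2\,\mathit{FDT}\times\Phi^{-1}(q) \\
  &+ \beta_3\,\mathit{severity} + \beta_4\,\mathit{severity}\times\Phi^{-1}(q)
   + \beta_5\,\mathit{severity}\times\mathit{FDT} \\
  &+ \beta_6\,\mathit{age} + \beta_7\,\mathit{age}\times\Phi^{-1}(q)
   + \beta_8\,\mathit{age}\times\mathit{FDT}
\bigr)
\end{aligned}
```

Grouping terms, the covariate-specific intercept is α_{1} + β_{1}*FDT* + β_{3}*severity* + β_{5}*severity* × *FDT* + β_{6}*age* + β_{8}*age* × *FDT*, and the covariate-specific slope is α_{2} + β_{2}*FDT* + β_{4}*severity* + β_{7}*age*.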

*n* = 500 resamples).^{ 29 } As measurements from both eyes of the same subject are likely to be correlated, the use of standard statistical methods for parameter estimation can lead to underestimation of standard errors and to confidence intervals that are too narrow.^{ 30 } Therefore, to account for the fact that both eyes of some subjects were used for analyses, the cluster of data for the study subject was considered as the unit of resampling when calculating standard errors. This procedure has been used in other studies to adjust for the presence of multiple correlated measurements from the same unit.^{ 27,29 }
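The resampling scheme described above, in which subjects rather than eyes are drawn with replacement, can be sketched as follows. This is a generic illustration with hypothetical names: `statistic` stands in for whatever is being estimated (in the study, the ROC regression coefficients), and the default of 500 resamples mirrors the number quoted in the text.

```python
import numpy as np

def cluster_bootstrap_se(subject_ids, values, statistic, n_resamples=500, seed=0):
    """Bootstrap standard error of statistic(values), resampling whole subjects."""
    ids = np.asarray(subject_ids)
    vals = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(ids)
    # Row indices (eyes) belonging to each subject, computed once.
    rows_by_subject = [np.flatnonzero(ids == s) for s in unique_ids]
    estimates = np.empty(n_resamples)
    for r in range(n_resamples):
        # Draw subjects with replacement; every eye of a drawn subject comes along.
        picked = rng.integers(0, len(unique_ids), size=len(unique_ids))
        rows = np.concatenate([rows_by_subject[k] for k in picked])
        estimates[r] = statistic(vals[rows])
    return float(estimates.std(ddof=1))
```

When the two eyes of a subject are strongly correlated, the standard error from this procedure is noticeably larger than one computed as if all eyes were independent, which is the underestimation the text warns about.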

*P* < 0.001; Student’s *t*-test). The average neuroretinal rim loss in glaucomatous eyes was 22%. The distribution of severity of disease according to percentage loss of rim area in glaucomatous eyes is shown in Figure 1.

*FDT* in the regression model. The superior performance of FDT 24-2 was similar throughout the range of false-positive (i.e., 1 − specificity) values, as indicated by the nonsignificant coefficient associated with the interaction term *FDT* × Φ^{−1}(*q*) (*P* = 0.400). That is, the ROC curves for FDT 24-2 and SAP SITA had a similar shape and did not cross.

*P* < 0.001). As expected, both tests performed better in patients with more severe disease. The influence of the severity of disease was not significantly different between the two tests, as indicated by the nonsignificant value of the coefficient representing the interaction between severity and FDT (β_{5}; *P* = 0.892). There was a tendency for disease severity to exert a relatively greater, but not statistically significant, effect at lower FPRs (i.e., higher specificities), as indicated by the negative coefficient for the term *Severity* × Φ^{−1}(*q*) (*P* = 0.497). Figure 2 shows ROC curves for SAP SITA and FDT 24-2 for arbitrarily chosen levels of percentage of neuroretinal rim loss and for age at 65 years, as calculated from the regression model. ROC curve areas and probabilities for the comparison between tests are shown in Table 2. For 10% and 30% neuroretinal rim loss, FDT 24-2 PSD had a significantly larger area under the ROC curve than did SAP SITA PSD. For 50% and 70% rim loss, although the area under the ROC curve for FDT 24-2 was larger than for SAP SITA PSD, the difference was not statistically significant.

*P* = 0.008). The influence of age was similar between the two visual function tests (*P* = 0.289 for the interaction term *age* × *FDT*) and throughout the false-positive (or 1 − specificity) range (*P* = 0.603 for the interaction term *age* × Φ^{−1}(*q*)).

^{ 31 }In fact, our findings demonstrated that SAP performed poorly for diagnosis of patients with early disease. For a 10% loss of rim area, the ROC curve area for SAP SITA was 0.638, with a sensitivity of only 21% for 95% specificity. With increasing disease severity, the performance of SAP SITA improved, with the area under the ROC curve being as high as 0.920 for patients with more advanced damage (70% loss of neuroretinal rim area). In a histologic study in human eyes, Kerrigan-Baumrind et al.^{ 32 } showed that an average loss of 27.3% of retinal ganglion cells is necessary for the corrected PSD index of standard achromatic perimetry to fall below the 95% normal confidence limits. Of interest, using SAP SITA PSD at 95% specificity in our study, the average percentage loss of rim area of the patients with glaucoma identified as abnormal was 30%, very close to the number provided by Kerrigan-Baumrind et al.

^{ 8 }

^{ 10 }In another study,^{ 10 } we observed 105 patients with suspected glaucoma and demonstrated that functional abnormalities on FDT tests (N-30 strategy) were predictive of future onset and location of SAP visual field loss by as much as 4 years. In the present study, for earlier stages of damage (10% and 30% loss of neuroretinal rim), FDT had significantly higher areas under the ROC curves than did SAP SITA. It should be noted that, although FDT performed better than SAP SITA, most of the patients with early glaucomatous damage were still not detected by this test. However, for patients with more advanced damage and, therefore, more easily detectable disease, the diagnostic performances of the two tests were similar, with no statistically significant difference between the areas under the ROC curves.

^{ 33 }The 24-2 pattern for FDT was developed based on the fact that the larger number of points would make this test more helpful for disease follow-up. Although our study suggests a benefit of FDT 24-2 in detecting early disease compared with SAP, its role for longitudinal assessment of visual field progression still has to be evaluated.

^{ 34 }Stroux et al.^{ 35 } evaluated the influence of disease severity on the sensitivities of several different visual function and electrophysiological tests. The logistic model developed by Leisenring et al.,^{ 34 } however, was originally proposed for evaluation of tests with categorical results. Therefore, the evaluation of tests with continuous results using this approach requires that the test results be dichotomized according to arbitrary cutoffs of specificity or sensitivity. The method used in the present study is advantageous, as the effects of covariates can be assessed on the whole ROC curve, and it therefore does not require dichotomization of test results.

^{ 20 }

^{ 21 }The recent Ocular Hypertension Treatment Study also showed that PSD, but not MD, is a predictor of glaucoma development among ocular hypertensive subjects, suggesting that this parameter may be important for identification of early glaucoma cases.^{ 36,37 } However, other studies have suggested that a generalized depression of sensitivity may be a prominent feature of early glaucoma and that the visual field index MD would be more likely to capture this abnormality than PSD.^{ 38,39 } To investigate this, we tested whether the use of the MD index instead of PSD in the ROC regression models would improve detection of glaucoma. ROC curve areas for 10%, 30%, 50%, and 70% neuroretinal rim loss were 0.706, 0.806, 0.858, and 0.901, respectively, for SAP SITA MD and 0.727, 0.813, 0.881, and 0.931 for FDT 24-2 MD. It is interesting to note that, although ROC curve areas for SAP SITA MD and FDT 24-2 MD were lower than those for FDT 24-2 PSD, SAP SITA MD actually performed better than SAP SITA PSD for detection of early glaucoma, in agreement with previous observations of diffuse sensitivity loss in early glaucoma when evaluated by SAP.

^{ 40 }This effect has been demonstrated recently on SAP visual fields of patients with values of MD worse than −24 dB (i.e., end-stage disease).^{ 40 } Although this could have affected the evaluation of the influence of disease severity in our study, the patients included in the analysis had a maximum percentage of neuroretinal rim loss of approximately 70%, and only four eyes had values of SAP SITA MD worse than −20 dB, indicating that patients with end-stage disease were not a major component in the study.

^{ 16 }such evidence is not yet available for humans. Another limitation of our study is that the diagnosis of GON was based on cross-sectional assessment of stereophotographs. Ideally, for a more definitive diagnosis, progressive change of optic disc appearance would have to be demonstrated.^{ 41 } Unfortunately, such longitudinal information is not yet available for all our patients. Future studies using progressive GON as the reference standard should be able to assess the performance of these tests under this circumstance.

^{ 26 }It should be noted that ROC curve areas also have limited intrinsic clinical meaning. Other indexes, such as likelihood ratios, may have more straightforward clinical interpretation and application. We have recently demonstrated the usefulness of likelihood ratios for interpretation of results of imaging tests in glaucoma.^{ 24 } However, statistical methods for evaluation of covariate effects on likelihood ratios have not been well described in the literature and deserve further research.

**F.A. Medeiros**, Carl Zeiss Meditec, Inc. (F);

**P.A. Sample**, Carl Zeiss Meditec, Inc., Alcon Laboratories, Inc., and Allergan, Inc., Pfizer, Inc. (F);

**L.M. Zangwill**, Heidelberg Engineering (F, R);

**J.M. Liebmann**, None;

**C.A. Girkin**, None;

**R.N. Weinreb**, Carl Zeiss Meditec, Inc. and Heidelberg Engineering (F)

**Figure 1.**


Parameter | Coefficient | Estimate | 95% CI | *P* |
---|---|---|---|---|
Intercept | α_{1} | 0.691 | (0.457–0.957) | <0.001 |
Φ^{−1}(*q*) | α_{2} | 0.739 | (0.543–0.923) | <0.001 |
FDT | β_{1} | 0.531 | (0.223–0.921) | 0.002 |
FDT × Φ^{−1}(*q*) | β_{2} | 0.117 | (−0.154–0.414) | 0.400 |
Severity | β_{3} | 2.070 | (1.259–3.170) | <0.001 |
Severity × Φ^{−1}(*q*) | β_{4} | −0.168 | (−0.649–0.325) | 0.497 |
Severity × FDT | β_{5} | 0.072 | (−1.035–1.135) | 0.892 |
Age | β_{6} | 0.021 | (0.008–0.041) | 0.008 |
Age × Φ^{−1}(*q*) | β_{7} | 0.003 | (−0.007–0.013) | 0.603 |
Age × FDT | β_{8} | 0.011 | (−0.012–0.030) | 0.289 |

**Figure 2.**


**Figure 3.**
