To target different subpopulations of retinal ganglion cells, each test included in this study uses different stimuli, backgrounds, thresholding algorithms, and normative databases. Equating the tests at set specificity levels minimizes these differences and allows fair comparisons between the results. We generated receiver operating characteristic (ROC) curves and derived abnormality cutoffs at set specificity levels of 90% and 95%. ROC curves were generated for the following visual field parameters: mean deviation (MD), pattern standard deviation (PSD), and the number of total deviation (TD) and pattern deviation (PD) points triggered at 5% and at 1%. This procedure was repeated for each of the six visual field tests in this study (initial and confirmatory SAP, SWAP, and FDT tests). For each of the six tests, the cutoff associated with the desired specificity (90% or 95%) was applied to classify each result as normal or abnormal. The areas under the ROC curves were compared statistically with the method of DeLong et al.29 using commercial software (Matlab; The MathWorks Inc., Natick, MA). After exploring the data, we opted to compare the tests based on the abnormality cutoffs obtained for the PSD of each test. Four reasons prompted this decision: (1) the ROC-derived PSD was the best parameter for two of the three tests (SAP and SWAP), (2) for none of the tests did the area under the ROC curve of its best parameter (the one yielding the highest area) differ significantly from that of the ROC-derived PSD (
P > 0.05), (3) PSD is a continuous variable, allowing specificities to be equated more accurately, and (4) PSD distinguishes between normal and glaucomatous subjects better than MD, although MD may be better for detecting progression.17 We report the results at the 95% specificity level (similar results were obtained at 90%), because high specificity is desirable in glaucoma testing. This also allows a more direct comparison with the machine-derived PSDs, which are based on the 95% specificity level derived from the instruments' respective internal normative databases. We compared sensitivities across tests at set specificities using the McNemar test.
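The cutoff-derivation step described above can be sketched as follows. This is a minimal illustration of fixing specificity at 95% in the normative group and reading off the resulting sensitivity, not the authors' Matlab code; the PSD values and group sizes below are hypothetical.

```python
import numpy as np

def cutoff_at_specificity(normals, patients, target_spec=0.95):
    """Find the PSD cutoff whose specificity among normals meets
    `target_spec`, and report the sensitivity that cutoff yields.
    Higher PSD is treated as more abnormal."""
    normals = np.sort(np.asarray(normals, dtype=float))
    # The cutoff is the PSD value exceeded by at most (1 - target_spec)
    # of normal eyes, i.e. the target_spec quantile of the normal group.
    cutoff = np.quantile(normals, target_spec)
    specificity = np.mean(normals <= cutoff)              # normals called normal
    sensitivity = np.mean(np.asarray(patients) > cutoff)  # patients called abnormal
    return cutoff, specificity, sensitivity

# Hypothetical PSD values (dB): normative group and glaucoma group.
normal_psd = [1.4, 1.5, 1.6, 1.6, 1.7, 1.8, 1.8, 1.9, 2.0, 2.1,
              1.5, 1.7, 1.9, 2.0, 2.2, 1.6, 1.8, 2.3, 1.7, 2.4]
glaucoma_psd = [2.0, 2.6, 3.1, 3.5, 4.2, 5.0, 6.3, 2.2, 7.1, 3.8]

cutoff, spec, sens = cutoff_at_specificity(normal_psd, glaucoma_psd, 0.95)
print(f"cutoff={cutoff:.2f} dB, specificity={spec:.2%}, sensitivity={sens:.2%}")
```

Because PSD is continuous, the quantile can be placed so that the achieved specificity matches the target closely, which is reason (3) above for preferring PSD when equating tests.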
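The final step, comparing sensitivities between paired tests with the McNemar test, can be sketched with an exact (binomial) version of the test on the discordant pairs. This is an illustrative standard-library implementation, not the software the authors used; the paired abnormal/normal calls below are hypothetical.

```python
from math import comb

def mcnemar_exact(results_a, results_b):
    """Exact McNemar test for paired binary outcomes.
    results_a / results_b: per-subject True (abnormal) / False (normal)
    calls from two tests on the same eyes. Returns the two-sided
    p-value based on the binomial distribution of discordant pairs."""
    b = sum(1 for x, y in zip(results_a, results_b) if x and not y)  # A abnormal, B normal
    c = sum(1 for x, y in zip(results_a, results_b) if not x and y)  # A normal, B abnormal
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: the tests agree on every eye
    # Under H0 (equal sensitivities), discordant pairs split 50/50.
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical abnormal (True) / normal (False) calls for two tests on 12 eyes.
test_a = [True, True, True, False, True, False, True, True, False, True, False, True]
test_b = [True, False, True, False, False, False, True, False, False, False, False, True]
print(f"P = {mcnemar_exact(test_a, test_b):.3f}")
```

Only the discordant pairs carry information here: eyes classified the same way by both tests cancel out, which is why McNemar's test, rather than an unpaired proportion test, is appropriate for sensitivities measured on the same subjects.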