The test–retest cohort contained series of at least seven reliable visual field tests from 133 eyes of 71 participants, and series of 10 reliable tests from 116 eyes of 63 participants. Fifty-nine percent were female; 69% were Caucasian, and 21% were Black. Fourteen eyes had MD (on the last visit) greater than 0 dB; 74 had MD between 0 dB and −6 dB; and 45 had MD worse than −6 dB. The longitudinal cohort consisted of 505 eyes from 256 participants, once we had excluded eyes with fewer than five reliable tests. Fifty-eight percent were female; 95% were Caucasian. Most eyes had early visual field loss: 207 eyes (41%) had MD greater than 0 dB on their most recent visit; 245 (49%) had MD between 0 dB and −6 dB; and 53 (10%) had MD worse than −6 dB.
Table 1 summarizes other characteristics of the two cohorts.
For MD, the criterion for “significant deterioration” to achieve 95% specificity in series of five tests was “MD worsening with rate < 0.0 dB/y and P < 0.101.” Twenty-five percent of eyes in the longitudinal cohort met this criterion within 5.0 years (95% CI, 4.7–5.3 years). The median time to meet this criterion was 8.6 years (95% CI, 7.4–10.8 years). When also requiring a rate of MD change worse than −0.1 dB/y, and adjusting the critical P value accordingly to maintain 95% specificity, the median time to detect significant deterioration increased to 9.1 years (95% CI, 7.6–12.7 years). With even stricter rate of change criteria, fewer than 50% of the series ever showed significant deterioration before the end of their series, despite the specificity (in series of length 5 tests) still being 95%.
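For illustration only, the sketch below shows how a criterion of this form could be applied to a single series: MD is regressed against time by ordinary least squares, and the eye is flagged when the slope is negative and its P value falls below the critical value. The regression details (for example, whether the P value is one- or two-sided) and all names are assumptions for this sketch, not specifications taken from the analysis.

```python
# Illustrative sketch: apply a global MD criterion of the form
# "MD worsening with rate worse than 0 dB/y and P < 0.101" to one series.
from scipy.stats import linregress

def md_criterion_met(years, md_values, rate_cutoff=0.0, p_cutoff=0.101):
    """Return True if the MD series shows 'significant deterioration'.

    years     -- test times in years from baseline
    md_values -- mean deviation (dB) at each test
    Assumes P is the two-sided p-value of the OLS slope (an assumption,
    not specified in this section).
    """
    fit = linregress(years, md_values)
    return fit.slope < rate_cutoff and fit.pvalue < p_cutoff

# Hypothetical 5-test series worsening by roughly -0.8 dB/y
print(md_criterion_met([0.0, 0.5, 1.0, 1.5, 2.0], [-2.1, -2.4, -3.0, -3.3, -3.8]))
```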
Table 2 shows the lower quartile of the times to detect significant deterioration, using pointwise analyses, together with 95% CIs, for a selection of different numbers of locations and different rate of change criteria, each with 95% specificity for series of length 5 tests. The table shows the time for 25% of series to meet each of the criteria. The same pattern was apparent when using the median time, but for many criteria fewer than 50% of series showed significant deterioration before the end of their series. Again, imposing a minimum rate of change criterion and adjusting the P value criterion to maintain 95% specificity delayed detection of significant deterioration in all cases. One commonly used criterion for pointwise linear regression is sensitivity deteriorating with rate worse than −1 dB/y and P < 5%,11 which is the same as total deviation deteriorating with rate worse than −0.9 dB/y and P < 5%. As seen in Table 2, requiring four such changing locations gave specificity 95%, but it took 11.9 years for 25% of longitudinal series to meet this criterion. Although several criteria were not significantly different from one another, the best pointwise criterion was "≥9 locations worsening with rate worse than 0 dB/y and P < 0.138." Using this criterion, 25% of series in the longitudinal cohort showed significant deterioration in 4.1 years (95% CI, 4.0–4.5); and 50% of series showed significant deterioration in 6.2 years (95% CI, 5.9–7.0).
Table 3 shows the lower quartile of the time to detect significant deterioration for a selection of cluster trend analysis criteria, again each with 95% specificity for series of length 5 tests. As for global and pointwise analyses, including a rate of change criterion delayed detection of significant deterioration for the same specificity. While a few different criteria were not significantly different from each other, the most rapid criterion to detect significant deterioration was "≥3 clusters worsening with rate worse than 0 dB/y and P < 0.117." Using this criterion, 25% of series showed significant deterioration in 4.8 years (95% CI, 4.2–5.1); and 50% in 7.4 years (95% CI, 6.8–8.3).
The top panel in Figure 2 shows the survival curves for the best global, pointwise, and cluster trend analyses, that is, "MD worsening with P < 0.101," "≥9 locations worsening with P < 0.138," and "≥3 clusters worsening with P < 0.117," respectively. The cluster trend analysis detected significant deterioration significantly sooner than MD, with P < 0.001, but significantly later than pointwise analysis, with P < 0.001. Significant deterioration was detected within 5 years in 134 eyes, using MD; 148 eyes, using the best-performing cluster trend analysis; and 188 eyes, using the best-performing pointwise analysis. When only these first 5 years were considered, cluster trend analysis still detected change significantly sooner than MD (hazard ratio 0.859, P = 0.012) and later than pointwise analysis (hazard ratio 1.268, P < 0.001).
However, of the 293 eyes that had at least two subsequent visual fields after showing significant deterioration by the best-performing pointwise criterion, only 112 eyes (38%) still met the same criterion when including those two extra tests in the series. By contrast, while there were only 255 eyes that had at least two subsequent visual fields after showing significant deterioration by the best-performing cluster trend criterion, this change was confirmed when including those next two tests in 156 eyes (61%). Using MD, 232 eyes had at least two tests after significant deterioration was detected, and change was confirmed in 176 of those eyes (76%).
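Confirmation here amounts to re-applying the same criterion after the next two (or four) tests are appended to the series. The sketch below assumes detection is re-assessed each time a test is added, starting from series of five tests, and uses the MD criterion sketched earlier as an example; the helper names are hypothetical.

```python
# Illustrative sketch of the confirmation logic for a single summary series (e.g., MD).
def first_detection_index(criterion_met, years, values):
    """Index (0-based) of the test at which the criterion is first met as the
    series grows, or None if it is never met; assessment starts at five tests."""
    for k in range(4, len(years)):
        if criterion_met(years[: k + 1], values[: k + 1]):
            return k
    return None

def confirmed(criterion_met, years, values, n_confirm=2):
    """Re-apply the same criterion after appending the next n_confirm tests."""
    k = first_detection_index(criterion_met, years, values)
    if k is None or k + n_confirm >= len(years):
        return None  # never detected, or too few subsequent tests to check
    return criterion_met(years[: k + 1 + n_confirm], values[: k + 1 + n_confirm])
```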
The second panel in Figure 2 shows survival curves for the same global, pointwise, and cluster trend analysis criteria as above for the detection of "confirmed significant deterioration," where eyes are only counted as deteriorating if they meet the same criterion after two more tests are added to the series (note that the date at which the eye first met the criterion is used for these survival curves, rather than the date that the deterioration was successfully confirmed). Twenty-five percent of eyes met this criterion, using MD after 6.3 years (95% CI, 6.0–7.2); using pointwise analyses after 6.3 years (95% CI, 6.0–7.0); and using cluster trend analyses after 6.0 years (95% CI, 5.3–6.6). The comparison between MD and cluster trend analysis had P = 0.006 for the entire series, and P = 0.882 when just the first 5 years were considered. The comparison between the pointwise and cluster trend analyses had P = 0.186 for the entire series, and P = 0.078 for the first 5 years.
The bottom panel in Figure 2 shows the equivalent results when deterioration had to be confirmed after the addition of four subsequent visual fields. Twenty-five percent of eyes met this criterion, using MD after 7.3 years (95% CI, 6.4–8.6); using pointwise analyses after 7.4 years (95% CI, 6.8–8.6); and using cluster trend analyses after 7.3 years (95% CI, 6.4–8.4). The comparison between MD and cluster trend analysis had P = 0.10 for the entire series, and P = 0.27 when just the first 5 years were considered. The comparison between the pointwise and cluster trend analyses had P = 0.18 for the entire series, and P = 0.05 for the first 5 years.
We tested the hypothesis that the optimal criteria to detect change with 95% specificity in a test–retest cohort would depend on the series length. We repeated the analysis using series of 7 and 10 visual fields in the test–retest cohort. As the series lengthens, the proportion of very rapid rates of change diminishes as the CI around the slope estimate narrows. This means that, for example, with a criterion of the form "three clusters each worsening at a rate worse than −1 dB/y with P < Crit_nCl.x," the value of Crit_nCl.x will increase with series length in order to achieve 5% false positives (95% specificity). However, for criteria of the form "three clusters each worsening at a rate worse than 0 dB/y with P < Crit_nCl.x," this reduction in the magnitude of the slopes has no effect, and so Crit_nCl.x will be comparatively independent of series length. Therefore, since the best criteria for global, pointwise, and cluster analyses are all of the form "…rate worse than 0 dB/y…," these criteria vary little with series length, as shown in Table 4. The times for 25% of eyes to meet these criteria (with or without requiring subsequent confirmation) are very similar, especially for the cluster and global analyses, as seen in Table 4. The times to detect change are shown in the survival curves in Figure 3 (using test–retest series of length 7) and Figure 4 (using test–retest series of length 10). We therefore conclude that it is reasonable to use a constant criterion regardless of series length, so long as the chosen criterion is based on statistical significance without imposing a non-zero minimum rate of change.
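For completeness, the critical P values quoted throughout (for example, 0.101, 0.138, and 0.117) are the cutoffs at which 5% of stable test–retest series are flagged. The sketch below shows one plausible way such a cutoff could be calibrated for a global MD criterion; the actual calibration procedure is described in the Methods and may differ, so this is an assumption-laden illustration. For the pointwise and cluster criteria, the same idea applies to the count statistic rather than a single P value.

```python
# Illustrative sketch: calibrate a critical P value from a stable test-retest cohort
# so that no more than 5% of stable series meet the criterion (95% specificity).
import numpy as np
from scipy.stats import linregress

def critical_p(years_per_series, md_per_series, rate_cutoff=0.0, specificity=0.95):
    """P cutoff such that at most (1 - specificity) of stable series satisfy
    both the rate-of-change cutoff and P < cutoff."""
    pvals = []
    for years, md in zip(years_per_series, md_per_series):
        fit = linregress(years, md)
        # A series failing the rate cutoff can never be flagged, whatever the
        # P cutoff, so give it a placeholder value of 1.0.
        pvals.append(fit.pvalue if fit.slope < rate_cutoff else 1.0)
    return float(np.quantile(pvals, 1.0 - specificity))
```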
We then tested whether the relative performance of these optimal criteria differed by disease stage.
Figure 5 shows the time to meet the same criteria as before, based on series of length 5 in the test–retest cohort, but with the eyes split according to whether the MD at the start of the series was >0 dB (left column, n = 326) or ≤0 dB (right column, n = 179). The time until detectable change may be slightly longer when the initial MD was ≤0 dB, but that is likely because those eyes are being treated more aggressively and hence are less likely to progress rapidly. The main conclusions did not vary with disease stage; however, the benefits of the cluster trend technique were more apparent at the earliest stages of functional loss. Without requiring confirmation, in both cases cluster trend analysis detected change sooner than MD (P = 0.044 for MD > 0 dB; P = 0.001 for MD ≤ 0 dB) but later than pointwise analysis (P < 0.001 for both subsets). However, when requiring that deterioration be confirmed after two subsequent visual fields, there were no significant differences for initial MD ≤ 0 dB (P = 0.71 for clusters versus MD; P = 0.58 for clusters versus pointwise); while for initial MD > 0 dB, cluster trend analysis detected change significantly sooner than either MD (P = 0.001) or pointwise analysis (P = 0.005).
The current EyeSuite software that accompanies the Octopus perimeter determines whether the mean total deviation within each cluster is deteriorating with P < 5%. When different numbers of clusters were required to meet this criterion, the closest to 95% specificity that could be achieved was "two clusters deteriorating with P < 5%," which had a specificity of 94.4% in series of length 5 tests, 95.3% in series of 7 tests, and 95.4% in series of 10 tests. This criterion detected deterioration in 25% of eyes in 5.1 years (95% CI, 4.7–5.3), and in 50% of eyes in 8.1 years (95% CI, 7.3–9.5).