The objective of this research was to establish a conceptually simple technique to derive the overall statistical significance of visual field deterioration in individual patients. By comparing an observed series to many reordered arrangements of itself, PoPLR derives a continuous P value that is individualized to a particular patient's data: the false-positive rate is independent of factors such as variability, level of visual field damage, and length of follow-up. These properties distinguish PoPLR from many other techniques for determining change in the visual field.
Current techniques such as GCP and PLR do not provide a single P value for a clearly defined null hypothesis. Rather, the results need to be interpreted with reference to large groups of subjects (population-based criteria). With GCP, for example, criteria for “likely progression” and “possible progression” have been established in the Early Manifest Glaucoma Trial [23] and have been shown to provide high specificity, on average. However, when the properties of an individual patient's data differ from those of the reference group, population-based change criteria can give misleading results. For example, visual fields with advanced damage often contain many locations at which the dynamic range of the instrument has been exhausted (sensitivity < 0 dB) such that further deterioration is no longer measurable. A population-based progression criterion that demands change at a fixed number of visual field locations, for example, will thus be more conservative (less sensitive, more specific) in patients with more advanced damage. We have previously shown that the likelihood of experiencing a false-positive result with the GCP can vary by as much as 40 times between patients (Artes PH, et al. IOVS 2011;52:ARVO E-Abstract 4148). With PLR, the specificity of any given criterion varies, among other factors, with the number of examinations. In our data, for example, PLR with a criterion of “≥1 location with a slope < −1.0 dB/year, at a P value < 0.01” gave a false-positive rate of 10.4% after five examinations, decreasing to 5.9% after eight examinations. Since the specificity of population-based criteria varies with the properties of the data, it is difficult to interpret the findings when such criteria are applied to individual patients.
In contrast to GCP and PLR, PoPLR tests the well-defined null hypothesis that none of the visual field locations show negative change, with reference only to the observed statistic S_obs and its permutation distribution in the individual patient's series of visual fields. Large variability within the series will cause the permutation distribution to be wider, and a given S_obs will therefore be associated with a larger P value (lesser significance) than in a series with lower variability. In the highly variable visual field series of case 2 (Fig. 5), for example, an S_obs of 32 was only borderline significant (P = 0.08), while the S_obs of 14 in case 4 (Fig. 7) was associated with a P value of 0.04. The nearly 5-fold variation in the width of the permutation distributions illustrates the importance of judging significance based on the properties of the individual visual field series, rather than by population-based cutoff values. Because PoPLR provides a continuous P value rather than just a categorical classification (change/no change, as with population-based criteria), it will support more differentiated judgments in borderline cases in which the P value is close to a particular significance level (e.g., 0.05).
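The way a permutation distribution turns an observed statistic into an individualized P value can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PoPLR implementation: the function names are hypothetical, whole examinations are assumed to be reordered as units, and the combined statistic (the sum of negative slope t statistics over locations) is a simple stand-in for the statistic S used in the study.

```python
import math
import random


def slope_t(times, values):
    """t statistic of the least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    rss = sum((v - mean_v - slope * (t - mean_t)) ** 2
              for t, v in zip(times, values))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: avoid dividing by zero
        return 0.0 if slope == 0.0 else math.copysign(math.inf, slope)
    return slope / se


def combined_statistic(times, series):
    """Stand-in statistic (assumption): summed evidence of negative trends.

    series[k][j] is the sensitivity (dB) at examination k, location j.
    """
    n_locations = len(series[0])
    total = 0.0
    for j in range(n_locations):
        t_stat = slope_t(times, [exam[j] for exam in series])
        total += max(0.0, -t_stat)  # only negative slopes contribute
    return total


def permutation_p_value(times, series, n_perm=2000, seed=0):
    """Compare the observed statistic with its permutation distribution."""
    rng = random.Random(seed)
    s_obs = combined_statistic(times, series)
    order = list(range(len(series)))
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # reorder whole examinations, keep the dates
        s_perm = combined_statistic(times, [series[k] for k in order])
        if s_perm >= s_obs:
            at_least_as_extreme += 1
    # The "+1" counts the observed order itself (a common Monte Carlo convention).
    return s_obs, (at_least_as_extreme + 1) / (n_perm + 1)


# Synthetic example: 8 examinations, 5 locations, steady loss of 2 dB/year.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
noise = random.Random(1)
series = [[30.0 - 2.0 * t + noise.gauss(0.0, 0.5) for _ in range(5)]
          for t in times]
s_obs, p = permutation_p_value(times, series)
print(f"S_obs = {s_obs:.1f}, P = {p:.4f}")
```

Because the reference distribution is built from the same series, a noisier series widens the distribution of the permuted statistic and the same observed value yields a larger P value.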
When PoPLR was applied to randomly reordered series, the P values followed the expected uniform distribution. This means that the false-positive rate of the approach equals the nominal significance level; for example, the probability of falsely detecting visual field deterioration in a stable series, at a significance level of P < 0.05, would be 5%. Our results also demonstrate that PoPLR performs at least as well as conventional PLR criteria in distinguishing between observed and randomly reordered series. With five examinations, for example, a PLR criterion of “≥1 location with slope < −1 dB/year at P < 0.01” detected deterioration in 17% of the series, at a specificity of 90%. At the same specificity (i.e., with a P value of 0.10), PoPLR detected deterioration in 23% of the observed visual field series (Fig. 2). Both PoPLR and PLR had substantially higher hit rates, at matched specificities, compared to the P value associated with the MD rate of change, underscoring the greater utility of localized analyses of visual field deterioration over global indices.
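The link between reordered series and the nominal false-positive rate can be checked exhaustively on a toy case. The sketch below is an illustration only, assuming a single location, the negative regression slope as the test statistic, and made-up sensitivity values; `neg_slope` and `exact_p_values` are hypothetical names. Every reordering of a six-examination series is treated in turn as the "observed" series and assigned its exact permutation P value.

```python
from itertools import permutations


def neg_slope(times, values):
    """Negative least-squares slope: larger means steeper deterioration."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return -sxy / sxx


def exact_p_values(times, values):
    """Exact permutation P value of every possible reordering of the series."""
    stats = [neg_slope(times, perm) for perm in permutations(values)]
    n = len(stats)
    # P value = share of all reorderings with a statistic at least as large.
    return [sum(other >= s for other in stats) / n for s in stats]


times = [0, 1, 2, 3, 4, 5]            # years (illustrative)
values = [31, 29, 30, 27, 28, 25]     # dB at one location (made up)
p_values = exact_p_values(times, values)

alpha = 0.05
false_positive_rate = sum(p <= alpha for p in p_values) / len(p_values)
print(len(p_values), false_positive_rate)
```

By construction, no more than a fraction `alpha` of the reorderings can receive a P value at or below `alpha`, whatever the variability or length of the series; ties in the statistic can only make the test slightly conservative.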
An assumption made in PoPLR is that the order of the examinations is irrelevant unless a real change has taken place, such that the permutation distribution is a valid approximation of the null distribution. However, no assumptions need to be made about the distribution of measurement errors. PoPLR uses simple linear regression to determine the statistical significance of negative trends at individual locations, but in principle the technique can be adapted to other methods of determining change. Since the overall significance is determined from the permutation distribution, such adaptations will affect only the sensitivity, not the specificity, of the approach. It is possible that visual field measurements closer in time to each other are more closely related than those further apart (temporal autocorrelation), even if no glaucomatous deterioration occurs. In this study, any effects of this phenomenon were reduced by including only one examination from a set of repeat examinations or by terminating series when there were long intervals between examinations. A “blocking” strategy may be a more general solution, whereby subsequences of closely spaced measurements are blocked (not reordered) in order to take into account their closer relationship. However, the autocorrelation properties of visual fields are, as yet, largely unknown, and it is therefore difficult to investigate their effect on approaches such as PoPLR.
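One way such a blocking strategy might look is sketched below. This is a hypothetical illustration rather than a method evaluated in this study: the grouping rule (a gap threshold `max_gap`, in years) and the function names are assumptions. Examinations separated by no more than `max_gap` form a block, the blocks are shuffled, and the order within each block is left intact.

```python
import random


def make_blocks(times, max_gap):
    """Group consecutive examination indices whose spacing is <= max_gap."""
    blocks = [[0]]
    for k in range(1, len(times)):
        if times[k] - times[k - 1] <= max_gap:
            blocks[-1].append(k)   # close in time: same block
        else:
            blocks.append([k])     # long gap: start a new block
    return blocks


def blocked_order(blocks, rng):
    """Shuffle the blocks but keep the order of examinations within each."""
    shuffled = list(blocks)
    rng.shuffle(shuffled)
    return [k for block in shuffled for k in block]


# Illustrative dates (years): two pairs of repeat examinations.
times = [0.0, 0.05, 0.5, 1.0, 1.04, 1.5, 2.0]
blocks = make_blocks(times, max_gap=0.1)
order = blocked_order(blocks, random.Random(0))
print(blocks)
print(order)
```

The reordered index list would then replace the unrestricted shuffle in a permutation test. When blocks differ in length, reordering them moves examinations to different time slots, and how best to handle that is left open here.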
Ideally, methods for determining visual field progression would be evaluated using the classic metrics for diagnostic tests, sensitivity and specificity. However, given the lack of a reference standard for what constitutes true change in the visual field as opposed to random variability, sensitivity and specificity are difficult to determine in real clinical data. In this research, we used a different approach for comparing the performance of PoPLR with previously established PLR criteria. In place of sensitivity, we used the positive rate in the observed visual field data (“hit rate”). Similarly, the positive rate in randomly permuted visual field series was used as a surrogate measure of specificity. This approach is based on the rationale that random reordering will remove any systematic trend present in the observed series, and that a more powerful method will detect change in a larger proportion of visual field series, at the same specificity. In contrast to computer simulations, which by necessity rely on simplifying models of visual field progression and stability, our approach used real clinical data, which are more likely to reflect the complex spatiotemporal properties of visual fields.
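The comparison at matched specificity can be written out as follows. This is a sketch with made-up P values, not study data, and the function names are hypothetical: the cutoff is chosen so that the desired proportion of randomly permuted series is classified as stable, and the hit rate is then read off the observed series at that cutoff.

```python
import math


def positive_rate(p_values, cutoff):
    """Proportion of series flagged as deteriorating (P below the cutoff)."""
    return sum(p < cutoff for p in p_values) / len(p_values)


def cutoff_for_specificity(permuted_p, target_specificity):
    """Largest cutoff that keeps the surrogate specificity at the target."""
    n = len(permuted_p)
    allowed = math.floor(round((1.0 - target_specificity) * n, 9))
    if allowed >= n:
        return math.inf
    # Flagging P values strictly below the (allowed + 1)-th smallest permuted
    # P value gives at most `allowed` false positives.
    return sorted(permuted_p)[allowed]


# Illustrative P values only (not results from the study).
observed_p = [0.01, 0.03, 0.20, 0.08, 0.50, 0.04, 0.70, 0.09, 0.002, 0.35]
permuted_p = [0.55, 0.07, 0.91, 0.32, 0.68, 0.15, 0.44, 0.83, 0.26, 0.99]

cutoff = cutoff_for_specificity(permuted_p, target_specificity=0.90)
hit_rate = positive_rate(observed_p, cutoff)
specificity = 1.0 - positive_rate(permuted_p, cutoff)
print(cutoff, hit_rate, specificity)
```

Applying the same procedure to the output of a second method (for example, a PLR criterion score) would give the two hit rates to compare at an identical surrogate specificity.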
Permutation approaches offer powerful and adaptable solutions for assessing change in complex data. They have been widely used in neuroimaging [24,25] and have been applied to assess changes in the optic disc in glaucoma [26,27]. While they make fewer assumptions than model-based approximations, they require greater computational effort that has only recently become feasible.
In summary, PoPLR provides a statistical significance for visual field deterioration that is tailored to the individual patient's data. This will make it useful for determining end points in clinical trials and for interpreting change in clinical practice. Because the specificity of PoPLR is, by design, independent of the properties of the underlying data (variability, length of follow-up, number of locations, dB scale), it may also have applications in comparing the evidence of visual field progression between different follow-up protocols [28] and different types of visual field tests [29–31].