With reference to the computer simulation models, we showed that TA generally outperformed EA in attaining a high sensitivity (≥80%) to detect RNFL progression earlier at a comparable level of specificity (95% vs. 80%–100%, respectively). The simulation results were validated with 1680 longitudinal RNFL measurements obtained from 107 glaucoma and glaucoma suspect patients followed up for a median period of 38 months. At 36 months of follow-up, TA detected a greater proportion of eyes with progression (35%) than did EA (12%–28%) in most eyes with average test-retest variability (Fig. 3A). TA may be a preferable strategy for following and detecting disease progression in glaucoma.
EA, with progression defined as a change greater than the reproducibility coefficient (RC) calculated from a reference database (i.e., EA with group RC), has been the prevailing approach to analyze glaucoma progression in clinical trials and in clinical practice. However, EA with group RC may fail to detect a small progressive change for patients with small test-retest variability and falsely signify progression for patients with large test-retest variability. Although EA with individual RC offers an individualized approach to detect progression, its performance relative to EA with group RC and TA has not been investigated. Computer simulation revealed that EA with individual RC indeed had a higher sensitivity than EA with group RC to detect change in eyes with small test-retest variability. However, in eyes with large test-retest variability, EA with group RC had higher sensitivity than EA with individual RC, albeit at a lower specificity (80% and 95%, respectively). These findings were confirmed with analysis of longitudinal clinical data (Figs. 3B, 3C). The difference in performance between EA with individual RC and EA with group RC can be explained in mathematical terms by comparing the sensitivity expressions derived for the two approaches (see Appendix for notation) (Supplementary Fig. S8). If the individual's test-retest variability is smaller than the group's test-retest variability, $\sigma_g/\sigma$ would be >1, thereby shifting the classification cutoff of EA with group RC more negative and resulting in a lower sensitivity (and a higher specificity) compared with EA with individual RC. By contrast, if the individual's test-retest variability is greater than the group's test-retest variability, $\sigma_g/\sigma$ would be <1, thereby shifting the classification cutoff of EA with group RC less negative and resulting in a higher sensitivity (and a lower specificity) compared with EA with individual RC. For most patients with average test-retest variability, the performance of EA with group RC would differ little from that of EA with individual RC (Fig. 3A, Supplementary Figs. S3E–H). Collectively, EA with individual RC is more informative than EA with group RC only when the individual test-retest variability is small. As expected, requiring confirmation with a consecutive test increased the specificity but reduced the sensitivity for progression detection.
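The effect of the ratio between group and individual variability on the cutoff can be illustrated with a minimal Python sketch. It is not the Appendix derivation: it assumes that a single follow-up measurement is compared with a single baseline, that measurement noise is Gaussian and independent with standard deviation σ, and that the RC is set one-sided at alpha = 5%. The function name `ea_performance`, the true change of −4 μm, and the choice of half and twice the group SD for "small" and "large" variability are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def ea_performance(delta, sigma, sigma_ref, alpha=0.05):
    """Sensitivity and specificity of event analysis for one follow-up test.

    delta     -- true change from baseline in micrometres (negative = loss)
    sigma     -- the patient's own test-retest SD
    sigma_ref -- SD used to set the cutoff (the patient's SD for individual RC,
                 the reference-database SD for group RC)
    Assumes independent Gaussian noise and a one-sided cutoff (illustrative only).
    """
    z_alpha = STD_NORMAL.inv_cdf(alpha)      # negative critical value
    cutoff = z_alpha * sqrt(2) * sigma_ref   # progression is flagged below this
    sd_diff = sqrt(2) * sigma                # SD of (follow-up - baseline)
    sensitivity = STD_NORMAL.cdf((cutoff - delta) / sd_diff)
    specificity = 1 - STD_NORMAL.cdf(cutoff / sd_diff)
    return sensitivity, specificity


SIGMA_GROUP = 1.71  # average test-retest SD quoted in the text (micrometres)

for sigma in (0.5 * SIGMA_GROUP, SIGMA_GROUP, 2 * SIGMA_GROUP):
    sens_ind, spec_ind = ea_performance(-4.0, sigma, sigma)
    sens_grp, spec_grp = ea_performance(-4.0, sigma, SIGMA_GROUP)
    print(f"sigma={sigma:.2f}  individual RC: sens={sens_ind:.2f} spec={spec_ind:.2f}"
          f"  group RC: sens={sens_grp:.2f} spec={spec_grp:.2f}")
```

Under these assumptions the cutoff for group RC, expressed in units of the patient's own variability, is the critical value scaled by $\sigma_g/\sigma$, which is why the two approaches coincide when the patient's variability equals the group's.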
In the simulation models, TA often attained a high sensitivity earlier than EA with individual RC or group RC. However, EA with group RC was more sensitive than TA in the early follow-up period for patients with large test-retest variability. This advantage was more pronounced when the rate of progression was fast (Supplementary Figs. S4I–L). The relative sensitivity of TA and EA with group RC can be computed by comparing their respective sensitivity expressions (see Appendix). With the specificity fixed at $Z_\alpha$, the condition under which TA is more sensitive (see Appendix) indicates that the selection between the 2 strategies depends on the relative difference between the group and individual test-retest variability ($\sigma_g$ and $\sigma$) and the number of follow-up visits ($n$) (or follow-up duration). TA would be more sensitive than EA with group RC when the patient's test-retest variability is small and the follow-up duration is long. Although EA with group RC had a higher sensitivity than TA in the early follow-up period for subjects with large test-retest variability, EA with group RC had lower specificity (80%). Accuracy (sensitivity × specificity) combines sensitivity and specificity and provides a more comprehensive measure for evaluating the performance of different strategies. In fact, TA attained accuracy ≥80% earlier than all forms of EA independently of the test-retest variability, the pattern, and the rate of progression (Supplementary Figs. S6, S7).
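The dependence on $\sigma_g$, $\sigma$, and $n$ can be explored numerically with the following sketch. It is an illustration under simplifying assumptions rather than the formula in the Appendix: linear loss at a constant rate, evenly spaced visits, independent Gaussian noise with known SD (so the slope is tested with a z-test instead of a t-test), and EA comparing only the latest visit with baseline. All function names and the example values (visits every 4 months, a rate of −2 μm/year) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ta_sensitivity(rate, sigma, n_visits, interval, alpha=0.05):
    """Power of a one-sided z-test on the OLS slope (sigma treated as known)."""
    times = [i * interval for i in range(n_visits)]
    mean_t = sum(times) / n_visits
    sxx = sum((t - mean_t) ** 2 for t in times)
    se_slope = sigma / sqrt(sxx)
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) - rate / se_slope)


def ea_group_sensitivity(rate, sigma, sigma_group, n_visits, interval, alpha=0.05):
    """Chance that (latest - baseline) falls below the group-RC cutoff."""
    delta = rate * (n_visits - 1) * interval  # true change at the latest visit
    cutoff = STD_NORMAL.inv_cdf(alpha) * sqrt(2) * sigma_group
    return STD_NORMAL.cdf((cutoff - delta) / (sqrt(2) * sigma))


def ea_group_specificity(sigma, sigma_group, alpha=0.05):
    """Chance that a stable eye stays above the group-RC cutoff."""
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) * sigma_group / sigma)


SIGMA_GROUP = 1.71   # micrometres, average test-retest SD from the text
RATE = -2.0          # micrometres per year (hypothetical example)
INTERVAL = 1 / 3     # years between visits (hypothetical example)

for sigma in (SIGMA_GROUP, 2 * SIGMA_GROUP):
    for n in (4, 7, 10):
        ta_acc = ta_sensitivity(RATE, sigma, n, INTERVAL) * (1 - 0.05)
        ea_acc = (ea_group_sensitivity(RATE, sigma, SIGMA_GROUP, n, INTERVAL)
                  * ea_group_specificity(sigma, SIGMA_GROUP))
        print(f"sigma={sigma:.2f} n={n:2d}  TA accuracy={ta_acc:.2f}"
              f"  EA(group RC) accuracy={ea_acc:.2f}")
```

Accuracy is computed here as sensitivity × specificity, as defined in the text; the printed values are illustrative and are not expected to match the supplementary figures.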
An ideal strategy for the detection of progression should demonstrate high sensitivity at high specificity. It is not surprising to observe from the simulation that the specificity of TA and EA with individual RC was 95% because the alpha selected to define a significant change was fixed at 5%. Given that the specificity of EA with group RC was defined with alpha fixed at 5% with reference to the group RC, the specificity would be 95% for patients with average test-retest variability when the individual RC is similar to the group RC. Patients with small test-retest variability would have higher specificity (99%), whereas those with large test-retest variability would have a lower specificity (80%) (Supplementary Fig. S5).
TA is superior not only in attaining high accuracy for detecting progression earlier than EA but also in providing a rate estimate that is useful to guide treatment and evaluate disease prognosis. In contrast to EA, however, multiple measurements are always required to compute a reliable slope estimate in TA. An important question is: how often should an imaging or a visual field test be scheduled? In other words, how many observations are needed to reliably derive the slope estimate? This question has been previously discussed with visual field testing, though little is known for structural assessment.29,30 By specifying the level of sensitivity and the rate of change, it is possible to work out the impact of the number of observations per year on the minimum duration required to detect progression (Fig. 4) (see Appendix). Assuming a rate of change of average RNFL thickness of −2 μm per year to be clinically significant, it takes 2.7 years to detect this change at 70% sensitivity and 2.9 years at 80% sensitivity for a patient with average test-retest variability (SD = 1.71 μm) if three measurements are obtained per year. Increasing the frequency to six measurements per year slightly shortens the duration to 2.1 and 2.3 years, respectively, whereas decreasing the frequency to one observation per year substantially lengthens the duration to 4.5 and 4.8 years, respectively. Patients with larger test-retest variability require longer duration to detect the same rate of change, particularly when the number of measurements obtained per year is small. Collectively, the number of measurements required depends on the rate of change to be considered as significant, the desired level of sensitivity, the patient's test-retest variability, and the acceptable duration for its detection, which may vary individually according to the severity of disease and the life expectancy. In general, the optimal number of measurements required per year is approximately four (i.e., the turning point of the curves) (Fig. 4). Beyond this point, the benefit of obtaining additional measurements in shortening the duration required to detect the same rate of change at the same level of sensitivity is small.
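The trade-off between testing frequency and time to detection can be sketched as follows. This is a simplified illustration, not the calculation behind Fig. 4: it assumes evenly spaced visits, independent Gaussian noise with a known SD, and a one-sided z-test on the slope at alpha = 5%, so the durations it returns are not expected to reproduce the values quoted above. The function names `slope_power` and `min_duration` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def slope_power(rate, sigma, n_visits, per_year, alpha=0.05):
    """Power to detect a negative slope with n evenly spaced visits."""
    times = [i / per_year for i in range(n_visits)]
    mean_t = sum(times) / n_visits
    sxx = sum((t - mean_t) ** 2 for t in times)
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) - rate * sqrt(sxx) / sigma)


def min_duration(rate, sigma, per_year, target, alpha=0.05, max_visits=500):
    """Shortest follow-up (years) at which power first reaches the target.

    Starts from three visits, the minimum for a slope with a residual.
    Returns None if the target is not reached within max_visits.
    """
    for n_visits in range(3, max_visits + 1):
        if slope_power(rate, sigma, n_visits, per_year, alpha) >= target:
            return (n_visits - 1) / per_year
    return None


for per_year in (1, 2, 3, 4, 6):
    years = min_duration(rate=-2.0, sigma=1.71, per_year=per_year, target=0.80)
    print(f"{per_year} test(s) per year: about {years:.2f} years to reach 80% power")
```

Even in this simplified form, the gain from each additional test per year shrinks as the frequency rises, which is the qualitative pattern described for the curves.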
The next question to address is whether scheduling follow-up at regular intervals is important to the estimation of the slope. In ordinary least squares (OLS) regression, we reject the null hypothesis, $H_0\colon \beta = 0$, if $\hat{\beta} / \sqrt{\widehat{\mathrm{Var}}(\hat{\beta})} < Z_\alpha$, where $\hat{\beta}$ is the OLS estimate of $\beta$ and $\widehat{\mathrm{Var}}(\hat{\beta})$ is the estimated variance of $\hat{\beta}$ (see Appendix). Because $\hat{\beta}$ is an unbiased estimator (the mean of $\hat{\beta}$ is the same as the true parameter $\beta$, i.e., $E(\hat{\beta}) = \beta$), changing $x_i$ (the time interval between tests) has no effect on the estimation of $\beta$. Yet, maximizing $\sum (x_i - \bar{x})^2$ (the denominator of $\widehat{\mathrm{Var}}(\hat{\beta})$) can minimize $\widehat{\mathrm{Var}}(\hat{\beta})$, thus improving the sensitivity. In other words, obtaining measurements only at the beginning and at the end of a defined follow-up period provides the highest sensitivity to detect change. This wait-and-see approach concurs with the visual field simulation model proposed by Crabb and Garway-Heath (IOVS 2009;50:ARVO E-Abstract 1669). It is important to note that OLS assumes that measurements at each time point have the same variance and are time independent. These assumptions, however, may not hold in a real-world clinical setting, especially when the follow-up duration is long. Progression of cataract, surgical interventions including trabeculectomy and cataract extraction, and instrument instability (e.g., lack of regular calibration) could substantially affect the quality of data collection and, hence, the reliability of measurement. Maximizing $\sum (x_i - \bar{x})^2$ by collecting data only at the extreme ends may result in a more biased estimate than spacing the measurements at regular time intervals. Another shortcoming is that patients experiencing rapid progression could be missed if investigations are separated by a relatively long period. Further studies are needed to examine the effect of time-dependency factors on progression analysis.
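The effect of visit timing on the precision of the slope can be checked with a short sketch. It relies on the standard OLS result that the standard error of the slope is $\sigma / \sqrt{\sum (x_i - \bar{x})^2}$ under constant, independent noise; the two schedules (six tests over 3 years) and the function name `slope_standard_error` are hypothetical examples.

```python
from math import sqrt


def slope_standard_error(times, sigma):
    """SE of the OLS slope under constant, independent measurement noise."""
    mean_t = sum(times) / len(times)
    sxx = sum((t - mean_t) ** 2 for t in times)  # sum of squared deviations in time
    return sigma / sqrt(sxx)


SIGMA = 1.71  # micrometres, average test-retest SD from the text

# Six tests over 3 years: evenly spaced versus clustered at the two ends.
evenly_spaced = [0.0, 0.6, 1.2, 1.8, 2.4, 3.0]
endpoints_only = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]

for label, schedule in (("evenly spaced", evenly_spaced),
                        ("endpoints only", endpoints_only)):
    se = slope_standard_error(schedule, SIGMA)
    print(f"{label:15s} SE of slope = {se:.3f} micrometres/year")
```

The endpoint-only schedule yields the smaller standard error, but only because the sketch assumes exactly the conditions (equal variance, time independence) that the text cautions may fail over long follow-up.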
Although the actual rate of age-related RNFL loss is largely unknown, cross-sectional analysis suggests that the average RNFL thickness changes at a rate of approximately −0.2 μm/year.31,32 If the follow-up duration is long enough, significant reduction in RNFL thickness might be detected, particularly for patients with small test-retest variability. Thus, the specificity of trend and event analyses would be reduced over time. Examining the rate estimate derived from TA is pertinent to differentiate glaucoma-related from age-related RNFL loss because the rate of glaucoma-related RNFL loss would be expected to exceed the rate of age-related changes. Studies investigating the rate of age-related loss of RNFL are much needed.
RNFL progression was modeled in this study because it has been well documented that RNFL measured by spectral-domain OCT has relatively low test-retest variability, thereby facilitating the detection of progression.5,33 It is feasible to apply the simulation to other imaging or visual field testing modalities provided that the range of test-retest variabilities and the expected rate of change of a particular parameter of interest are available. For tests with inherently large test-retest variability and relatively rapid rate of change, EA with group RC would be preferable to TA for progression detection. Customizing the analysis strategy may be necessary to track disease progression in different types of structural and functional tests. Of note, the present study is limited in evaluating only global change in RNFL thickness derived from a circle scan with the objective of comparing trend and event analyses. Although it is plausible that local change of the RNFL may follow a similar pattern, further investigation is needed to fully address the differences in performance between TA and EA for the detection of local change.
There are different approaches to calculating sensitivity and specificity in defining progression. Although sensitivity may be given by the probability that any of the first $n$ tests shows a statistically significant progression (i.e., $1 - (1-P_1)(1-P_2)(1-P_3)\cdots(1-P_n)$, where $P_n$ is the probability of detecting a statistically significant progression at visit $n$), a more common approach in clinical practice is to compare the baseline with the latest available measurements without taking the tests obtained in between into consideration. For example, in the visual field Guided Progression Analysis (Carl Zeiss Meditec, Dublin, CA), progression is defined when the same three or more locations in the pattern deviation plot show change exceeding the limits of normal variability in the two latest consecutive tests (i.e., the same criteria as in the EMGT). In the Guided Progression Analysis of the Cirrus HD-OCT (Carl Zeiss Meditec) and the GDx ECC/VCC (Carl Zeiss Meditec), retinal nerve fiber layer thickness progression is defined as “possible loss” when the change in the last follow-up measurement exceeds the limits of normal variability and as “likely loss” when the change is evident in the two latest consecutive tests. Taking any of the tests obtained during follow-up to define progression will overestimate sensitivity and underestimate specificity.
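The "any of the first n tests" formula, and the reason it lowers specificity, can be made concrete with a few lines of Python. The sketch takes the product form at face value, which amounts to assuming that the per-visit outcomes are independent; in practice visits that share a baseline are correlated, so the numbers are illustrative only. The function name `any_of_first_n` and the per-visit probabilities are hypothetical.

```python
from math import prod


def any_of_first_n(per_visit_probabilities):
    """P(at least one flagged visit) = 1 - prod(1 - P_i), assuming independence."""
    return 1 - prod(1 - p for p in per_visit_probabilities)


# Hypothetical per-visit detection probabilities in a progressing eye.
detection = [0.10, 0.20, 0.35, 0.50, 0.65]
# A 5% false-positive rate at every visit in a stable eye.
false_positive = [0.05] * 5

print(f"cumulative sensitivity: {any_of_first_n(detection):.3f}")
print(f"latest-visit sensitivity: {detection[-1]:.3f}")
print(f"cumulative specificity: {1 - any_of_first_n(false_positive):.3f}")
```

The same rule that raises the cumulative detection probability above the latest-visit value also accumulates false positives across visits, so the specificity falls below the nominal 95% per test.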
In conclusion, the sensitivity to detect change in glaucoma progression is dictated by the pattern of progression, the rate of change of disease, the test-retest variability, and the analysis strategy. Although it is not possible to modify the pattern or the rate of disease progression, selecting an appropriate strategy to analyze progression is germane to maximizing the probability to detect change. For most patients, TA attains high sensitivity earlier than EA at a comparable level of specificity. For eyes with large test-retest variability, EA with group RC could be more sensitive than TA but at the expense of reduced specificity.