**Purpose.**:
We evaluated three new pixelwise rates of retinal height changes (PixR) strategies to reduce false-positive errors while detecting glaucomatous progression.

**Methods.**:
Diagnostic accuracy of nonparametric PixR-NP cluster test (CT), PixR-NP single threshold test (STT), and parametric PixR-P STT were compared to statistic image mapping (SIM) using the Heidelberg Retina Tomograph. We included 36 progressing eyes, 210 nonprogressing patient eyes, and 21 longitudinal normal eyes from the University of California, San Diego (UCSD) Diagnostic Innovations in Glaucoma Study. Multiple comparison problem due to simultaneous testing of retinal locations was addressed in PixR-NP CT by controlling family-wise error rate (FWER) and in STT methods by Lehmann-Romano's *k*-FWER. For STT methods, progression was defined as an observed progression rate (ratio of number of pixels with significant rate of decrease; i.e., red-pixels, to disk size) > 2.5%. Progression criterion for CT and SIM methods was presence of one or more significant (*P* < 1%) red-pixel clusters within disk.

**Results.**:
Specificity in normals: CT = 81% (90%), PixR-NP STT = 90%, PixR-P STT = 90%, SIM = 90%. Sensitivity in progressing eyes: CT = 86% (86%), PixR-NP STT = 75%, PixR-P STT = 81%, SIM = 39%. Specificity in nonprogressing patient eyes: CT = 49% (55%), PixR-NP STT = 56%, PixR-P STT = 50%, SIM = 79%. Progression detected by PixR in nonprogressing patient eyes was associated with early signs of visual field change that did not yet meet our definition of glaucomatous progression.

**Conclusions.**:
The PixR provided higher sensitivity in progressing eyes and similar specificity in normals than SIM, suggesting that PixR strategies can improve our ability to detect glaucomatous progression. Longer follow-up is necessary to determine whether nonprogressing eyes identified as progressing by these methods will develop glaucomatous progression. (ClinicalTrials.gov number, NCT00221897.)

^{ 1,2 }The loss of retinal nerve fibers causes characteristic changes in the appearance of the retinal nerve fiber layer as localized or diffuse defects and changes in the configuration of the optic disk.

^{ 3 }Detecting structural glaucomatous change over time, therefore, is a central aspect of detecting glaucomatous progression and management of glaucoma.

^{ 1 }Progressive retinal changes are observable in vivo using various optical imaging modalities. In this work, we focus on detecting glaucomatous progression from localized rates of structural changes using the Heidelberg RetinaTomograph (HRT; Heidelberg Engineering, GmbH, Germany).

^{ 4 }statistic image mapping of the retina (SIM),

^{ 5 }and proper orthogonal decomposition framework (POD),

^{ 6,7 }can detect retinal locations with glaucomatous change over time from serial HRT exams. The TCA and POD methods detect changes in retinal locations in each HRT follow-up exam with respect to a baseline and SIM detects retinal locations with significant rate of change during follow-up. Simultaneous assessment of thousands of retinal locations for progression increases the family-wise or overall false-positive detection errors. This tendency of multiple simultaneous tests to increase the overall false-positive errors or type I statistical errors is known as the multiple comparison problem.

^{ 8–13 }Patterson et al.

^{ 5 }pointed out the multiple comparison problem in retinal imaging when retinal locations are evaluated collectively to detect glaucomatous progression as in HRT TCA, and developed the SIM method based on a nonparametric statistical technique. The SIM evaluates the statistical significance of rate of change at each retinal location as well as controls the overall type I error due to simultaneous assessment of spatial extents of changes (cluster size) using a statistical procedure known as resampling. During resampling, the observed retinal measurements of an eye are treated as its sampling population and new unique samples are drawn from this population. In general, the following procedures are available for resampling: bootstrap (sampling with replacement),

^{ 14 }jackknife (leaving one observation out at a time),

^{ 15 }permutation (sampling without replacement),

^{ 16,17 }and subsampling (resampling fewer samples).

^{ 18 }

^{ 19 }parameter selection for glaucoma detection using a linear discriminant function,

^{ 20 }detecting glaucomatous changes over time using time-domain optical coherence tomography (O'Leary D, et al.,

*IOVS*2007;48:ARVO E-Abstract 3335), evaluating HRT stereometric parameters and topographic change analysis probability maps (Artes PH, et al.,

*IOVS*2008;49:ARVO E-Abstract 5430), data simulations to evaluate longitudinal glaucomatous changes in visual fields

^{ 21 }and optic nerve head,

^{ 22 }detecting glaucomatous progression of visual fields

^{ 23,24 }(O'Leary D, et al., NAPS Abstract, 2011), and is increasingly utilized for nonparametric statistical analysis in glaucoma research.

^{ 25–27 }

*k*–FWER procedure by Lehmann and Romano,

^{ 28 }which is less conservative than Bonferroni correction, and (3) PixR parametric single threshold test (PixR-P STT) controls overall false-positive errors at the pixel-level in a parametric framework using the

*k*-FWER procedure.

^{ 29 }; SAP visual field exams with fewer than 25% false-positives, false-negatives, and fixation losses, and no observable testing artifacts were considered to be reliable; and stereophotographs assessed as fair to excellent quality by trained graders were considered to be of acceptable quality. Median (interquartile range) MPHSD for the HRT exams was 15 (12–21) μm.

**Table 1**

**Table 1**

Progressing Eyes | Longitudinal Normal Eyes | Nonprogressing Patient Eyes | |

No. of eyes (No. of subjects) | 36 (33) | 21 (20) | 210 (148) |

Age, y | |||

Mean (95% CI) | 64.7 (61.6, 67.7) | 57.4 (49.7, 65.1) | 61.4 (59.4, 63.4) |

Median (range) | 65.0 (48.3, 83.3) | 57.0 (24.6, 86.5) | 64.4 (18.1, 85.5) |

Interquartile range | (57.0, 71.8) | (47.9, 67.9) | (53.2, 69.7) |

No. of HRT exams | |||

Median (range) | 5 (4–8) | 4 (4–8) | 4 (4–8) |

Interquartile range | (4.5–6) | (4–4) | (4–5) |

HRT follow-up, y | |||

Median (range) | 4.1 (2.4, 7.0) | 0.5 (0.2, 8.0) | 3.6 (1.7–7.4) |

Interquartile range | (3.7, 5.8) | (0.4, 0.7) | (2.9–4.5) |

SAP mean deviation at baseline, dB | |||

Mean (95% CI) | −3.65 (−5.45, −1.84) | −0.34 (−0.80, 0.13) | −1.72 (−2.16, −1.28) |

Median (range) | −2.15 (−21.74, 1.72) | −0.25 (−2.81, 1.31) | −0.95 (−30.13, 2.20) |

Interquartile range | (−4.16, −0.41) | (−0.88, 0.23) | (−2.35, −0.07) |

SAP PSD at baseline, dB | |||

Mean (95% CI) | 4.19 (2.87, 5.51) | 1.63 (1.45, 1.81) | 2.47 (2.18, 2.76) |

Median (range) | 2.30 (0.99, 13.18) | 1.53 (1.09, 2.84) | 1.73 (0.85, 13.32) |

Interquartile range | (1.73, 4.45) | (1.42, 1.77) | (1.45, 2.50) |

% abnormal disk from photo evaluation at baseline | 77.1%, 27 of 35 eyes* | 4.8%, 1 of 21 eyes | 45.2%, 95 of 210 eyes |

% abnormal visual field at baseline | 52.8%, 19 of 36 eyes | 4.8%, 1 of 21 eyes | 32.9%, 69 of 210 eyes |

% of abnormal disk from photo evaluation and abnormal visual field at baseline | 42.9%, 15 of 35 eyes* | 0.0%, 0 of 21 eyes | 19.5%, 41 of 210 eyes |

*h*

_{f}(ℓ) =

*β*

_{0}(ℓ) +

*β*

_{1}(ℓ) ×

*T*

_{f}+

*ε*

_{f}(ℓ) was fit at each retinal location with pixel coordinate ℓ in the HRT topographic series. Point estimates

*b*

_{0}and

*b*

_{1}, respectively, of the regression coefficients

*β*

_{0}and

*β*

_{1}were estimated using the method of least squares

^{ 30 };

*h*

_{f}is the retinal height (in μm) at follow-up time

*T*

_{f}(in months) and

*ε*

_{f}is the regression error (distributional characteristics given in Part B).

*β*

_{1}(ℓ) corresponding to time factor

*T*

_{f}. Therefore, the significance of the observed rate of retinal height decrease at each location was assessed using a null hypothesis of

*H*

_{0}(ℓ):

*β*

_{1}(ℓ) = 0 against a directional or one-sided alternative hypothesis

*H*

_{a}(ℓ):

*β*

_{1}(ℓ) > 0. The hypotheses were tested using a studentized test statistic

*t*(ℓ) =

*β*

_{1}(ℓ)/

*se*(

*β*

_{1}[ℓ]) at each HRT pixel, where

*se*(

*β*

_{1}) is the standard error of the regression coefficient

*β*

_{1}estimated assuming a normal distribution of

*β*

_{1}

^{ 30 }. A studentized or pivotal test statistic guarantees good performance under resampling.

^{ 13,31,32 }

*t*(ℓ) under the null hypothesis, known as a null-distribution (i.e., distribution of

*t*[ℓ] when

*β*

_{1}[ℓ] = 0), is required. The null-distribution of the test statistic

*t*(ℓ) was built at each retinal location ℓ from several unique pseudo-topographic series of the eye. The pseudo-topographic series were generated under the null hypothesis using the permutation resampling technique as follows.

^{ 14,16,33–35 }

*b*

_{1}[ℓ] = 0) were estimated at each retinal location in the observed topographic series as

*e*

_{f,H0}(ℓ) =

*h*

_{f}(ℓ) −

*b*

_{0}(ℓ), with point estimates of regression coefficients

*b*

_{0}(ℓ) and

*b*

_{1}(ℓ) estimated using the method of least squares. At each pixel, error terms under the null condition {

*e*

_{f,H0}(ℓ): f = 1, …, F} were assumed to be independent and identically distributed (i.e., errors were not autocorrelated and were with constant variance) over time

*T*

_{f}with no other distributional assumptions. Under the assumption of independent and identical distribution of model errors, the error terms

*e*

_{f,HO}(ℓ) over time

*T*

_{f}satisfy the exchangeability criterion for resampling

^{36–39}and, therefore, are exchangeable under the null hypothesis. Error terms

*e*

_{f,H0}(ℓ) were randomly permuted and 999 unique pseudo-topographic series were constructed under the null hypothesis as

*b*

_{0}(ℓ) +

^{32,35}; where, {

^{40}

*t*(ℓ) was built using a collection of 1000 test statistics. Each null-distribution comprised of one test statistic estimated from the observed retinal topographic series and 999 test statistics estimated from 999 unique pseudo-topographic series of the respective eye when there is no significant rate of change. Figures 1 and 2, respectively, show examples of a normal eye and a progressing eye. Figure 3 shows a procedural example of building a null-distribution of the test statistic

*t*(ℓ) in one retinal location (labeled k) in the progressing eye shown in Figure 2. Figure 4a shows permutation null-distributions of the test statistic at five selected retinal locations in the example progressing eye.

**Figure 1**

**Figure 1**

**Figure 2**

**Figure 2**

**Figure 3**

**Figure 3**

**Figure 4**

**Figure 4**

*F*= {

*H*

_{0}(ℓ): ℓ = 1, …,

*N*}, where

*H*

_{0}(ℓ) is the null hypothesis at location ℓ and

*N*is the total number of retinal locations within the disk. Because of the multiplicity of retinal locations simultaneously tested to assess glaucomatous progression, it is essential to control the overall type I error or false-positive detections (a multiple comparison problem

^{ 8 –13,41 }). In PixR-NP CT, family-wise type I error was controlled at pixel-level and cluster-level by controlling a

*family-wise error rate*using Bonferroni correction.

^{ 10,41 }

*H*

_{0}(ℓ) represent the complete null hypothesis for the family

*F*(i.e., a hypothesis that all retinal locations are true negatives with no significant rate of change). To control type I error at the pixel-level for the family of tests

*F*, an FWER or the probability of committing at least one type I error

^{12}was controlled at a level of significance

*α*

_{p}. Therefore, probability

*P*(at least one false-positive retinal location|

*α*

_{p}.

*P*(

*t*(ℓ) ≥

*t*

_{cutoff}|

*α*

_{p}), or where

*t*

_{cutoff}of the maximal rate-of-change test statistic

*t*

_{max}controls FWER at the chosen level of significance

*α*

_{p}. By using the critical value

*t*

_{cutoff}to determine the significance of rate of change at each retinal location, the overall false-positive detections (FWER) can be controlled at level

*α*

_{p}. The null-distribution of the maximal test statistic

*t*

_{max}is required to estimate the critical value

*t*

_{cutoff}. The critical value

*t*

_{cutoff}was estimated as the (1 −

*α*

_{p})th percentile value in the null-distribution of the maximal test statistic.

*t*

_{max}was built for each study eye using a collection of the maximum value of the test statistic (estimated as in Equation 1) in each of the 1000 unique pseudo-topographic series simulated in Part B. We preserved the observed dependence structure of retinal measurements and, thereby, preserved the statistical dependence structure among retinal measurements in each pseudo-topographic series by using the same temporal resampling order for all retinal locations in a given pseudo-topographic series. Figure 4b shows the null-distribution of the maximal rate-of-change test statistic

*t*

_{max}and

*P*values for the rate of change at selected retinal locations in the example progressing eye.

*P*values

*P*(

*t*[ℓ]) adjusted for family-wise type I error at level

*α*

_{p}were estimated for each retinal location ℓ. Therefore, locations with

*P*(

*t*[ℓ]) <

*α*

_{p}were considered to have significant rate of retinal height decrease. In this work, at pixel-level, we controlled family-wise type I error at a standard level of significance for one-tailed tests

*α*

_{p}= 2.5%. Glaucoma progression maps indicating locations with significant rate of retinal height decrease (called red-pixels with negative rate of change) and increase (called green-pixels with positive rate of change) were generated. Red-pixels corresponded to glaucomatous progression or noise and green-pixels corresponded to treatment effects or noise. Figures 1bi and 2bi show examples of glaucoma progression maps generated by the PixR-NP CT method.

*CS*

_{k}: k = 1, …,

*M*} equal the cluster sizes (size in number of pixels) of all the red-pixel clusters observed within the optic disk in a pseudo-topographic series. The probability of incorrectly inferring at least one red-pixel cluster as significant (i.e., FWER at cluster-level) was controlled at a level

*α*

_{c}.

*P*(

*CS*

_{k}) was determined using a null-distribution of the maximal cluster-size test statistic. The null-distribution of the maximal cluster-size test statistic for each study eye comprised of the size of the largest cluster of red-pixels within the optic disk in each of the pseudo-topographic series, given by {

*M*

_{j}; and j = 1, …, 1000}), where

*j*th pseudo-topographic series and

*M*

_{j}is the number of red-pixel clusters observed in the

*j*th pseudo-topographic series. Figure 4c shows the null-distribution of the maximal cluster-size test statistic

*CS*

_{max}for the example progressing eye.

*P*(cluster size

*CS*

_{k}) <

*α*

_{c}. A stricter cluster-level significance

*α*

_{c}= 1% was chosen similar to SIM.

*t*(ℓ), and permutation testing steps in PixR-NP STT are same as those of PixR-NP CT described in Section 1, Parts A and B.

*k*type I errors (instead of at most one type I error allowed by Bonferroni correction) by controlling a generalized FWER or

*k-*FWER

^{ 28 }at the pixel-level.

*P*(at most

*k*false-positive retinal locations |

*α*

_{p.}where,

*t*

_{k+1max}is the (k + 1)th largest test statistic among all retinal locations in a topographic series. Therefore, significance of retinal height changes

*P*(

*t*(ℓ)) were estimated using a null-distribution of the (k + 1)th largest test statistic given by {

*j*th pseudo-topographic series of the eye. The critical value

*t*

_{cutoff}estimated from the null-distribution of the (k + 1)th largest test statistic controls the probability of making at most

*k*false-positive errors at a level of significance

*α*

_{p}. Figure 5 shows the null-distribution of the (k + 1)th largest test statistic

*t*

_{k+1max}and

*P*values for the rate of change at selected retinal locations in the example progressing eye.

**Figure 5**

**Figure 5**

*k*to be 2.5% of total number of retinal locations within the optic disk (i.e.,

*k*= 2.5% of

*N*) and a level of significance of

*α*

_{p}= 1%. Thus, retinal locations with

*P*(

*t*[ℓ]) <

*α*

_{p}were considered to have significant rates of retinal height changes. Red- and green-pixels were identified as in PixR-NP CT. A measure of observed progression rate was estimated as a ratio of number of red-pixels to the total number of pixels within the optic disk margin. Because we allowed up to

*k*false-positive errors (

*k*= 2.5% of

*N*), the upper bound for the anticipated false-positive rate was

*k*/

*N*× 100 = 2.5%. Therefore, glaucomatous progression was detected by PixR-NP STT when the observed progression rate is greater than the anticipated false-positive rate of 2.5%.

*h*

_{f}(ℓ) and the test statistic

*t*(ℓ) described in Section 1, PartA were used to assess the rate of change at each retinal location. The regression error terms were assumed to be independent (i.e., no autocorrelation) and normally distributed

*ε*

_{f}∼

*N*(0,

*σ*

^{2}) with a constant variance

*σ*

^{2}over time (i.e., homoscedastic).

*P*(

*t*[ℓ]) was estimated using a directional

*t*-test (MATLAB function “glmfit” in Statistics Toolbox, ver. 2010b; Mathworks, Inc., Natick, MA). Pixel-level type I error was controlled using the less conservative

*k*-FWER procedure as in PixR-NP STT, but in a parametric framework.

^{ 28 }For

*k*-FWER control, we allowed up to

*k*type I errors at a level of significance

*α*

_{p}with

*k*as 2.5% of number of retinal locations (pixels) within the optic disk margin. The level of significance

*α*

_{p}was determined empirically, such that the diagnostic performance of the PixR-P STT is similar to that of the PixR-NP STT method. To control

*k*-FWER at level

*α*

_{p}, a common

*P*value threshold was estimated as (

*k*+ 1) ×

*α*

_{p}/

*N*using the single-step

*k-*FWER method by Lehmann and Romano.

^{ 28 }Retinal locations with

*P*value

*P*(

*t*[ℓ]) less than the

*k*-FWER-based

*P*value threshold were considered to have significant rates of retinal height changes. Red- and green-pixels were identified as in PixR-NP CT based on the significance of the rate of retinal height decrease and increase, respectively. A measure of observed progression rate was estimated as a ratio of number of red-pixels to the total number of pixels within the optic disk margin. Because,

*k*-FWER control provides an upper bound for the number of false-positives as

*k*(2.5% of number of locations within disk), glaucomatous progression was detected by PixR-P STT when the observed progression rate was greater than 2.5%.

^{ 5 }In brief, a standardized slope test statistic was estimated as

*S*(ℓ) = |

*b*

_{1}(ℓ)/

*ŝ*(

*b*

_{1}[ℓ])|, where |.| operator gives the absolute value and

*ŝ*(

*b*

_{1}[ℓ]) is a smoothed standard error estimate of the regression coefficient

*b*

_{1}(ℓ). The spatially smoothed standard error estimate

*ŝ*(

*b*

_{1}[ℓ]) was computed by spatially filtering standard error estimates of the regression coefficient

*b*

_{1}using a Gaussian kernel of size 17 × 17 pixels and standard deviation of 2.354 pixels. Pseudo-topographic series were generated from the raw topographic retinal measurements using the permutation resampling procedure (not under the null hypothesis). Pixel-level changes were estimated using a probability suprathreshold or a global threshold of

*α*

_{p}= 5% and family-wise type I error was controlled only at the cluster-level at a level of significance

*α*

_{c}= 1%. Glaucomatous progression was detected when there was at least one red-pixel cluster within the disk with

*P*<

*α*

_{c}.

*α*

_{p}= 25%, the parametric PixR-P STT was able to provide a similar diagnostic accuracy as the nonparametric PixR-NP STT, but with reduced computational demands. The SIM method provided similar specificity (90%) and lower sensitivity (39%) than the PixR strategies. The SIM had the highest specificity in the nonprogressing eyes (79%) followed by PixR-NP STT (56%), PixR-P STT (50%), and PixR-NP CT (49%). When the specificity of PixR-NP CT in the longitudinal normal eyes was set to 90% (by changing

*α*

_{c}from 1% to 0.8%) similar to all other methods, sensitivity was 86%, and specificity in the nonprogressing patient eyes was 55% for PixR-NP CT.

**Table 2**

**Table 2**

Techniques | Test Type | Sensitivity (95% CI) in Progressing Eyes, n = 36 Eyes | Specificity (95% CI) in Longitudinal Normal Eyes, n = 21 Eyes | Specificity (95% CI) in Nonprogressing Eyes, n = 210 Eyes |

SIM Patterson et al.^{5} | Nonparametric test by resampling raw measurements | 39% (22–56) | 90% (76–100) | 79% (73–84) |

PixR-NP Cluster Test, PixR-NP CT | Nonparametric test by resampling regression residuals under the null hypothesis | 86% (73–99) | 81% (62–100) | 49% (42–56) |

At 90% specificity in the longitudinal normal eyes | 86% (73–99) | 90% (76–100) | 55% (48–62) | |

PixR-NP Single Threshold, PixR-NP STT | Nonparametric test by resampling regression residuals under the null hypothesis | 75% (59–91) | 90% (76–100) | 56% (49–63) |

PixR-P Single Threshold, PixR-P STT | Parametric test | 81% (66–95) | 90% (76–100) | 50% (43–57) |

*k*-maximal rate-of-change test statistic required for controlling

*k*-FWER in the PixR-NP STT method. By comparing the null distributions of the maximal rate-of-change test statistic and the

*P*values at selected retinal locations in Figure 4b versus Figure 5, it can be observed that the

*k*-FWER procedure is less conservative than the FWER or Bonferroni procedure. For example, at the retinal location labeled k, the FWER and

*k*-FWER procedures provided a

*P*value of

*P*(k) = 0.340 and

*P*(k) = 0.005, respectively.

**Table 3**

**Table 3**

Techniques | Nonprogressing Eyes With “Possible Progression” by SAP GPA Within the HRT Follow-up Duration, n = 54 Eyes (%) | Nonprogressing Eyes With “Possible Progression” by SAP GPA Within 1 Year After the Last HRT Follow-up, n = 7 Eyes (%) | Nonprogressing Eyes With at Least 2 Progression Points, Possible and/or Likely, Including at Least 1 Likely Progression Point in SAP GPA Within the HRT Follow-up Duration, n = 17 Eyes (%) |

SIM | 15 (27.8) | 2 (28.6) | 1 (5.9) |

PixR-NP cluster test, PixR-NP CT | 31 (57.4) | 5 (71.4) | 11 (64.7) |

PixR-NP single threshold, PixR-NP STT | 27 (50.0) | 5 (71.4) | 5 (29.4) |

PixR-P single threshold, PixR-P STT | 29 (53.7) | 5 (71.4) | 6 (35.3) |

^{ 42 }and the POD framework (43%–51%)

^{ 7 }in previous studies. Moreover, our post hoc analysis (Table 3) of the nonprogressing patient eyes suggested that HRT techniques are detecting subtle and/or early stages of visual function progression. Longer follow-up is needed to determine whether the progression detected by PixR in the nonprogressing patient eyes later develops into GPA “likely progression.”

*α*

_{p}= 5% (two-tailed) at each pixel location and addresses the multiple comparison problem at the cluster-level (using Bonferroni correction) at an LOS

*α*

_{c}= 1%.

^{ 5 }For PixR-NP CT, we used the same LOS as SIM, namely

*α*

_{p}= 2.5% (one-tailed) at pixel-level and

*α*

_{c}= 1% at cluster-level. In contrast to SIM and PixR-NP CT methods, we used a less conservative Lehman-Romano's

*k-*FWER procedure for addressing multiple comparison problem at pixel-level in PixR-NP STT and PixR-P STT methods. Similar to the standard LOS for one-tailed tests, we chose the number of false-positive errors

*k*allowed by the

*k-*FWER procedure to be 2.5% (of retinal locations within the disk margin). For the PixR-NP STT method, because we allowed up to

*k*errors, we chose a stricter level of significance of

*α*

_{p}= 1% (one-tailed) to address the multiple comparison problem. In the parametric PixR-P STT method, probabilities of progression were estimated parametrically using

*t*-tests. Therefore, we empirically determined that at a liberal level of significance

*α*

_{p}= 25% (one-tailed), the PixR-P STT parametric method is able to achieve a similar diagnostic accuracy as the PixR-NP STT nonparametric method. Differences in the LOS between parametric and nonparametric PixR STT methods highlight likely significant differences between their respective

*k*-maximal test statistic distributions. Further study of these techniques and progression criteria using independent population groups will be useful to identify limitations and avenues for improvement of these techniques.

*k*-FWER method was used in PixR-NP STT and PixR-P STT methods to control type I error while maximizing the detection power (or reducing type II error; see Fig. 2bi versus Fig. 2bii). Because the number of clusters is fewer than the number of locations, SIM circumvents the issue of increased type II error at pixel-level by controlling type I error only at the cluster-level.

^{ 42–44 }Entirely avoiding type I control at pixel-level, however, may affect the diagnostic accuracy of the cluster-based progression criterion because of the increased number and increased chances of false progressing locations present in the cluster-level analysis. Thus, the sensitivity of SIM is likely to be influenced by the spatial extent (or cluster-size) of glaucomatous progression.

^{ 45 }For example, glaucomatous progression with smaller spatial extents may be missed by the cluster-level test alone because of the increased type I error propagated from the pixel-level to the cluster-level test. Therefore, before cluster-level analysis in SIM, the suprathreshold or the common global probability threshold at pixel-level should be adjusted depending on the anticipated spatial extent of changes (i.e., large diffuse changes versus regional changes with smaller spatial extent).

^{ 45,46 }For optimal control of false change locations, empirically derived thresholds of rate of change may be used during pixel-level analysis

^{ 47 }with the difficulty that the absolute rate of change may vary by region.

*α*is exactly

*α*.

^{ 48 }In practice, building the full permutation distribution comprising of all unique permutations of the observed measurements is computationally demanding. For example, as pointed by Patterson et al.,

^{ 5 }an eye with four HRT exams will have 369,600 unique permutations at each retinal location. Therefore, approximate permutation distributions were built by Monte Carlo sampling of the full permutation distribution. The Monte Carlo permutation test, however, also provides an exact or at least an asymptotically exact control of type I error at the chosen significance level

*α*.

^{ 31,49,50 }In contrast, bootstrap control of type I error is only asymptotically exact. Therefore, Monte Carlo permutation tests are more suitable for detecting glaucomatous progression.

^{ 51,52 }and Westfall-Young guidelines for multiple testing,

^{ 13 }pseudo-topographic series not generated under the null hypothesis may result in loss of power; however, loss of power is minimized when a studentized test statistic is used for hypothesis testing.

^{ 32 }Further, because overall type I error control requires a joint distribution (i.e., maximal null-distribution) of the test statistics, centering the test statistics under the null hypothesis is likely an optimal choice to address multiple comparison problem in retinal imaging. In PixR strategies, pseudo-topographic series for each study eye were generated by resampling residuals under the null hypothesis. Other promising approaches for building null-distributions are generating pseudo-topographic series using physiology-based simulations

^{ 53 }and by generating pseudo follow-up exams with no changes using the baseline subspace of the proper orthogonal decomposition framework.

^{ 7 }

^{ 36 }When regression errors are autocorrelated or when error variance is not constant over time (heteroscedasticity), resampling residuals over time will not preserve the observed temporal (time) dependence structure or autocorrelation in pseudo-topographic series. This may result in inaccurate null distributions and affect the exactness of the tests. Therefore, though permutation tests are exact and do not assume normality of distribution, likely heteroscedasticity (e.g., due to changes in disease severity) or autocorrelated errors in topographic series may violate the exchangeability criterion required for resampling and may affect its performance.

^{ 54 }One of the sources of autocorrelated errors is methodological due to omission of one or more key predictor variables, such as the IOP at each follow-up, in the regression model.

^{ 30 }Model errors autocorrelated over time in a retinal time series of an eye can be modeled as a moving-average process.

^{ 55 }In addition to autocorrelated errors, there is an obvious autocorrelation of retinal height measurements in the time series of an eye due to repeated measurements in each retinal location in each eye over time. Retinal measurements autocorrelated over time within an eye can be modeled as an autoregressive process.

^{ 55 }Possible variation of autocorrelation among retinal locations and among eyes, as well as presence of glaucomatous progression should be factored while building such retinal time series models. Further studies are necessary to characterize fully sources of autocorrelation of retinal measurements in optical retinal images acquired over time. Upon characterizing autocorrelation and heteroscedasticity in the optic nerve head time series, several procedures are available to account for autocorrelation and heteroscedasticity during resampling.

^{ 30,56–60 }

^{ 36 }This is due to the fact that spatial information from all retinal locations are combined to control family-wise type I error by building a null distribution of the maximal rate-of-change test statistic and maximal cluster-size test statistic. In the nonparametric PixR-NP methods, the observed spatial correlation among retinal measurements in each eye were preserved in all simulated pseudo-topographic series by using the same temporal resampling order at each retinal location.

^{ 13,51 }Therefore, a single hypothesis evaluating a simple linear regression is not affected in the absence of a pivotal statistic.

^{ 33 }Westfall-Young,

^{ 13 }however, recommends a pivotal statistic for multiple comparison problems. In PixR methods, a studentized test statistic based on the regression coefficient (

*β*

_{1}/

*SE*[

*β*

_{1}], where

*SE*is the standard error) was used for hypothesis testing. When a nonstudentized test statistic (

*β*

_{1}) was used in PixR-NP CT, sensitivity decreased to 44% (95% confidence interval [CI] = 27%–62%) with the same specificity in normals of 81% (62%–100%). Although pivotal statistic does not have a significant effect on the diagnostic accuracy of an individual hypothesis test, it is evident from the diagnostic accuracy of PixR-NP CT that a pivotal statistic facilitates improved diagnostic accuracy for multiple comparison problems by providing a uniform type I error rate in all retinal locations.

^{ 36 }A uniform type I error rate is achieved in all retinal locations because the distribution of a pivotal statistic is independent of the data generating distribution at each retinal location and, therefore, is comparable across retinal locations during type I error control (for building the null-distribution of the maximal test statistic). Further, use of a pivotal statistic assures an asymptotically exact Monte Carlo permutation test.

^{ 31,49 }To standardize the test statistic in the nonparametric PixR methods, the standard errors of the regression coefficients

*SE*(

*β*

_{1}) were estimated in a parametric setup to reduce computational demands. Improved standard error estimates can be obtained by using a second level of bootstrap for each pseudo-topographic series.

^{ 14 }

^{ 13 }While detecting localized glaucomatous progression, the subset pivotality guideline for multiple testing is trivially satisfied because the significance of the rate of change observed in a retinal location is not dependent on the significance of other retinal locations (e.g., in contrast, tests for spatial correlation among retinal locations do not satisfy the subset pivotality guideline

^{ 13,36,61 }). Therefore, the nonparametric PixR-NP CT method provides a strong control of the family-wise error rate.

^{ 62,63 }It has potential for use with the parametric PixR-P strategy and may facilitate real-time inferences on cluster-size statistics from retinal images.

*k*-FWER strategies, the PixR provided higher diagnostic accuracy for detecting glaucoma progression than SIM. Moreover, in nonprogressing patient eyes, retrospective inspection indicated that PixR techniques are detecting a higher proportion of subtle and/or early stages of visual function progression than SIM. The PixR strategies show promise for improving our ability to detect glaucoma progression.

*IOVS*reviewers whose suggestions strengthened the manuscript.

**M. Balasubramanian**, None;

**E. Arias-Castro**, None;

**F.A. Medeiros**, Carl Zeiss Meditec, Inc. (F), Heidelberg Engineering, GmbH (F), Sensimed, Inc. (F), Topcon Medical Systems, Inc. (F);

**D.J. Kriegman**, None;

**C. Bowd**, None;

**R.N. Weinreb**, Heidelberg Engineering, GmbH (F), Topcon Medical Systems, Inc. (F, C), Nidek (F), Carl Zeiss Meditec, Inc. (C), Optovue, Inc. (C);

**M. Holst**, None;

**P.A. Sample**, None;

**L.M. Zangwill**, Carl Zeiss Meditec, Inc. (F), Heidelberg Engineering, GmbH (F), Topcon Medical Systems, Inc. (F), Optovue, Inc. (F)

*. The Hauge, Netherlands: Kugler Publications; 2004: 9–12.*

*Glaucoma Diagnosis: Structure and Function**. Florence, KY: Taylor & Francis e-Library; 2002: 279.*

*Handbook of Glaucoma**. 2000; 19: 1–40. [CrossRef] [PubMed]*

*Prog Retin Eye Res**. 2000; 41: 775–782. [PubMed]*

*Invest Ophthalmol Vis Sci**. 2005; 46: 1659–1667. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. 2012; 53: 3615–3628. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. 2010; 51: 264–271. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. New York, NY: Springer; 2008.*

*Multiple Testing Procedures With Applications to Genomics**. Amsterdam, The Netherlands: North-Holland Pub. Co.; 1980: 617–622.*

*Handbook of Statistics: Analysis of Variance**. New York, NY: Springer-Verlag; 1981.*

*Simultaneous Statistical Inference*, 2d ed*. Amsterdam, The Netherlands: North-Holland; 1996: 587–630.*

*Handbook of Statistics: Design and Analysis of Experiments**. New York, NY: Wiley; 1993.*

*Resampling-Based Multiple Testing: Examples and Methods for P Value Adjustment**. New York, NY: Chapman & Hall; 1993.*

*An Introduction to the Bootstrap**. 1974; 61: 1–15.*

*Biometrika**. New York, NY: Hafner Pub. Co. 1960.*

*The Design of Experiments*, 7th ed*. 1937; 4 (suppl): 119–130.*

*J Roy Stat Soc**. 2001; 11: 1105–1124.*

*Stat Sinica**. The Hague, The Netherlands: Kugler Publications; 2000.*

*The Shape of Glaucoma**. 2005; 139: 44–55. [CrossRef] [PubMed]*

*Am J Ophthalmol**. Yantai: 2010: 2356–2362.*

*3rd International Conference on Biomedical Engineering and Informatics (BMEI)**. 2005; 24: 333–354. [CrossRef] [PubMed]*

*Prog Retin Eye Res**. 2007; 48: 1642–1650. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. 2010; 14: 79–85. [CrossRef]*

*IEEE Trans Inf Technol Bioed**. 2011; 118: 1334–1339. [PubMed]*

*Ophthalmology**. 2008; 49: 2449–2455. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. 2011; 52: 1290–1296. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. 2005; 33: 1138–1154. [CrossRef]*

*Ann Stat**. 2009; 127: 1136–1145. [CrossRef] [PubMed]*

*Arch Ophthalmol**. Chicago, IL: Richard D. Irwin, Inc.; 1996.*

*Applied Linear Regression Models*, 3rd ed*. 1989; 51: 459–467.*

*J Roy Stat Soc B Met**. 2005; 74: 356–365. [CrossRef]*

*Stat Probabil Lett**. New York, NY: Cambridge University Press; 1997.*

*Bootstrap Methods and Their Application**. New York, NY: Marcel Dekker; 1995.*

*Randomization Tests*, 3rd ed*. Boca Raton, FL: Chapman & Hall/CRC; 2007.*

*Randomization, Bootstrap and Monte Carlo Methods in Biology*, 3rd ed*. 2003; 12: 419–446. [CrossRef] [PubMed]*

*Stat Methods Med Res**. 2001; 58: 626–639. [CrossRef]*

*Can J Fish Aquat Sci**. New York, NY: Springer; 2000.*

*Permutation Tests: a Practical Guide to Resampling Methods for Testing Hypotheses*, 2nd ed*. 2002; 1: 243–247.*

*J Mod App Stat Meth**. 2008; 146: 162–169. [CrossRef]*

*J Econometrics**. 2001. Available at: http://www.aghmed.fsnet.co.uk/bonf/bari.pdf. Accessed July 12, 2011.*

*Bari**, vol. 1252. Orlando: 1992: 1259–1261.*

*Nuclear Science Symposium and Medical Imaging Conference, 1992, Conference Record of the 1992 IEEE**. 1993; 13: 425–437. [CrossRef] [PubMed]*

*J Cereb Blood Flow Metab**. 1993; 1: 3–19. [CrossRef]*

*Hum Brain Mapp**. 1997; 5: 83–96. [CrossRef] [PubMed]*

*NeuroImage**. 2002; 15: 1–25. [CrossRef] [PubMed]*

*Hum Brain Mapp**. 2006; 47: 2904–2910. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**. New York, NY: Wiley; 1989.*

*Computer-Intensive Methods for Testing Hypotheses: An Introduction**. 2008; 70: 415–429. [CrossRef]*

*Oxford B Econ Stat**. 2004; 19: 676–685. [CrossRef]*

*Stat Sci**. 1991; 47: 757–762. [CrossRef]*

*Biometrics**. 1992; 48: 969–970.*

*Biometrics**. 51: 6472–6482. [CrossRef] [PubMed]*

*Invest Ophthalmol Vis Sci**ρ*= 0.

*. 1996; 1: 184–198. [CrossRef]*

*Psychol Methods**. Boca Raton, FL: CRC Press LLC; 2003.*

*The Analysis of Time Series: An Introduction**. 2004; 23: 1–25. [CrossRef] [PubMed]*

*Hum Brain Mapp**. 2003; 12: 375–399. [CrossRef] [PubMed]*

*Stat Methods Med Res**. 2001; 12: 61–78. [CrossRef] [PubMed]*

*Hum Brain Mapp**. 2005; 25: 859–867. [CrossRef] [PubMed]*

*Neuroimage**. 2005; 49: 361–376. [CrossRef]*

*Comput Stat Data An**. 1999; 18: 32–42. [CrossRef]*

*IEEE T Med Imaging**. 2003; 20: 2343–2356. [CrossRef] [PubMed]*

*NeuroImage**. New York, NY: Springer; 2001: 169–182.*

*Spatial Statistics: Methodological Aspects and Applications*