**Purpose.**:
Ordinary least squares linear regression (OLSLR) analyses are inappropriate for performing trend analysis on repeatedly measured longitudinal data. This study examines multilevel linear mixed-effects (LME) and nonlinear mixed-effects (NLME) methods to model longitudinally collected perimetry data and determines whether NLME methods provide significant improvements over LME methods and OLSLR.

**Methods.**:
Models of LME and NLME (exponential, whereby the rate of change in sensitivity worsens over time) were examined with two levels of nesting (subject and eye within subject) to predict the mean deviation. Models were compared using analysis of variance or Akaike's information criterion and Bayesian information criterion, as appropriate.

**Results.**:
Nonlinear (exponential) models provided significantly better fits than linear models (*P* < 0.0001). Nonlinear fits markedly improved the validity of the model, as evidenced by the lack of significant autocorrelation, residuals that are closer to being normally distributed, and improved homogeneity. From the fitted exponential model, the rate of glaucomatous progression for an average subject of age 70 years was −0.07 decibels (dB) per year. Ten years later, the same eye would be deteriorating at −0.12 dB/y.

**Conclusions.**:
Multilevel mixed-effects models provide better fits to the test data than OLSLR by accounting for group effects and/or within-group correlation. However, the fitted LME model poorly tracks visual field (VF) change over time. An exponential model provides a significant improvement over linear models and more accurately tracks VF change over time in this cohort.

^{ 1,2 }Of vital interest in glaucoma management is the assessment of disease progression by monitoring functional and structural changes.

^{ 3–5 }Although both functional and structural changes can provide evidence of disease progression, tracking glaucomatous VF progression is of key importance because functional testing directly relates to the activities of daily living.

^{ 6 }Standard automated perimetry, for example as performed by the Humphrey Field Analyzer (Carl Zeiss Meditec Inc., Dublin, CA), is currently the most widely used test for detecting functional damage.

^{ 7 }Trend analyses are increasingly being performed on perimetry data gathered from patients with glaucoma. Some examples are trend analyses of the mean deviation (MD), the Visual Field Index, or pointwise linear regression of VF sensitivities. Despite the availability of various advanced technologies for tracking functional progression, trend analyses using ordinary least squares linear regression (OLSLR) are still widely used statistical methods in the vision science literature

^{ 8–12 }because of its straightforward and mathematically more tractable applications. Numerous studies have used clinical applications of OLS regression analyses, for example when measuring treatment effects,

^{ 11 }examining characteristics of types of glaucoma,

^{ 12 }and assessing the rate of VF progression.

^{ 10 }However, OLSLR, the most commonly used method for trend analysis in the literature, may be inappropriate for modeling longitudinal data. Its validity, and hence the

*P*values it produces, relies on not violating several assumptions, as discussed below.

^{ 13–15 }have suggested that the variability of VF sensitivities is increased when sensitivity is lower in both normal and damaged eyes.

^{ 13,15 }Therefore, performing an OLSLR on longitudinally collected data violates the principle of homoscedasticity.

^{ 16 }This can be caused by longer-term random fluctuations, whose effects last more than the interval between tests, or by longer-term fluctuations in causative factors that were not included in the model. The same effect can also occur when a linear fit is imposed on data that are actually changing in a nonlinear manner. As an example, the measure of interest (in this case, the MD) is assumed to change at a constant rate over time. Recent cross-sectional studies

^{ 17–19 }indicate that change in sensitivity when expressed on a linear (1/contrast) scale appears to be proportional to the percentage loss of retinal ganglion cells (RGCs). Therefore, given the logarithmic nature of the decibel (dB) scale used in perimetry, linear change in sensitivity over time implies that the proportion of remaining RGCs that die each year must remain constant. That is, if 10% of the RGCs die in a year, 10% of the surviving RGCs would die in the following year, and so forth. If we hypothesize an alternative formulation, where the actual number of remaining RGCs that die each year remains constant, resulting in a linear decline in structural measures such as retinal nerve fiber layer thickness, then based on the cross-sectional findings this would result in the loss of sensitivity in decibels appearing to accelerate exponentially. If a linear fit is placed over data that are accelerating downward exponentially, the residuals will tend to be positive in the center of the period and negative at either end of the sequence, causing potentially significant temporal autocorrelation. In the presence of heteroscedasticity or temporal autocorrelation, OLS estimators are no longer the best linear unbiased estimators. Thus, statistics and confidence intervals from OLS may not be valid for drawing inference.

*P*values for assessing the significance of change in VF sequences from participants with glaucoma. These findings are also compared with those from OLSLR. We then consider the use of a nonlinear model in which VF sensitivity declines exponentially with time based on the alternative formulation outlined above, and we determine whether this removes evidence of autocorrelation. We demonstrate and test these statistical methods with a relatively simple predictive model using only age as a predictor of the rate of glaucomatous field change. However, the overall goal of this work is to aid in the development of better predictive models that will help clinicians manage patients with glaucoma.

^{ 5,20 }African ancestry,

^{ 20 }systemic hypertension,

^{ 21 }peripheral vasospasm,

^{ 22 }migraine,

^{ 23 }self-reported family history of glaucoma,

^{ 24 }disc hemorrhage,

^{ 25,26 }diet-controlled diabetes,

^{ 27 }and/or previously diagnosed glaucomatous optic neuropathy or suspicious optic nerve head appearance (cup-disc ratio asymmetry > 0.2) and neuroretinal rim notching or narrowing. Participants having visual acuity worse than 20/40 in either eye or worse than mild glaucoma, cataract, or media change at baseline were excluded. Other exclusion criteria included any other disease or the use of any medications likely to affect the VF or having undergone intraocular surgery (except for uncomplicated cataract surgery).

^{ 28 }Initially, we used the Full Threshold algorithm and the 30-2 test pattern but later used the 24-2 test pattern and the Swedish Interactive Threshold Algorithm (SITA).

^{ 29 }All MD values used in this study were 24-2 equivalent. This was accomplished by including earlier 30-2 tests within a Guided Progression Analysis

^{ 30 }sequence that included at least one 24-2 test. The Humphrey Field Analyzer II (Carl Zeiss Meditec Inc.) then produces 24-2 equivalent MDs for the included 30-2 tests. Previously, we have shown that there is no difference in MD values generated by the Full Threshold and SITA Standard algorithms,

^{ 31 }so there is no concern with using data from both. An optimal lens correction was placed before the tested eye, and an eye patch was used to occlude the fellow eye. All subjects had previous experience with VF testing before entering the study, and most had undergone multiple previous tests. Tests with greater than 33% fixation losses or false negatives or with greater than 15% false positives were considered unreliable and were excluded.

^{ 32 }Only participants with 10 or more observations per eye meeting the reliability criteria were included in the analyses. Follow-up time and age at the time of testing centered to its mean (a suggested risk factor for glaucomatous progression

^{ 20 }) were used as covariates. To find the best predictive model for the MD, both linear mixed-effects (LME) and nonlinear mixed-effects (NLME) models with two levels of nesting (subject and eye within subject) were fitted (details of the models considered are given in the Appendix). Discrete and continuous autoregressive methods were tested to examine temporal dependency likely to be present in within-group errors. Autoregressive (discrete or continuous) methods (CAR1) are used to model serial correlation of within-group errors in time series and longitudinal data, where the correlation between residuals from the model decreases exponentially with the time difference between measurements.

^{ 33 }

*P*values from ANOVA. Nonnested models (models I vs. III and models II vs. IV) cannot be compared using ANOVA and so instead are compared based on AIC and BIC, with smaller AIC and BIC indicating a better fit to the data. These AIC (2

*θ*− 2log(L)) and BIC (

*θ*log(L) − 2log(L)) are commonly used penalized model selection criteria for comparison and selection of models, where

*θ*, L, and

*n*are the total number of parameters to be estimated, the likelihood function, and the sample size, respectively.

*θ*acts to penalize models with a greater number of free parameters.

**Figure 1.**

**Figure 1.**

*P*values from ANOVA comparisons, respectively. Model II provides a significantly better fit than model I (

*P*< 0.0001), indicating the presence of significant temporal autocorrelation that needs to be taken into account when performing linear fits. The AIC and BIC are much lower (better) for model III and model IV compared with any of the candidate LME models (Table 1) and show that exponential models fit sequences of MD values significantly better than assuming linear change over time. However, model III and model IV provide equivalent fits (

*P*= 0.47). This indicates that when a suitable nonlinear model is used to examine change over time temporal autocorrelation is no longer significant. It appears to be caused by incorrectly imposing a linear fit, rather than by long-term fluctuation acting within the sequences.

**Table 1.**

**Table 1.**

Model | AIC | BIC |

I | 12,850.03 | 12,913.41 |

II | 12,742.35 | 12,812.07 |

III | 12,259.04 | 12,322.42 |

IV | 12,261.55 | 12,331.28 |

**Table 2.**

**Table 2.**

Model | II | III | IV |

I | P < 0.0001 | NA* | P < 0.0001 |

II | P < 0.0001 | NA* | |

III | P = 0.47 |

*P*values are from an ANOVA comparison between the two models indicated. The models are defined in Table 1.

**Figure 2.**

**Figure 2.**

^{ 13–15,34 }

**Figure 3.**

**Figure 3.**

^{ 22 }This apparent lack of autocorrelation is consistent with the fact that when using NLME models there is no longer a significant benefit in accounting for correlated errors (models III vs. IV,

*P*= 0.47). This supports our earlier assertion that errors could appear correlated if a linear fit is imposed onto nonlinear data.

**Figure 4.**

**Figure 4.**

*λ*

_{0},

*κ*

_{0}, and

*γ*

_{0}), corresponding SEs, and associated

*P*values for the exponential model III. The average baseline MD (

*λ*

_{0}) and intercept of rate (

*κ*

_{0}) are significantly different from zero (

*P*< 0.0001). The estimated maximum correlation among the fitted parameters was −0.2, indicating no overfitting. The predicted sensitivity for an average subject at time (

*t*) is given by MD

_{t}=

*λ*

_{0}− exp(

*κ*

_{0}+

*γ*

_{0}· age)

*t*, with the rate of change given by −(

*κ*

_{0}+

*γ*

_{0}· age) · exp(

*κ*

_{0}+

*γ*

_{0}· age)

*t*. For an average 70-year-old subject in this cohort (i.e., when the random effects are set to be zero), his or her MD at time zero (i.e., baseline) is 0.26 dB, with a rate of change of −0.062 dB/y. Ten years later (i.e.,

*t*= 10), the VF will be deteriorating at −0.11 dB/y.

**Table 3.**

**Table 3.**

Parameter | Estimate (SE) | P Value |

λ _{0}, dB | 1.26 (0.15) | <0.0001 |

κ _{0}, dB/y | 0.064 (0.008) | <0.0001 |

γ _{0}, dB/y | −0.0009 (0.0006) | 0.17 |

**Figure 5.**

**Figure 5.**

*P*< 0.0001) when the MD is fitted using OLSLR. However, age is not a significant risk factor (

*P*> 0.05) using either the fitted LME or NLME models. Note that this does not imply that age is not in fact associated with progression because the power to detect an effect may not be sufficient in this data set. Lack of evidence of an effect should not be taken as evidence for the lack of an effect. However, it illustrates that the significance of an effect can be overstated when an inappropriate model is used.

^{ 35 }The often-reported delay between when structural loss is first detected and when functional loss is first reported could be partly because of the logarithmic scaling (in decibels) of units used in perimetry. In an exponential model, the observed rate of change in decibels accelerates as the disease progresses, without assuming any fundamental change in the rate of RGC loss. Although the mixed-effects approach used in this study cannot be applied to individual patients, the method demonstrates that an exponential model in which the rate of progression increases over time should be used for individual patient data.

^{ 36–38 }For end-stage disease, it is technical limitations, rather than any pathophysiological process, that drive results. At present, no model has been proposed that successfully accounts for both this floor effect and the likelihood that the VF was effectively stable for many years before detectable progression commencing. Work to derive and validate such a model is ongoing.

^{ 39 }and the pathophysiological process of glaucomatous disease progression. It is possible that a nonlinear trend could be caused by learning effects. We believe that this is unlikely to fully explain our results because all subjects had at least some experience with VF testing before entering the study, so learning effects should have been small.

**Figure 6.**

**Figure 6.**

**M. Pathak**, None;

**S. Demirel**, None;

**S.K. Gardiner**, None

*Lancet*. 2004; 363: 1711–1720. [CrossRef] [PubMed]

*Br J Ophthalmol*. 2006; 90: 262–267. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2006; 47: 2904–2910. [CrossRef] [PubMed]

*Arch Ophthalmol*. 2010; 128: 276–287. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 3598–3604. [CrossRef] [PubMed]

*Am J Ophthalmol*. 2012; 154: 445–451. [CrossRef] [PubMed]

*Asia Pacific J Ophthalmol*. 2012; 1: 135–139. [CrossRef]

*Invest Ophthalmol Vis Sci*. 1996; 37: 1419–1428. [PubMed]

*Invest Ophthalmol Vis Sci*. 2002; 43: 1400–1407. [PubMed]

*Arch Ophthalmol*. 2012; 130: 1541–1546. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 1704–1709. [CrossRef] [PubMed]

*Ophthalmol*. 2009; 116: 2271–2276. [CrossRef]

*Invest Ophthalmol Vis Sci*. 2012; 53: 5985–5990. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2002; 43: 2654–2659. [PubMed]

*Invest Ophthalmol Vis Sci*. 2000; 41: 417–421. [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 6776–6784. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2000; 41: 1774–1782. [PubMed]

*Prog Retin Eye Res*. 2007; 26: 688–710. [CrossRef] [PubMed]

*Prog Retin Eye Res*. 2010; 29: 249–271. [CrossRef] [PubMed]

*Arch Ophthalmol*. 2000; 120: 714–720, 829–830. [CrossRef]

*Ophthalmology*. 2000; 107: 1287–1293. [CrossRef] [PubMed]

*Br J Ophthalmol*. 1998; 82: 862–870. [CrossRef] [PubMed]

*Am J Ophthalmol*. 2001; 131: 699–708. [CrossRef] [PubMed]

*Arch Ophthalmol*. 1995; 113: 918–924. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2005; 46: 3712–3717. [CrossRef] [PubMed]

*J Glaucoma*. 2005; 14: 11–18. [CrossRef] [PubMed]

*Ophthalmology*. 2008; 115: 227–232. [CrossRef] [PubMed]

*Automated Static Perimetry*. 2nd ed. St. Louis: Mosby; 1999.

*Acta Ophthalmol Scand*. 1997; 75: 368–375. [CrossRef] [PubMed]

*Ophthalmol*. 2012; 119: 458–467. [CrossRef]

*Invest Ophthalmol Vis Sci*. 2012; 53: 224–227. [CrossRef] [PubMed]

*R: A Language and Environment for Statistical Computing*. Vienna: R Foundation for Statistical Computing; 2013. Available at: http://www.R-project.org/. Accessed July 2, 2013.

*Mixed-Effects Models in S and S-plus*. New York: Springer-Verlag; 2000.

*Am J Ophthalmol*. 1990; 109: 109–111. [CrossRef] [PubMed]

*Optom Vis Sci*. 2011; 88: 130–139. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 9211–9240. [CrossRef]

*Invest Ophthalmol Vis Sci*. 2011; 52: 9539–9540. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2011; 52: 4765–4773. [CrossRef] [PubMed]

*Optom Vis Sci*. 2008; 85: 143–148. [CrossRef]

*t*), Where

*α*,

*β*, and

*γ*are the intercept, slope, and age effect, respectively. This most simple model implicitly assumes that all eyes have the same sensitivity at time

*t*= 0 (i.e., study entry) and that all decline at the same rate

*β*after taking into account an aging effect. While this is clearly not a realistic model given intereye and intersubject variability in both physiology and pathophysiology, the following two-level (subject and eye within subject) LME models were formed: where

*i*and

*j*are indices representing the subject identification and the eye (nested within subject), respectively. The

*β*

_{21},

*β*

_{22}are the slope and age effect, respectively;

*b*

_{i}and

*b*

_{ij}are first-level (subject) and second-level (eye within subject) random effects with corresponding matrices

*Z*

_{i,j}and

*Z*

_{ij}, respectively; and ε

_{ij}values are within-group errors. The first level of random effect accounts for consistent differences in the MD between subjects. The second level of random effect accounts for consistent differences between the two eyes of a subject. The random effect is effectively an adjustment for the fact that some subjects will consistently have higher MD values than others, and it takes the form of a random variable, where there is one value for each subject.

_{1}, wherein the correlation between two residuals derived from the same eye decreases with the length of time between them. By contrast, errors are assumed to be uncorrelated within the same eye in model I.

*λ*

_{0},

*κ*

_{0},

*γ*

_{0},) are fixed-effects population parameters;

*b*

_{i}= (

*λ*

_{i}, (

*κ*

_{1})

_{i}, (

*γ*

_{1})

_{i}) and

*b*

_{ij}= (

*λ*

_{ij}, (

*κ*

_{2})

_{ij}, (

*γ*

_{2})

_{ij}) are level-one and level-two random effects, respectively; and

*ε*

_{ij}values are within-group errors. Similar to LME model II, the errors of model IV are assumed to be temporally correlated according to a continuous autoregressive (CAR1) model with covariance matrix Σ

_{2}(i.e., the correlation decreases exponentially with the length of time between the measurements), whereas errors are assumed to be uncorrelated for model III. All random effects (

*b*

_{i},

*b*

_{ij},

*ε*

_{ij}) were assumed to be independent. In addition,

*b*

_{i}values were assumed to be independent for different subjects, and

*b*

_{ij}and

*ε*

_{ij}were assumed to be independent for different subjects and/or different eyes.