Purpose. To investigate the optimal frequency of imaging during follow-up for detecting glaucoma progression, by characterizing variability (noise) in neuroretinal rim area (RA) measured with the Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Heidelberg, Germany).

Methods. RA noise was estimated from patient data and characterized by fitting theoretical distributions to the observed data. Multilevel regression was used to identify factors that significantly affect noise. Disease progression was simulated by adding noise, drawn from the distribution fitted to the observed data, to the average rate of RA loss estimated from longitudinal data. Rates of detection of disease progression were investigated for various progression rates, follow-up periods, and imaging frequencies.

Results. Noise was not normally distributed and was best characterized by the hyperbolic distribution, which fit the bulk of the data well while allowing for extreme values. Noise was strongly influenced by image quality, but age had no significant effect. Rates of detection improved with more frequent imaging, better-quality images, and faster rates of disease progression.

Conclusions. Noise in HRT measurement of RA is well characterized by the hyperbolic distribution. Sensitivity of detection improves with more frequent testing, but if a patient consistently yields poor-quality images, the probability of detection is low. These results could be used to tailor follow-up patterns to patients with different rates of RA loss and image quality, especially in a clinical trial setting.

^{ 1 }However, one real promise of the HRT lies in the reliable detection of disease progression in glaucoma. One approach to detecting progression is to quantify changes in the morphologic features of the optic disc, typically expressed as stereometric parameters. Neuroretinal rim area (RA) is a reliable measure because it exhibits less test-retest variability than other stereometric parameters and has been shown to be reproducible when, for example, different observers acquire the image.

^{ 2 }

^{ 3 }RA has been shown to give good separation between glaucomatous and normal eyes in test samples.

^{ 4 }Given the relative precision of RA measurements and its straightforward interpretation, we consider RA a useful indicator for detecting glaucomatous progression. Furthermore, RA is clinically meaningful because the loss of RA tissue parallels the loss of retinal ganglion cell axons typical of glaucomatous damage.

^{ 5 }

^{ 6 }

^{ 7 }

^{ 8 }

^{ 9 }), machine characteristics (e.g., image alignment^{ 10 }), and operator characteristics (e.g., different operators,^{ 11 } placement of the contour line outlining the optic disc margin^{ 12, 13 }).

^{ 14 }

^{ 15 }POAG was defined as pretreatment IOP greater than 21 mm Hg on two or more occasions and an AGIS VF score consistently greater than 0. This study, including full definitions of OHT and POAG, is described in detail elsewhere.^{ 2 }

^{ 10 }Longitudinal data came from a trial of betaxolol against placebo in patients with OHT.^{ 16 } The data here are from 216 of those patients, for whom HRT images were acquired at 4 to 16 visits (median, 10 visits) over 2 to 7 years (median, 6 years). During follow-up, early glaucomatous field loss developed in 44 of these patients. This conversion to early glaucoma was defined on the basis of VF change by AGIS criteria.^{ 15, 16 }

^{ 8 }

^{ 11 }

^{ 17 }Briefly, the height of the standard reference plane can vary depending on the height of the contour line at the temporal optic nerve head margin, whereas the 320-μm reference plane is fixed and thus yields less variable morphometric data. Studies were performed in accordance with the tenets of the Declaration of Helsinki, informed consent was obtained from the participants, and the research was approved by the appropriate ethics committee.

- Cross-sectional data: For each patient, the mean of the five RA values was calculated as the best available estimate of the true RA. This mean was subtracted from each of the five individual RA measurements to give five deviations from the mean per patient, 370 deviations in total. These deviations represented our best estimate of cross-sectional noise.
- Longitudinal data: Linear regression of RA over time was fitted to each patient’s individual image series. This effectively removed the changes over time, giving an estimate of the mean RA at each time point. Residuals from the regression model (i.e., differences between each observed point and fitted point) were taken as estimates of noise.
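The two noise estimates in the list above can be sketched in plain Python as follows. The data layout (dictionaries keyed by a patient identifier) and all function names are hypothetical; RA is assumed to be in mm^{2} and time in years.

```python
from statistics import mean


def cross_sectional_noise(ra_by_patient):
    """Deviations of each RA value from its own patient's mean.

    ra_by_patient: dict mapping a patient id to the repeated RA values
    (mm^2) from one session (five per patient in this study).
    """
    deviations = []
    for values in ra_by_patient.values():
        patient_mean = mean(values)  # best available estimate of true RA
        deviations.extend(v - patient_mean for v in values)
    return deviations


def ols_fit(times, values):
    """Ordinary least-squares intercept and slope of values on times."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def longitudinal_noise(series_by_patient):
    """Residuals from a per-patient linear regression of RA on time.

    series_by_patient: dict mapping a patient id to a list of
    (time_in_years, RA) pairs from that patient's follow-up.
    """
    residuals = []
    for series in series_by_patient.values():
        times = [t for t, _ in series]
        values = [y for _, y in series]
        intercept, slope = ols_fit(times, values)
        # Observed minus fitted value at each visit.
        residuals.extend(y - (intercept + slope * t) for t, y in series)
    return residuals
```

Subtracting each patient's own mean (or own fitted line) removes between-patient differences in true RA, so only measurement variability remains in the pooled values.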

^{ 18 }This distribution belongs to the family of “stable” distributions, where stable refers to the property of distributions that retain shape when added together. These distributions generalize the normal distribution: they are more “peaked” (more observations fall at or very near the average than in a normal distribution), and their tails are heavier than those of the normal distribution. They are used widely in financial mathematics to model random variables whose extreme values occur more frequently than the normal distribution predicts. We hypothesized that the hyperbolic model would mimic the clinical behavior of HRT measurements, in which most values are highly reproducible but noise sometimes increases dramatically because of image acquisition or processing difficulties. Whereas the normal distribution has two parameters (mean and SD), the hyperbolic distribution has four: location, scale, peak, and symmetry. These parameters may be adjusted to give a family of distributions that fit the data according to patient characteristics. The goodness-of-fit of these distributions was assessed with the Kolmogorov-Smirnov statistic (for which the null hypothesis is that the data follow the fitted distribution). The distribution that best described the test-retest noise was then validated by assessing its goodness-of-fit to the noise in the longitudinal data. This analysis was repeated for RA measurements within the six predefined sectors of the optic disc: temporal, temporal superior, temporal inferior, nasal, nasal superior, and nasal inferior.
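The one-sample Kolmogorov-Smirnov test described above compares the empirical CDF of the noise with a hypothesized CDF. The sketch below is a standard-library illustration, not the authors' code (they worked in R): `ks_statistic` accepts any CDF callable, so a fitted hyperbolic CDF could be passed in place of the normal one, and `ks_pvalue` uses the common asymptotic series with the finite-n adjustment λ = (√n + 0.12 + 0.11/√n)·D. Function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = max |F_n(x) - F(x)|.

    cdf: any callable giving the hypothesized cumulative probability at x,
    e.g. a fitted normal CDF or a numerically evaluated hyperbolic CDF.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic P-value for D (null: the data follow the given CDF)."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series below converges slowly here, and Q(lam) ~ 1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_test_normal(noise):
    """KS test of noise against a normal with the sample mean and SD."""
    fitted = NormalDist(mean(noise), stdev(noise))
    d = ks_statistic(noise, fitted.cdf)
    return d, ks_pvalue(d, len(noise))
```

When the normal parameters are estimated from the same data, as in `ks_test_normal`, this asymptotic P-value tends to be conservative (the Lilliefors correction addresses that case).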

^{ 19 }CND is a measure of the density in the center of the lens nucleus that gives an objective assessment of the degree of nuclear opacification. Image quality was assessed by the SD of the topographic images, each of which comprises the mean of three single images. This SD is known as topographic SD or mean pixel height SD (MPHSD) and is the HRT manufacturer’s index for image quality.

^{ 20 }

^{ 21 }MLM is similar to ordinary multiple linear regression in that it relates several predictor variables to a single outcome variable through a linear model. Ordinary linear regression assumes that all outcome observations are independent of each other, but in the cross-sectional data each patient contributes five deviations, so deviations are nested within patients and are not independent. A deviation-level analysis that ignores this clustering may underestimate the standard errors of the regression coefficients, giving overly small *P* values, whereas a patient-level analysis (e.g., using average deviations) loses potentially valuable information.
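The cost of ignoring clustering can be made concrete with the design effect, a standard survey-sampling quantity: when m observations per patient share an intraclass correlation ρ, treating them as independent understates the variance of an estimated mean by a factor of about 1 + (m − 1)ρ. The sketch below estimates ρ with a one-way ANOVA estimator and computes that factor. It is a quick diagnostic under these assumptions, not a substitute for the multilevel model fitted in MLwiN, and the function names are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    groups: list of equal-length lists, one per patient (e.g. five
    repeated measurements of some outcome per patient).
    """
    k = len(groups)       # number of patients
    m = len(groups[0])    # measurements per patient
    if any(len(g) != m for g in groups):
        raise ValueError("this simple estimator needs equal group sizes")
    means = [mean(g) for g in groups]
    grand = mean(means)   # equal group sizes, so this is the grand mean
    msb = m * sum((gm - grand) ** 2 for gm in means) / (k - 1)
    msw = sum((v - gm) ** 2 for g, gm in zip(groups, means) for v in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(m, icc):
    """Variance inflation from treating m clustered values as independent."""
    return 1 + (m - 1) * icc
```

For example, with five measurements per patient and a hypothetical ρ of 0.3, `design_effect(5, 0.3)` is 2.2, so a naive analysis would understate standard errors by a factor of about √2.2 ≈ 1.48.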

^{ 22 }MLM adjusts for the hierarchical structure of the data, allowing for the correlation between deviations for each patient and explicitly modeling the way in which deviations are grouped within patients. Essentially, in MLM, patients are regarded as a (random) sample from the population of all patients, and inference is made about the variation between patients in general. Intercepts and slopes of the fitted regression lines can vary randomly between patients. Multilevel modeling was carried out with the use of a software package (MLwiN, version 2.01; Multilevel Models Project, Institute of Education, London, UK).

^{ 23 }

*n* = 44) and taking the average of these regression slopes. Conversion to glaucoma was defined on the basis of VF change by AGIS criteria.^{ 15, 16 } One thousand “virtual” patients were simulated to have this rate of progression, and noise drawn from the distribution fitted to the test-retest data was added to their RA values. The sensitivity of RA linear regression to disease progression (with the progression rate set to the average slope) and its specificity were calculated for a range of imaging frequencies and lengths of follow-up. A test outcome positive for progression was defined as a negative regression slope of RA over time with *P* < 0.05. Computer simulations were performed in the statistical programming language R, version 2.0.1 (The R Foundation for Statistical Computing, Vienna, Austria),^{ 24 } and the R package HyperbolicDist^{ 25 } was used to model the hyperbolic distribution.
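The simulation described above can be sketched in standard-library Python. Two substitutions are assumptions, not the authors' method: noise is drawn from a scaled Student-t distribution as a heavy-tailed stand-in for the fitted hyperbolic distribution (the standard library has no hyperbolic sampler), and “negative slope with *P* < 0.05” is read as a two-sided t test on the ordinary least-squares slope that counts only when the slope is negative. The baseline RA of 1.6 mm^{2}, the equally spaced visit schedule that includes a baseline scan, and all names are illustrative.

```python
import math
import random


def t_two_sided_p(t, df, steps=1000):
    """Two-sided P-value for Student's t, by Simpson integration of the pdf."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # P(0 < T < |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def slope_test(times, values):
    """OLS slope of RA on time and its two-sided P-value (H0: slope = 0)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values)) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, values))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # noiseless series: any nonzero slope is certain
        return slope, (0.0 if slope != 0 else 1.0)
    return slope, t_two_sided_p(slope / se, n - 2)


def heavy_tailed_noise(scale, nu=4.0):
    """Hypothetical stand-in for hyperbolic noise: scaled Student-t draws."""
    def draw(rng):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu df
        return scale * z / math.sqrt(chi2 / nu)
    return draw


def detection_rate(rate, noise, scans_per_year, years, n_patients=1000,
                   baseline_ra=1.6, alpha=0.05, seed=1):
    """Fraction of virtual patients whose RA series is flagged as progressing.

    rate: true RA change in mm^2/year (negative = loss; 0 = stable, which
    makes the result a false-positive rate). noise: callable that takes a
    random.Random and returns one noise value in mm^2.
    """
    n_visits = int(round(scans_per_year * years)) + 1  # includes baseline
    if n_visits < 3:
        raise ValueError("need at least 3 visits to test a slope")
    times = [i / scans_per_year for i in range(n_visits)]
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_patients):
        ra = [baseline_ra + rate * t + noise(rng) for t in times]
        slope, p = slope_test(times, ra)
        if slope < 0 and p < alpha:
            flagged += 1
    return flagged / n_patients
```

For instance, `detection_rate(-0.012, heavy_tailed_noise(0.02), scans_per_year=1, years=4)` estimates sensitivity for a hypothetical loss of 0.012 mm^{2} per year with an arbitrary noise scale, and setting `rate=0.0` gives the false-positive rate, whose complement is the specificity.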

^{2}, SD 0.048 mm^{2}. The distribution of this observed noise was highly peaked, with long tails, and was poorly fitted by the normal distribution (*P* = 0.02; Kolmogorov-Smirnov test). This suggests that the RA measurements were usually precise but that there were also many very poor measurements. The hyperbolic distribution gave a better fit to the observed noise than the normal distribution (*P* = 0.60; Kolmogorov-Smirnov test). The fits of the normal and hyperbolic distributions to the cross-sectional noise are shown in Figure 1. Quantile plots show that the observed data points in the tails of the distribution lie farther from the center than would be expected under either the normal or the hyperbolic distribution, although this lack of fit in the tails is less pronounced for the hyperbolic distribution.

*P* < 0.001; Mauchly test of sphericity). The noise was most widely spread in the temporal and nasal sectors and least spread in the nasal superior sector.

*P* < 0.001 in all sectors for the normal distribution; Kolmogorov-Smirnov test).

^{2} (95% confidence interval [CI]: 0.0003–0.0007 mm^{2}). Lens opacity, as measured by CND value, had a moderately strong effect: a unit increase in CND increased noise by 0.002 mm^{2} (95% CI: 0.0005–0.0040 mm^{2}). Age was not statistically significant in the multiple regression model.

^{2} (SD 0.038 mm^{2}). As in the cross-sectional data, the distribution of the observed noise was highly peaked. The hyperbolic distribution, with the parameters that gave the best fit to the cross-sectional noise, was then fitted to the observed noise in the longitudinal data, both for all cases and separately for three categories of MPHSD: good (≤30 μm), acceptable (31–50 μm), and unacceptable (>50 μm). These categories reflect those given in the HRT literature.^{ 20 } *P* values from the Kolmogorov-Smirnov test showed that the hyperbolic distribution provided a good fit for good and acceptable values of MPHSD. However, neither the hyperbolic nor the normal distribution fitted the observed data well for unacceptable levels of MPHSD.
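Fitting the noise separately by image quality requires assigning each image to one of the three MPHSD categories listed above. A minimal sketch follows; the helper names and the record layout are hypothetical, and treating non-integer values between 30 and 31 μm as acceptable is an assumption, since the categories are stated in whole micrometers.

```python
from collections import defaultdict


def mphsd_category(mphsd_um):
    """Image-quality category from MPHSD in micrometers.

    Cut points: good <= 30, acceptable 31-50, unacceptable > 50.
    Values strictly between 30 and 31 are treated as acceptable (assumption).
    """
    if mphsd_um <= 30:
        return "good"
    if mphsd_um <= 50:
        return "acceptable"
    return "unacceptable"


def noise_by_quality(records):
    """Group noise values by image-quality category.

    records: iterable of (mphsd_um, noise) pairs, e.g. one pair per
    longitudinal residual together with that image's MPHSD.
    """
    groups = defaultdict(list)
    for mphsd_um, noise in records:
        groups[mphsd_category(mphsd_um)].append(noise)
    return dict(groups)
```

Each resulting group can then be passed to a goodness-of-fit test on its own.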

^{2} (interquartile range, 0.021 mm^{2}). This represents a loss of approximately 0.75% of an average normal RA per year (where average normal RA is approximately 1.6 mm^{2}).^{ 4 }
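Because these percentages are taken relative to an average normal RA of about 1.6 mm^{2}, 0.75% per year corresponds to roughly 0.012 mm^{2} per year, and the 1.5% figure used for the upper quartile corresponds to roughly 0.024 mm^{2} per year. The conversion is simple arithmetic; the helper names below are hypothetical.

```python
AVERAGE_NORMAL_RA = 1.6  # mm^2, approximate figure given in the text


def percent_of_normal_ra(rate_mm2_per_year, normal_ra=AVERAGE_NORMAL_RA):
    """Annual RA loss expressed as a percentage of an average normal RA."""
    return 100.0 * rate_mm2_per_year / normal_ra


def rate_from_percent(percent_per_year, normal_ra=AVERAGE_NORMAL_RA):
    """Inverse conversion: percentage per year back to mm^2 per year."""
    return percent_per_year / 100.0 * normal_ra
```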

^{2} per year, a loss of approximately 1.5% of an average normal RA per year). Simulations were repeated for the three categories of MPHSD: good (≤30 μm), shown in row I; acceptable (31–50 μm), shown in row II; and unacceptable (>50 μm), shown in row III (Fig. 6). As expected, these simulation experiments indicated that increasing the frequency of testing improved detection rates in patients with progressive disease at all lengths of follow-up and for all image qualities. For example, for a virtual patient with an average rate of RA loss and good-quality images, imaging once a year for 4 years (Fig. 6BI) resulted in a detection rate of 37%, whereas imaging four times a year for 4 years gave a more acceptable detection rate of 78%. Detection rates are better still in eyes with faster-progressing disease (Fig. 6CI). For the upper quartile of RA loss, over a 4-year follow-up period, imaging once a year detects 71%, and imaging four times a year detects 98%, of patients with progressive disease. Detection rates also improve as image quality improves. For example, imaging twice a year over a 5-year follow-up period detects 98% of fast-progressing disease with good-quality images (Fig. 6CI), 89% with acceptable-quality images (Fig. 6CII), and 56% with unacceptable-quality images (Fig. 6CIII). Column A of Figure 6 shows the percentage of virtual patients with nonprogressing disease incorrectly identified as progressing, which indicates the specificity of detection. Specificity deteriorates over time, and more steeply with more frequent testing.

^{ 26 }and the measurement of RA has already been established as a reliable tool for separating glaucomatous eyes from normal eyes.

^{ 27 }

^{ 28 }

^{ 29 }Any true deterioration in RA will only be identified as such if it can be distinguished from variability, or noise, in RA measurements. The observed noise in both the cross-sectional and longitudinal data sets was clearly not normally distributed. This is an important finding because most statistical analyses in medical applications assume normality, an assumption that would be inappropriate here. The normal distribution is used frequently because of its central importance to sampling theory.^{ 30 } The noise in the cross-sectional and longitudinal data was much more peaked and had longer tails than the normal distribution, indicating that most scans are reliable (with noise values close to 0) but that a small number of scans give rise to extreme values of noise. This pattern of observed noise in the cross-sectional data was shown to be well approximated by the hyperbolic distribution. This distribution, originally developed by geomorphologists in the 1940s,^{ 31 } has recently been widely used in economics because it gives a better fit to certain types of financial data than the normal distribution.
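One common single-number summary of the “more peaked, with longer tails” shape described above is the sample excess kurtosis, which is 0 for a normal distribution and positive when most values cluster near the center while a few lie far out. The sketch below is illustrative only; it is not a statistic reported in this study, and the function name is hypothetical.

```python
from statistics import fmean


def excess_kurtosis(xs):
    """Sample excess kurtosis (moment estimator, no small-sample correction).

    0 for a normal distribution; > 0 for a sharper peak with heavier tails.
    """
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m4 = fmean((x - m) ** 4 for x in xs)
    return m4 / (m2 * m2) - 3.0
```

For example, `excess_kurtosis([0.0] * 8 + [-1.0, 1.0])`, mostly zero noise with two large values, gives 2.0, mirroring the pattern of mostly reliable scans with occasional extreme ones.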

^{ 32 }The hyperbolic distribution has higher peaks and longer tails than the normal distribution and provides a good model for typical values while also accommodating exceptional ones. The fit of this distribution was confirmed on the noise in the longitudinal data, a more realistic data set in terms of what is available to the clinician determining whether progression has occurred.
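For reference, the hyperbolic density in the (α, β, δ, μ) parametrization introduced by Barndorff-Nielsen is given below, where K_1 is the modified Bessel function of the third kind with index 1. Matching α, β, δ, and μ to the peak, symmetry, scale, and location parameters named in Methods is our reading rather than a statement from the source.

```latex
f(x;\alpha,\beta,\delta,\mu)
  = \frac{\sqrt{\alpha^{2}-\beta^{2}}}
         {2\alpha\delta\,K_{1}\!\left(\delta\sqrt{\alpha^{2}-\beta^{2}}\right)}
    \exp\!\left(-\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}+\beta(x-\mu)\right),
  \qquad \alpha>|\beta|\ge 0,\quad \delta>0 .
```

Its logarithm, −α√(δ² + (x − μ)²) + β(x − μ) plus a constant, is a hyperbola in x, whereas the normal log-density is a parabola. Far from μ the log-density falls off linearly, at rate α − β above μ and α + β below it, so the tails decay exponentially rather than like exp(−x²); this is why extreme values are more common than under the normal distribution.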

^{ 33 }Therefore, it is particularly important that any reduction in RA in these areas be reliably detected. We found the noise in these areas to be relatively small compared with the noise in the temporal and nasal sectors, which showed the greatest spread. RA changes in these latter two sectors would therefore have to be larger to be reliably detected. The results of this study provide a foundation for developing a technique for detecting progression in sectoral RA. The differences in noise distribution among the disc sectors cannot be explained by a relationship between RA and variability whereby RA in more damaged discs is noisier than RA in discs with early damage: no relationship has been established between RA and variability,^{ 2 } and no statistically significant differences have been found in the test-retest variability of HRT II stereometric parameters between glaucomatous and normal eyes.^{ 34 }

^{ 15 }The patterns of VF change in glaucomatous progression are well documented, but given the lack of any criterion for progression and the measurement error inherent in perimetric assessment, these rates of change are necessarily approximations of any true underlying change.

^{ 14 }

^{ 35 }

^{ 36 }and, more recently, statistical image mapping (SIM).^{ 37 } These methods detect change at the pixel level (or at the level of groups of pixels in TCA) rather than with summary measures such as RA. Change is evaluated within each patient, which removes the need for average measures of change and variability. The noise characteristics of RA measurements may also be apparent in these analyses; this is the subject of future work. Methods that use stereometric parameters such as RA still have a role in determining progression: they are clinically familiar, and change in an area is easier to grasp than change in topographic height. Summary measures of disc change are also useful for describing disease progression in large samples of patients in clinical trials. The complete analysis of longitudinal HRT data may be best served by combining global analysis of changes in the optic nerve head with techniques that help the clinician visualize localized areas of likely change.

**Figure 1.**


**Figure 2.**


**Figure 3.**


Parameter | Estimate | Standard Error | *P* | 95% CI |
---|---|---|---|---|
Intercept | 0.0111 | — | — | — |
MPHSD (μm) | 0.0006 | 0.0001 | <0.001 | (0.0004, 0.0008) |
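The coefficients in the table above imply a linear relation between image quality and noise: each 1-μm increase in MPHSD adds about 0.0006 to the predicted noise, so moving from 30 to 50 μm adds about 0.012. The sketch below assumes the outcome is the noise measure in mm^{2} reported in Results and ignores any other covariates (such as CND) that may enter the full model; the constant and function names are hypothetical.

```python
# Coefficients copied from the table; other covariates are omitted (assumption).
INTERCEPT = 0.0111
MPHSD_COEF = 0.0006  # change in predicted noise per 1-um increase in MPHSD


def predicted_noise(mphsd_um):
    """Linear prediction of noise from MPHSD alone (hypothetical helper)."""
    return INTERCEPT + MPHSD_COEF * mphsd_um


for mphsd in (20, 30, 50, 70):
    print(f"MPHSD {mphsd:>2} um -> predicted noise {predicted_noise(mphsd):.4f}")
```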

**Figure 4.**


**Figure 5.**


**Figure 6.**


*(computer program). Versions 1.09–2.01*. 1993–1999; Heidelberg Engineering, Heidelberg, Germany.

*(computer program). Version 0.0–1*. 2003; David Scott, Auckland, New Zealand.