Neuroretinal rim area (RA), as measured by the HRT, is an effective, objective, and quantifiable indicator of whether a patient with glaucoma has stable or worsening disease. RA is often the first area to show glaucomatous changes,26 and the measurement of RA has already been established as a reliable tool for separating glaucomatous eyes from normal eyes.27-29 Any true deterioration in RA will be identified as such only if it can be distinguished from variability, or noise, in RA measurements. It is clear that the observed noise in the cross-sectional and longitudinal data sets was not normally distributed. This is a significant finding because most statistical analyses in medical applications make an assumption of normality, which in this case would be inappropriate. The normal distribution is used frequently because of its central importance to sampling theory.30 The noise in the cross-sectional data and the longitudinal data was considerably more peaked and had longer tails than the normal distribution, indicating that scans of most patients are reliable (with values of noise close to 0), but a small number of scans give rise to extreme values of noise. This pattern of observed noise in the cross-sectional data was shown to be well approximated by the hyperbolic distribution. Recently this distribution, originally developed by geomorphologists in the 1940s,31 has been widely used in economics because it gives a better fit to certain types of financial data than the normal distribution.32 The hyperbolic distribution has a higher peak and longer tails than the normal distribution and provides a good model for typical values while also accommodating exceptional behavior. The fit of this distribution was confirmed on the noise in the longitudinal data, a more realistic data set in terms of what is available to the clinician determining whether progression has occurred.
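For reference, the density of the hyperbolic distribution can be written in its commonly used parametrization as below. The symbols $\alpha$ (tail steepness), $\beta$ (asymmetry), $\delta$ (scale), and $\mu$ (location) are assumptions of this sketch and are not taken from the text, which does not state which parametrization was fitted; $K_1$ is the modified Bessel function of the second kind of order 1.

```latex
f(x;\alpha,\beta,\delta,\mu)
  = \frac{\sqrt{\alpha^{2}-\beta^{2}}}
         {2\,\alpha\,\delta\,K_{1}\!\left(\delta\sqrt{\alpha^{2}-\beta^{2}}\right)}
    \exp\!\left(-\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}+\beta\,(x-\mu)\right),
  \qquad \alpha > |\beta| \ge 0,\ \delta > 0 .
```

The logarithm of this density is a hyperbola in $x$, so the tails decay approximately linearly on a log scale. The log-density of the normal distribution is a parabola, whose tails fall away much faster, which is why the hyperbolic form can accommodate occasional extreme noise values.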
When the cross-sectional noise was separated into the six segments of the optic disc, significant differences were clear in the spread of noise across the sectors. It has been suggested that early glaucomatous changes often result in narrowing of RA in the inferior and superior temporal sectors.33 Therefore, it is particularly important that any reduction in RA in these areas be reliably detected. We found the noise in these areas to be relatively small compared with the noise in the temporal and nasal sectors, which showed the greatest spread. Any RA changes occurring in these latter two sectors would therefore have to be of larger magnitude to be reliably detected. The results described in this study provide a foundation for developing a technique for detecting progression in sectoral RA. The differences in noise distribution in the different disc sectors cannot be explained by a relationship between RA and variability, whereby RA in more damaged discs is noisier than RA in discs with early damage. No relationship has been established between RA and variability,2 and no statistically significant differences have been found in the test-retest variability of HRT II stereometric parameters between glaucomatous and normal eyes.34
Modeling the relationship between noise and possible predictive patient or scan factors allows us to understand which patients are likely to have reliable (low noise) scans. Because the test-retest data include more than one measurement of RA per patient, they violate the independence assumption of ordinary linear regression, so we used multilevel techniques to account for this clustering in the data. The longitudinal data are also nonindependent because patients underwent repeated imaging over time, and MLM may also be used to model this sort of data structure. MLM is particularly appealing because the interpretation of the parameter estimates is similar to that of estimates arising from ordinary linear regression. Results from this analysis suggest MPHSD to be the factor with an overriding effect on noise to the exclusion of most other patient factors, including age. Thus one important clinical finding from this study is that useful scans can be obtained during follow-up of older patients with glaucoma, a significant finding given the high prevalence of glaucoma in the elderly population. In fact, taking MPHSD into account when interpreting changes in the RA of patients with POAG would remove much of the uncertainty in deciding how frequently and over how long a follow-up period to image. Noise was found to be less widely spread in images of better quality, so that true change can be more easily distinguished from noise and less frequent imaging is required for the reliable detection of disease progression in patients with high-quality images. This important finding should be incorporated into planned methods for detecting change in RA over time. Our measure of lens opacity, CND, had a statistically significant effect on noise independently of MPHSD; however, CND is primarily used as a research tool and is not readily available in the clinic.
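As an illustration of how a multilevel model handles repeated measurements, a generic two-level random-intercept model can be written as below. This is a textbook form given under stated assumptions, not the specification fitted in this study: the response $y_{ij}$ (some measure of noise for scan $i$ of patient $j$), the single covariate, and all symbols are hypothetical choices made for this sketch.

```latex
y_{ij} = \beta_{0} + \beta_{1}\,\mathrm{MPHSD}_{ij} + u_{j} + \varepsilon_{ij},
\qquad u_{j} \sim N\!\left(0,\sigma_{u}^{2}\right),
\qquad \varepsilon_{ij} \sim N\!\left(0,\sigma_{\varepsilon}^{2}\right)
```

The patient-level term $u_{j}$ is shared by all scans of the same patient, which is what absorbs the within-patient correlation. The coefficient $\beta_{1}$ is read in the same way as a slope in ordinary linear regression, in line with the interpretability noted above.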
Our computer simulation experiments of frequency of testing indicated that, in general, the sensitivity for detecting disease progression increased with more frequent testing, with testing over a longer follow-up period, and with better quality images. For example, if we consider a virtual patient progressing at an average rate, imaging twice a year over 4 years gives a sensitivity of 42% for good quality images and 29% for acceptable images. However, sensitivities of 61% and 36% are achieved by imaging four times a year over 4 years for good and acceptable quality images, respectively. Of course, faster rates of loss are detected more reliably; for a virtual patient whose disease is progressing at the upper quartile of loss, imaging twice a year over 4 years would give sensitivities of 86% and 64% for good and acceptable quality images, respectively, and imaging four times a year over 4 years would result in detection rates of 95% and 82%. In these analyses, the mean (in the cross-sectional sample) and the regression line (in the longitudinal sample) are only estimates of true RA and might have been biased, indicating that the deviations from them (the residuals) could have underestimated the true noise. We must also emphasize that the simulation experiments simply demonstrate how ordinary linear regression performs in the presence of measurement noise sampled from a hyperbolic distribution; of course, the process of fitting trend lines by the method of least squares assumes that the errors (more precisely, residuals from the fit) are normally distributed. Alternative methods for fitting a trend to a series of observations, in which the process takes these attributes of the data into account, may provide more accurate estimates of rates of loss but are the subject of future work. It is hoped that these might improve the diagnostic precision we report from the current computer experiments.
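The logic of such a simulation experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: the noise is drawn from a Laplace distribution as a simple heavy-tailed stand-in for the fitted hyperbolic distribution (the Laplace is its limiting case as the scale parameter tends to zero), the detection threshold is calibrated empirically on simulated stable patients to give roughly 95% specificity for a single test, and the rate of loss, noise scale, and baseline RA are hypothetical values.

```python
import random


def laplace_noise(rng, scale):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def slope_t_statistic(times, values):
    # Ordinary least-squares slope divided by its standard error.
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope / se


def simulate_t(rng, times, slope, scale, baseline=1.2):
    # One virtual patient: linear RA trend plus heavy-tailed measurement noise.
    # `baseline` (mm^2) is a hypothetical starting RA.
    values = [baseline + slope * t + laplace_noise(rng, scale) for t in times]
    return slope_t_statistic(times, values)


def sensitivity(visits_per_year, years, slope, scale, n_sim=2000, seed=1):
    rng = random.Random(seed)
    n_visits = visits_per_year * years
    times = [(i + 1) / visits_per_year for i in range(n_visits)]
    # Calibrate the threshold on stable virtual patients (true slope 0) so that
    # about 5% of them are flagged, i.e. roughly 95% specificity.
    stable = sorted(simulate_t(rng, times, 0.0, scale) for _ in range(n_sim))
    threshold = stable[int(0.05 * n_sim)]
    # Sensitivity: fraction of progressing virtual patients flagged.
    hits = sum(simulate_t(rng, times, slope, scale) < threshold
               for _ in range(n_sim))
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical rate of loss (mm^2/year) and noise scale (mm^2).
    for visits in (2, 4):
        print(visits, "visits/year:", sensitivity(visits, 4, -0.02, 0.05))
```

Lowering `scale` mimics better image quality, and raising `visits_per_year` or `years` mimics more intensive follow-up; both should raise the estimated sensitivity, which is the qualitative pattern described above.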
One important caveat regarding the assessment for progression at each point in time during repeated sequential imaging is that it results in deteriorating specificity analogous to an inflated type I error brought about by repeated statistical hypothesis testing. Corrective statistical methods are required to maintain an acceptable level of specificity throughout follow-up, and any method for detecting change should incorporate solutions for this. Additionally, further modifications may be carried out to reflect the relative importance of tests conducted over a fixed observation period, such as the duration of a clinical trial.
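The scale of this problem can be illustrated with a short calculation. The sketch below assumes, for simplicity, that the tests are independent; sequential tests on an accumulating series are correlated, so the true inflation will differ, and the Bonferroni adjustment shown is only one conservative option rather than the correction proposed here. The function names and the example values are hypothetical.

```python
def familywise_false_positive_rate(alpha, n_tests):
    # Probability of at least one false positive in n independent tests,
    # each carried out at significance level alpha.
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha, n_tests):
    # Per-test level that keeps the overall false-positive rate at or below alpha.
    return alpha / n_tests


if __name__ == "__main__":
    n_tests = 8  # e.g. testing at every visit, twice a year for 4 years
    print(familywise_false_positive_rate(0.05, n_tests))
    print(familywise_false_positive_rate(bonferroni_level(0.05, n_tests), n_tests))
```

Under the independence assumption, eight tests at the 5% level give a chance of roughly one in three of at least one false-positive result, whereas testing each at 0.05/8 keeps the overall rate just under 5%, at the cost of reduced sensitivity.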
As is customary in statistical methods, the computer simulations were based on average rates of RA loss, and this use of averages is often at odds with the needs of clinicians who necessarily think in terms of individual patients. However, it may be possible to tailor rates of progression and rates of imaging to individual patients. In a larger data set, patients may be divided according to their rates of loss and their values of MPHSD (the factor that determines the level of noise and thus the rate of imaging necessary to detect progression). The rates of change in RA used in our simulation experiments were based on data from patients with glaucoma that developed according to VF criteria.15 The patterns of VF change in glaucomatous progression are well documented, but given the lack of any criterion for progression and the measurement error inherent in perimetric assessment, these rates of change are necessarily approximations of any true underlying change.14
The value of the HRT for detecting glaucomatous progression will be realized as standards for specificity and optimal image acquisition frequencies are established. Alternative techniques for detecting glaucomatous progression in series of HRT images include topographic change analysis (TCA)35,36 and, more recently, statistical image mapping (SIM).37 These methods detect change at the pixel level (or group of pixels in TCA) rather than with summary measures such as RA. Change is evaluated within each patient, thus obviating the need for average measures of change and variability. The noise characteristics in RA measurements may also be apparent in these analyses; this is the subject of future work. Methods that make use of stereometric parameters such as RA still have a role in determining progression; they are clinically familiar, and change in an area is easier to grasp than change in topographic height. Summary measures of disc changes are useful for describing disease progression in large samples of patients in clinical trials. It is likely that the complete analysis of longitudinal HRT data may be best served by an amalgam of global analysis of changes in the optic nerve head coupled with techniques that can help the clinician visualize the localized areas of likely change.
In conclusion, we have established that the distribution of measurement error in HRT imaging of RA is best approximated by the hyperbolic distribution, thus allowing for computer simulations of progression and estimates of the sensitivity and specificity of detection of progression using RA. Issues concerning the attributes of noise may be relevant to other imaging modalities and other structural measures. Image quality is critical in terms of determining progression, and any method for detecting change must take this into account. Detection rates will improve with more frequent imaging, but techniques for correcting false-positive rates must also be applied. The results presented here will be used to develop statistical methods that will improve rates of detection and monitor change more reliably.