**Purpose.**:
There is no gold-standard measurement of glaucomatous structural progression against which to validate software progression algorithms. A computer model was developed and validated to simulate stable series of Heidelberg Retina Tomograph II (HRT; Heidelberg Engineering, Heidelberg, Germany) images, with realistic topographic variability, suitable for benchmarking false-positive rates of progression algorithms.

**Methods.**:
Three confocal image stacks were selected from each of five sets of HRT II scans, obtained within 6 weeks in 127 eyes of 66 patients. For each eye, a simulated series was propagated from one baseline confocal stack by adding fixational eye movements, photon-counting, and electronic measurement noise. Simulated confocal stacks were imported into the HRT software to generate topography images. Real and simulated image comparisons were quantified with the mean pixel height standard deviation (MPHSD), image cross-correlation (CC) of pixel-wise variability maps, and the rim area (RA) coefficient of variation (CV).

**Results.**:
The mean difference (95% limits of agreement; LoA) in MPHSD between real and simulated images was 3.5 μm (−20.9 to 28.8 μm) within mean topographies and 2.0 μm (−5.4 to 9.3 μm) between mean topographies. The mean CC between real and simulated spatial variability maps was 0.58 within mean topographies and 0.54 between mean topographies. The mean difference (95% LoA) between real and simulated mean topography RA CV was −2.1% (−17.6% to +13.4%). Variability about anatomic features was well reproduced.

**Conclusions.**:
Simulation realistically reproduces variability in real, stable images acquired over a short period. Stability in clinical datasets is uncertain, whereas in these modeled series, it is certain. This method provides benchmark datasets on which the specificity of progression algorithms can be tested.

^{ 1 }and is widely used in the assessment of structural damage in the glaucomatous optic disc. The principal goals of imaging with application to glaucoma are to assist the user in discriminating between normal and glaucomatous optic discs and to identify and track disease progression.

^{ 2,3 }the real promise of CSLT technology is in detection of change in disc structure over time, offering the clinician another tool for glaucoma management. The repeatability of HRT measurements has been quantified and used to derive limits beyond which change cannot be accounted for by measurement variability.

^{ 4 }Age-related change has been quantified to help differentiate age effects from disease progression.

^{ 5 }Statistical techniques have been developed to detect progression based on population

^{ 6,7 }or individual subject

^{ 8 –10 }variability. However, there is little evidence to suggest that one method of detecting change is better than another, and the clinician is left to wonder what method of detecting change is best to use.

^{ 11 –14 }VF changes, when used as a reference standard in this way, can confound results by dissociating factors between structural and functional changes.

^{ 15 }As a consequence, the relative proportions of associated and independent behavior are not well known, and the temporal sequence of structural and functional glaucomatous change cannot be well-defined.

^{ 16 }Expert, but subjective, clinician assessment of ONH change using the HRT software progression reports has poor agreement with assessment of series of stereoscopic optic disc photographs.

^{ 17 }Depending on the progression criteria applied and the expert observer, only moderate concordance between the HRT Topographic Change Analysis

^{ 9 }(TCA) and observed change in monoscopic (81%

^{ 18 }and 44%–71%

^{ 19 }) and stereoscopic (65%

^{ 20 }) optic disc photographs has been reported. It has also been shown that using stereophotograph-assessed change as the reference standard does not help to determine which HRT progression-detection algorithm, of a variety of methods, best identifies glaucomatous change.

^{ 21 }It may be that HRT and stereophotograph assessment are identifying different aspects of structural progression.

^{ 22 –24 }illustrates the shortcomings of stereophotography as a reference standard. For this reason, the concept of a simulation mimicking stable serial images is a hugely useful one. Progression algorithms can be applied to the simulated stable series and criteria selected to obtain low false-positive rates and the false-positive rates of different progression algorithms can be compared with each other, knowing that the image series are truly stable. Realistic and sophisticated simulation is common in other areas of medical imaging—in particular, neuroimaging.

^{ 25,26 }In these fields, it is widely considered the best way to validate automated postprocessing algorithms, such as assessing the performance of the alignment of nonrigid images and volumes over time.

^{ 27,28 }However, with regard to the detection of glaucomatous progression, the practice of realistically simulating longitudinal image series has been neglected, with some exceptions, in scanning laser polarimetry (SLP)

^{ 29 }and elementary work on the HRT (O'Leary N, et al.

*IOVS*2009;50:ARVO E-Abstract 2249),

^{ 8 }and the predominate focus has been on simulation of VF series.

^{ 30 –34 }

^{ 35 }In summary, they consisted of the images of both eyes of 74 patients (43 ocular hypertensive [OHT], 31 primary open-angle glaucoma [POAG]) who underwent five HRT II examinations within 6 weeks. OHT was defined as intraocular pressure (IOP) more than 21 mm Hg on two or more occasions and a baseline VF test (Humphrey 24-2 full-threshold; Carl Zeiss Meditec, Inc., Dublin, CA) with an Advanced Glaucoma Intervention Study (AGIS) score of 0. POAG was defined as a consistent AGIS VF score greater than 0, and a pretreatment IOP greater than 21 mm Hg on two or more occasions. All HRT II data were processed in the latest HRT III software viewer version (ONH Viewer Module 3.1.2.15; Heidelberg Engineering). All subjects had previous experience with scanning laser tomography. This study adhered to the tenets of the Declaration of Helsinki and had local ethics committee approval and the subjects' informed consent.

*scan triplet*is used to refer to the three single image stacks (referred to as

*single scans*) obtained during each acquisition or

*examination*. The patient is repositioned at the HRT between image acquisitions. It is from this scan triplet that three

*single topographies*are derived; these are aligned and averaged to form the

*mean topography*.

^{ 36 }with a frequency power spectrum from 0 to 150 Hz.

^{ 37 }Ocular drift is assumed to have linear speed in an amplitude range of 3 to 12 arcsec, with durations spanning the intervals between microsaccades.

^{ 38 }Microsaccades are assumed to have a mean amplitude of 30 arcmin and a mean period of 30 ms.

^{ 39 }Mean speeds of microsaccades are calculated from the main sequence relationship

^{ 40 }and are parametrically related to the amplitudes of microsaccades. Each microsaccade speed is nonlinear and is approximated to that of a half-Gaussian function. The mean frequency of microsaccades is assumed to be 1.5 per second.

^{ 41 }

^{ 39 }No initial directional preference was chosen for eye movements, although results have shown that horizontal and vertical eye movements display different characteristics.

^{ 42 }

*x*,

*y*,

*t*; the path of the retina in transverse axes over time) is derived for the position of the retina. The sampling period of each optical section is approximately 24 ms (frequency, 42 Hz), and within each section, the rastered horizontal line-sampling period is approximately 62 μs (frequency, 16 kHz). As the latter frequency is many orders of magnitude above maximum frequencies recorded for eye movements, the horizontal line scan time was considered a sufficient sampling frequency for our eye movement model (i.e., movement occurs between line scans and between optical sections). The transverse (

*x*and

*y*) spatial sampling for each HRT pixel is approximately 2 arcmin in an emmetropic eye.

^{ 43 }As a consequence, only translations along the

*z*-axis (in depth) of scanning are included in the head movement model. This model is implemented by applying cumulative, random, axial translation (with appropriate scaling) and rotation to each sequentially scanned optical section throughout the scan triplet.

^{−6}seconds, the effect of optical blur or smearing of the optical point spread function (PSF) due to eye movements is considered negligible for this model. Even at the maximum simulated eye speed, the maximum displacement during this dwell time would be <0.00025 of the transverse spatial sampling interval between pixels (one pixel width). One pixel width is approximately equal to a typical full width at half maximum of the PSF in the HRT scanning system, as referred to in the imaging system's lateral resolution specification.

^{ 44 }

^{ 45 }in the form of quantum or photon-counting (shot) noise,

^{ 46 }along with electronic noise, which is primarily thermal noise (Johnson/Nyquist) and APD noise.

^{ 47 }Photon-counting noise can be modeled as a Poisson process, where the intensity measured at the detector (proportional to number of photons striking the detector) for a sampled point can be chosen from a Poisson distribution with a mean intensity of λ. In this simulation, λ is given by the intensity of the pixel of the seed optical section after eye and head movements have been added. Thermal and APD noise which are intrinsic noise from the properties of electronic circuitry in general and semiconductor diodes, respectively, can be combined and approximated to white noise with a given amplitude. Each pixel in each optical section has added spatially independent noise sampled from a normal distribution with a mean of 0 and with the amplitude of the noise determined by the SD of the distribution. In this simulation, the SD is estimated at 2% of the maximum 256 stored intensity gray-level values in an 8-bit analog-to-digital converter. After noise is introduced, each pixel intensity value is rounded to the nearest 8-bit integer, to represent the digitization of the signal in the HRT optical section.

^{ 48 }

_{w}). Each pixel of this map, PHSD

_{w}(

*x*,

*y*), is the sample SD at that pixel across the three aligned single topographies comprising a mean topography.

_{b}). Each pixel of this map, PHSD

_{b}(

*x*,

*y*), is the sample SD at that pixel across all mean topographies in the series.

_{w}). It is used as an overall measure of within-examination quality and is provided in the HRT III software. It is the geometric mean of all pixels in the PHSD

_{w}map of a mean topography.

_{b}). It was used in the study as a global measure of the variability between mean topographies in a series. It is the geometric mean of all pixels in the PHSD

_{b}map for each series.

^{ 49 }It is the least variable of the stereometric HRT parameters and has been well characterized.

^{ 4,50 }RA is thus a useful indicator for detecting glaucomatous progression.

^{ 51,52 }In this study, the Moorfields fixed, standard reference plane (RP) was used to calculate RA.

^{ 53 }

_{w}, MPHSD

_{b}, and RA CV) were compared by plotting the paired values (or the mean when appropriate) of simulated and real series and examining the mean difference. Mean differences (bias) and the 95% LoA between real and simulated series were calculated for these metrics. Analysis of proportional bias and agreement was also performed.

^{ 54 }was used to compare the local variability measures (PHSD

_{w}and PHSD

_{b}maps) of real and simulated topographies. The term maximum is used, as PHSD

_{w}and PHSD

_{b}maps can be shifted along the

*x*and

*y*coordinates (with an upper limit of 20 pixels) to find the maximum NCC. This approach is logical, as the real and simulated series will have different baselines, and, as such, PHSD maps must be approximately aligned to each other. This metric captures the extent of mutual spatial information in the real and simulated variability maps and thus how well the location of topographic variability is reproduced by the simulation. NCC ranges from a minimum possible value of −1, representing perfectly anticorrelated image data, to a value of +1, representing perfectly correlated image data, where a value of 0 represents no correlation whatsoever.

_{w}variability maps were benchmarked against NCC values from comparisons of the real mean topography PHSD

_{w}variability maps with each other. The NCC for a real–simulated series comparison was the mean of the 25 possible pairs of comparisons. The NCC for the real–real series comparison was the mean of 10 possible pairs of comparisons.

_{w}, MPHSD

_{b}, RA CV, and NCCs for PHSD

_{w}maps are displayed in Table 1. The results of analysis of proportional effects on agreement between real and simulated series are also included in Table 1, with statistical significance of linear relationships between biases and average measurements and statistical significance of linear relationships between 95% LoA and average measurements. In addition, a log transformation was used on the mean values and before differences were calculated between real and simulated measures—an approach designed to ensure robustness against the effect of heteroscedasticity and potential outliers on any measured proportional effects.

^{ 55 }These characteristics are clear in the Bland-Altman plots of Figures 2 and 5. Accordingly, log-transformed significance of proportionally dependent bias and 95% LoA are included in Table 1.

Measures of Variability | Measures of Difference | |||||
---|---|---|---|---|---|---|

Uniform | Significance of Proportional Dependence | |||||

Linear | Log-Transformed | |||||

Bias | 95% LoA | Bias | 95% LoA | Bias | 95% LoA | |

MPHSD_{w} | 3.5 μm | −20.9 to 28.8 μm | P = 0.10 | P < 0.01 | P = 0.61 | P < 0.01 |

MPHSD_{b} | 2.0 μm | −5.4 to 9.3 μm | P < 0.01 | P < 0.01 | P = 0.35 | P = 0.27 |

RA CV | −2.1% | −17.6 to −13.4% | P = 0.09 | P = 0.08 | P = 0.40 | P = 0.07 |

NCC PHSD_{w} | 0.052 | 0.039 to 0.065 | P < 0.01 | P = 0.25 | P < 0.01 | P = 0.87 |

_{w}and MPHSD

_{b}, respectively. Agreement between paired real and simulated topography series for RA CV values for global RA is shown in the Bland-Altman plot in Figure 3. Linear proportional biases and 95% LoA found to have statistical significance in either linear or log-transformed domains (

*P*< 0.05) are displayed on all Bland-Altman plots.

_{w}distribution between simulated and real data is shown in Figure 4a, with a mean of 0.58 (SD 0.12). The distribution of series-wise, maximum NCC of PHSD

_{b}maps between simulated and real data is shown in Figure 4b, with a mean of 0.54 (SD 0.10). The distribution of average, series-wise, maximum NCC between real PHSD

_{w}maps and other real PHSD

_{w}maps in the same series is shown in Figure 4c, with a mean of 0.64 (SD 0.14).

_{w}maps against the average NCC of real–simulated PHSD

_{w}maps is shown by the Bland-Altman plot in Figure 5. The mean difference was 0.052 and the 95% LoA were 0.039 and 0.065.

_{w}maps for real and simulated data are shown in Figures 6 and 7. Examples of series average mean reflectance images, series average mean topographies, and corresponding PHSD

_{b}maps for real and simulated data are shown in Figures 8 and 9.

^{ 56,57 }and changes in the backward and forward movements of the retinal surface in vessel-free areas.

^{ 58 }Long- and short-term IOP changes can also cause changes in the vessel caliber

^{ 59 }along with changes in optic disc cupping

^{ 60 }and size.

^{ 61 }The simulation does not account for these effects, but it may be possible to do so using appropriate parameters. Despite these potential shortcomings, the simulations appear to realistically reproduce observed topographic variability.

^{ 29 }SLP differs from CSLO, in that it measures relevant anatomy in a different manner and has different sources of variability. Another simulation methodology, specific to HRT data, has used small misalignments of identical finalized or postprocessed single topographies to generate longitudinal series.

^{ 8 }Although this approach reproduces some of the global variability inherent in no-change patient HRT data, it reproduces very little of the spatial and feature dependent variability (O'Leary N, et al.

*IOVS*2009;50:ARVO E-Abstract 2249). It is also unsatisfactory from a theoretical point of view, as it is difficult to estimate the relationship of acquisition noise factors and the variability of postprocessed topographies. In contrast, all the three-dimensional information acquired in the HRT scan is used the methodology presented in this study. Further, a key component of the simulation described in the present study is that the simulated scans are reimported into the HRT III software. Thus, any misalignments added both within examination and between examinations are subject to partial correction by the HRT III alignment algorithms, consistent with the procedure for real patient data.

^{ 48 }

*P*= 0.05) between scans resulting in the loss of more than 20 pixels at the top or bottom of a mean topography will have a small effect on the alignment algorithms of the HRT and the calculation of variability values.

*IOVS*2009;50:ARVO:E-Abstract 319), allowing the modeling of other processes that occur before the summation of light at the detector for each pixel by modifying the PSF during and between acquisition sessions. These include changes in pupil diameter, in media opacities, and in uncorrected lower order aberrations (defocus and astigmatism). This simulation provides a model that can be used to test progression algorithm specificity appropriate to the baseline image quality, but cannot mimic systematically changing image quality over time. The long-term changes in the optical quality of the eye over time (increased scatter and absorption) resulting in a reduced and noisier light signal reflected to the device detector should be addressed, to mimic this feature in some topographic series over time.