Abstract
Purpose.:
There is no gold-standard measurement of glaucomatous structural progression against which to validate software progression algorithms. A computer model was developed and validated to simulate stable series of Heidelberg Retina Tomograph II (HRT; Heidelberg Engineering, Heidelberg, Germany) images, with realistic topographic variability, suitable for benchmarking false-positive rates of progression algorithms.
Methods.:
Three confocal image stacks were selected from each of five sets of HRT II scans, obtained within 6 weeks in 127 eyes of 66 patients. For each eye, a simulated series was propagated from one baseline confocal stack by adding fixational eye movements, photon-counting, and electronic measurement noise. Simulated confocal stacks were imported into the HRT software to generate topography images. Real and simulated image comparisons were quantified with the mean pixel height standard deviation (MPHSD), image cross-correlation (CC) of pixel-wise variability maps, and the rim area (RA) coefficient of variation (CV).
Results.:
The mean difference (95% limits of agreement; LoA) in MPHSD between real and simulated images was 3.5 μm (−20.9 to 28.8 μm) within mean topographies and 2.0 μm (−5.4 to 9.3 μm) between mean topographies. The mean CC between real and simulated spatial variability maps was 0.58 within mean topographies and 0.54 between mean topographies. The mean difference (95% LoA) between real and simulated mean topography RA CV was −2.1% (−17.6% to +13.4%). Variability about anatomic features was well reproduced.
Conclusions.:
Simulation realistically reproduces variability in real, stable images acquired over a short period. Stability in clinical datasets is uncertain, whereas in these modeled series, it is certain. This method provides benchmark datasets on which the specificity of progression algorithms can be tested.
Confocal scanning laser tomograph (CSLT) technology, typified by the commercially available Heidelberg Retina Tomograph II (HRT, Heidelberg Engineering GmbH, Heidelberg, Germany), gives three-dimensional images of the optic disc and peripapillary retina. This established imaging modality provides reproducible results
1 and is widely used in the assessment of structural damage in the glaucomatous optic disc. The principal goals of imaging with application to glaucoma are to assist the user in discriminating between normal and glaucomatous optic discs and to identify and track disease progression.
Although the HRT II discriminates between normal discs and those with glaucoma reasonably well,
2,3 the real promise of CSLT technology is in detection of change in disc structure over time, offering the clinician another tool for glaucoma management. The repeatability of HRT measurements has been quantified and used to derive limits beyond which change cannot be accounted for by measurement variability.
4 Age-related change has been quantified to help differentiate age effects from disease progression.
5 Statistical techniques have been developed to detect progression based on population
6,7 or individual subject
8 –10 variability. However, there is little evidence to suggest that one method of detecting change is better than another, and the clinician is left to wonder what method of detecting change is best to use.
Most research assessing how well these techniques can identify and track glaucomatous progression with the HRT has focused on examining agreement between structural and functional measures of progression which In practice, involves assessing the diagnostic performance of HRT information in predicting visual field (VF) changes.
11 –14 VF changes, when used as a reference standard in this way, can confound results by dissociating factors between structural and functional changes.
15 As a consequence, the relative proportions of associated and independent behavior are not well known, and the temporal sequence of structural and functional glaucomatous change cannot be well-defined.
An alternative reference standard against which to assess the performance of CSLT imaging is change identified by glaucoma experts from optic disc or retinal nerve fiber layer (RNFL) photographs. Stereometric parameters have poor to moderate diagnostic performance as predictors for expert assessed changes in RNFL photographs.
16 Expert, but subjective, clinician assessment of ONH change using the HRT software progression reports has poor agreement with assessment of series of stereoscopic optic disc photographs.
17 Depending on the progression criteria applied and the expert observer, only moderate concordance between the HRT Topographic Change Analysis
9 (TCA) and observed change in monoscopic (81%
18 and 44%–71%
19 ) and stereoscopic (65%
20 ) optic disc photographs has been reported. It has also been shown that using stereophotograph-assessed change as the reference standard does not help to determine which HRT progression-detection algorithm, of a variety of methods, best identifies glaucomatous change.
21 It may be that HRT and stereophotograph assessment are identifying different aspects of structural progression.
With stereoscopic disc photograph or visual fields as the reference standard for progression/stability, the apparent poor discrimination between stable and progressing eyes by HRT progression algorithms may be due to either poor performance of the algorithms or inaccuracies in the reference standard itself. Poor between-expert agreement in identifying progression based on stereoscopic photographs
22 –24 illustrates the shortcomings of stereophotography as a reference standard. For this reason, the concept of a simulation mimicking stable serial images is a hugely useful one. Progression algorithms can be applied to the simulated stable series and criteria selected to obtain low false-positive rates and the false-positive rates of different progression algorithms can be compared with each other, knowing that the image series are truly stable. Realistic and sophisticated simulation is common in other areas of medical imaging—in particular, neuroimaging.
25,26 In these fields, it is widely considered the best way to validate automated postprocessing algorithms, such as assessing the performance of the alignment of nonrigid images and volumes over time.
27,28 However, with regard to the detection of glaucomatous progression, the practice of realistically simulating longitudinal image series has been neglected, with some exceptions, in scanning laser polarimetry (SLP)
29 and elementary work on the HRT (O'Leary N, et al.
IOVS 2009;50:ARVO E-Abstract 2249),
8 and the predominate focus has been on simulation of VF series.
30 –34
In this study, we developed a virtual platform in which components of the optic disc topographic image formation process of the HRT are simulated computationally, taking into account features relevant to topographic variability, such as eye movements and other sources of noise during and between HRT examination. This goal is approached by using empiric and theoretical parameters of these noise sources to propagate HRT series from real baseline scans. The resulting simulated HRT images will be analogous to the real HRT images obtained in clinics with variability similar to that inherent in real, stable series. The key advantage of this approach is having exact knowledge that the morphology of the original structure is not changing, because all images are derived from the same baseline scan. This simulation can then provide a test bed for comparing false-positive rates of HRT change detection methods.
The HRT II acquires a confocal image stack composed of optical sections taken through a sequence of focal planes 62.5 μm apart in a horizontal scanning angle of 17.5° and vertical scanning angle of 15° around the ONH surface. These are aligned to compensate for eye movements occurring during scan acquisition. Each optical section image is sampled as an array of 384 rastered horizontal line scans, each sampling 448 pixels. The intensity of light reflected in each optical section for a location on the ONH/retinal surface is used to calculate the surface height at that location (approximately the depth position of the optical section with the highest reflectance intensity at that location). The matrix of surface height values of the central (384 × 384) sampled locations forms the topography image with the information from the 64 horizontally peripheral pixels used only before topography formation.
The simulation uses the unprocessed (unaligned) confocal image stack as the basis for simulation. For purposes of simplicity and clarity, a scan triplet is used to refer to the three single image stacks (referred to as single scans) obtained during each acquisition or examination. The patient is repositioned at the HRT between image acquisitions. It is from this scan triplet that three single topographies are derived; these are aligned and averaged to form the mean topography.
One baseline single scan for each eye is chosen as the seed for the generation of the simulated series. Each optical section is aligned with the section above and below it by using a cross-correlation algorithm, to reduce eye and head movements already potentially present in the seed data. Within each seed stack, various sources of noise, present during image acquisition, that potentially result in variability between the generated topography (single and mean) images is added in the following order: (1) within-examination eye movement, (2) within-examination head movement, (3) between-examination head movement, and (4) device noise, composed of quantum effect noise (shot/photon counting) and electronic noise (Johnson-Nyquist/thermal). Simulated single topographies are formed from resultant image stacks using the HRT III software; the simulated mean topographies are calculated from the simulated single topographies. The input and sequence of the addition of these sources of noise are shown in
Figure 1.
Within-Examination Eye Movements.
In HRT image acquisition, the patient is asked to fixate on a target before and during the acquisition. Thus, the eye movement model consists of the three main components of normal fixational eye movements: ocular microtremor, ocular drift, and microsaccades. Ocular microtremor is assumed to have an amplitude range of 0 to 2 arcmin (median, 17.5 arcsec),
36 with a frequency power spectrum from 0 to 150 Hz.
37 Ocular drift is assumed to have linear speed in an amplitude range of 3 to 12 arcsec, with durations spanning the intervals between microsaccades.
38 Microsaccades are assumed to have a mean amplitude of 30 arcmin and a mean period of 30 ms.
39 Mean speeds of microsaccades are calculated from the main sequence relationship
40 and are parametrically related to the amplitudes of microsaccades. Each microsaccade speed is nonlinear and is approximated to that of a half-Gaussian function. The mean frequency of microsaccades is assumed to be 1.5 per second.
41
Certain assumed rules are enforced in the eye movement model: drift and microtremor take place simultaneously; neither drift nor microtremor can occur during microsaccades; no subsequent ocular drift can be in the same direction (±90°) as the previous microsaccade. This approximates the directional frequencies of eye movements observed in previous studies.
39 No initial directional preference was chosen for eye movements, although results have shown that horizontal and vertical eye movements display different characteristics.
42
From these parameters and rules, a retinal trace (x, y, t; the path of the retina in transverse axes over time) is derived for the position of the retina. The sampling period of each optical section is approximately 24 ms (frequency, 42 Hz), and within each section, the rastered horizontal line-sampling period is approximately 62 μs (frequency, 16 kHz). As the latter frequency is many orders of magnitude above maximum frequencies recorded for eye movements, the horizontal line scan time was considered a sufficient sampling frequency for our eye movement model (i.e., movement occurs between line scans and between optical sections). The transverse (x and y) spatial sampling for each HRT pixel is approximately 2 arcmin in an emmetropic eye.
Within-Examination Head Movements.
Between-Examination Movement.
The global uniform transformation and change in overall reflectivity of the confocal stack mimics variability resulting from placement of the head anew on the head and chin rests. A standard deviation (SD) of 10 pixels is chosen (by assumption) for the normal distribution from which lateral translations of the follow-up stack relative to baseline can be chosen as an estimate for the operator uncertainty in centering the ONH on scanning. Changes in the autofocus settings of the machine (correct axial placement of central CSLO section) are modeled by random small scaling and depth translations. Differences in initial head angle placement in each examination are mimicked by rotations of the scan triplet about each axis with relation to the baseline.
For all eye and head movements, implementation of deriving new intensity values at each new simulated pixel involves the defining of new coordinates for every pixel in the CLSO section stack under a given transformation. Filling in of unknown data between known intensity values at fixed pixel coordinates requires interpolation. Bilinear or trilinear interpolation was chosen for this process. It applies products of linear functions to the data of the neighboring 4 or 8 pixels in two- or three-dimensional space to obtain intensity values of new coordinates that do not coincide exactly with the coordinates of any pixel in the seed stack. If the transformation involves only transverse components, then bilinear interpolation is performed using only the values of the new coordinate's transverse neighbors. If axial components are involved in the transformation, then trilinear interpolation is performed using the axial neighboring pixel information in addition.
In the rare cases in which there is no valid measurement from the reflectance signal from the focused spot, these areas remain invalid under the new coordinate system. Bilinear/trilinear interpolation will render neighboring values that are close enough also invalid under the coordinate transform. The effect of movement that causes the resampled pixels to fall outside the boundaries of the original optical section, known as windowing, is an important technical point to consider. Null values are used and the topography formation and alignment algorithms will ignore information at these pixels. This effect occurs mostly to the edges of the optical section images. As previously stated, there are 64 extra horizontally peripheral pixels available for use in each optical section before topography formation. The simulation uses the information of these extra pixels to remove the effects of windowing from horizontal axis movement.
As the dwell time of the scanning beam at each sampled point is <0.5 × 10
−6 seconds, the effect of optical blur or smearing of the optical point spread function (PSF) due to eye movements is considered negligible for this model. Even at the maximum simulated eye speed, the maximum displacement during this dwell time would be <0.00025 of the transverse spatial sampling interval between pixels (one pixel width). One pixel width is approximately equal to a typical full width at half maximum of the PSF in the HRT scanning system, as referred to in the imaging system's lateral resolution specification.
44
Device Noise.
Noise Follow-up Time Dependence.
Implementation.
Local Within-Examination Variability.
Local Between-Examination Variability.
Global Within-Examination Variability.
Global Between-Examination Variability.
Clinical Measure of Variability: Neuroretinal Rim Area (RA) Variability of Mean Topographies.
Neuroretinal RA gives a numerical value to a clinically recognizable and meaningful feature. The loss of RA tissue provides a surrogate measure of the loss of retinal ganglion cell axons typical of glaucomatous damage.
49 It is the least variable of the stereometric HRT parameters and has been well characterized.
4,50 RA is thus a useful indicator for detecting glaucomatous progression.
51,52 In this study, the Moorfields fixed, standard reference plane (RP) was used to calculate RA.
53
The RA coefficient of variation (CV) is calculated as the ratio of the SD to the mean of the RA across mean topographies in a series. The RA CV is compared between real and corresponding simulated mean topography series, to examine the performance of the simulation in mimicking the variability of a clinically meaningful parameter.
Single-value metrics of variability (MPHSDw, MPHSDb, and RA CV) were compared by plotting the paired values (or the mean when appropriate) of simulated and real series and examining the mean difference. Mean differences (bias) and the 95% LoA between real and simulated series were calculated for these metrics. Analysis of proportional bias and agreement was also performed.
Correlation in the Spatial Distribution of Topographic Variability between Real and Simulated Series.
There is no gold-standard method of detecting structural changes in the glaucomatous eye. One solution may be afforded by simulating series of optic nerve head measurements with well-estimated measurement variability characteristics. This study proposes a blueprint for such an approach, using images from CSLO technology. This simulation is based on empirically derived parameters from previous studies and (HRT) values from published reports. The usefulness of a simulation is critically dependent on the degree to which it emulates real data. The simulation presented in this study reproduces well both the spatial characteristics and the magnitude of variability in short-time series of HRT data for both within-examination acquisition and between examinations. Image correlation of real and simulated spatial variability maps tend to be slightly less than the correlation of real spatial variability maps with each other, but the quantitative and qualitative differences are very small. The simulation also generates realistic values for RA, which is a clinically useful parameter for monitoring change. The contribution that simulation can make—independent of and before the collection of real data for scientific studies or clinical trials that are conducted to assess the validity of change detection measures—cannot be overstated. Given any number of baseline patient scans, it enables unlimited modification of the length of series, frequency of testing, and variability over time of the data. It also may reduce the potential waste in cost and time involved in data collection, given that study design may be optimized in advance by testing preliminary hypotheses on simulated test-bed data.
The simulations were benchmarked against real short-time series. Clinically significant change is highly unlikely to have taken place during the short study period, and age-related effects would also be almost nonexistent; the only changes in the series over time would be real fluctuations in the tissue morphology and variability in the measuring device and settings. This study has used parameters from previous studies on the movement of the eye (and, as a result, the movement of the ONH) during fixation from previous studies, along with the reasonable estimates of operator variability in initial ONH centration and the variability inherent in the device hardware in an attempt to make these assumptions as realistic as possible. Errors in estimating these parameters may result in inaccuracies in the simulation. Other potential sources of variability, such as the ocular pulse and intraocular pressure changes, have not been introduced, as they have not been sufficiently well characterized. The ocular pulse has been shown to contribute to changes in vessel caliber
56,57 and changes in the backward and forward movements of the retinal surface in vessel-free areas.
58 Long- and short-term IOP changes can also cause changes in the vessel caliber
59 along with changes in optic disc cupping
60 and size.
61 The simulation does not account for these effects, but it may be possible to do so using appropriate parameters. Despite these potential shortcomings, the simulations appear to realistically reproduce observed topographic variability.
Previous attempts to simulate series of ONH or retinal images are few and far between. One approach involved simulating retardance image series obtained from SLP baseline images.
29 SLP differs from CSLO, in that it measures relevant anatomy in a different manner and has different sources of variability. Another simulation methodology, specific to HRT data, has used small misalignments of identical finalized or postprocessed single topographies to generate longitudinal series.
8 Although this approach reproduces some of the global variability inherent in no-change patient HRT data, it reproduces very little of the spatial and feature dependent variability (O'Leary N, et al.
IOVS 2009;50:ARVO E-Abstract 2249). It is also unsatisfactory from a theoretical point of view, as it is difficult to estimate the relationship of acquisition noise factors and the variability of postprocessed topographies. In contrast, all the three-dimensional information acquired in the HRT scan is used the methodology presented in this study. Further, a key component of the simulation described in the present study is that the simulated scans are reimported into the HRT III software. Thus, any misalignments added both within examination and between examinations are subject to partial correction by the HRT III alignment algorithms, consistent with the procedure for real patient data.
Although within-examination single topographies are more closely related than between-session topographies, because of the effects of between-examination repositioning of the patient and realignment of the scanner to the eye/ONH, the simulation assumes no autocorrelation in time (knowing one value does not provide you with any information about another value) between examinations. Also, the within-examination noise parameters do not change with time nor do the differences between examinations. These assumptions are justified on the basis of study results showing no difference in variability between scans with intervals varying from 1 hour to 4 weeks.
48
There were a few cases in which variability was poorly reproduced in simulated series, as can be seen in
Figure 2a. Here, the largest anomalies were above the line of agreement showing that, in these cases, the average, global within-mean topography variability was higher in the simulation than in the real series. This finding may indicate that the optical section stack used to seed the simulation had more motion noise or lower image quality than those in the rest of the real series. Although the simulation model presented herein allows some alignment before noise addition, it is never possible to attain a motion- and noise-free image stack. Thus, the simulation depends on the optical section stack selected to seed each simulated series. Apart from random differences in a small sample of five mean topographies, there may be other potential causes for discrepancies in reproducing the variability of real data, for example, some differences in RA variability may be due to small differences in contour line placement between the real and simulated baseline topographies.
Windowing, as stated in the methodology, may have small systematic effects on differences in the calculations of variability between real and simulated data due to the potential null values on optical section and topography image edges. However, windowing caused by axial shifts should not affect the topography formation process, as any missing data will be in the low-intensity tails of each axial intensity profile. Topographic height information is mostly derived from the central maximum of each axial intensity profile. Vertical axis shifts have the potential to most affect the topography formation. Despite this, between-examination, vertical translations of scan triplets greater than 20 pixels (±2 SD: P = 0.05) between scans resulting in the loss of more than 20 pixels at the top or bottom of a mean topography will have a small effect on the alignment algorithms of the HRT and the calculation of variability values.
The simulation makes the simplifying assumption that eye movement and measurement noise parameters are the same for all patients. This assumption seemed sufficient in reproducing many of the characteristics of real data. Furthermore, it is important to note that in real patient data, patients with various degrees of lens opacity were scanned. This variation allowed the effects of a wide range of image quality to be modeled, with some eyes having high within- and between-examination variability. That this was also observed in the simulated data suggests that much of the observed range of image variability is a consequence of the range in the quality of ocular media and the structure of the ONH themselves.
Future work could usefully consider the formation process in the CSLO pixels (Gruppetta S, et al. IOVS 2009;50:ARVO:E-Abstract 319), allowing the modeling of other processes that occur before the summation of light at the detector for each pixel by modifying the PSF during and between acquisition sessions. These include changes in pupil diameter, in media opacities, and in uncorrected lower order aberrations (defocus and astigmatism). This simulation provides a model that can be used to test progression algorithm specificity appropriate to the baseline image quality, but cannot mimic systematically changing image quality over time. The long-term changes in the optical quality of the eye over time (increased scatter and absorption) resulting in a reduced and noisier light signal reflected to the device detector should be addressed, to mimic this feature in some topographic series over time.
The introduction in the simulations of topographic changes associated with focal and diffuse glaucomatous damage in the ONH should be investigated. Adding change to a series of replicate HRT scans, and then overlaying measurement noise from this simulation, enables assessment of the sensitivity of change detection algorithms (as the underlying location, temporal sequence, and magnitude of the superimposed change is known).
A stable series of measurements can be generated by simply adding random noise. Such series may also be used to test the specificity of statistical change detection algorithms developed for detecting progression. However, in reality, there are many other sources of variability in optic nerve head scans, such as eye movements, head movements, and device noise, the effects of which may be falsely detected as true morphologic change by these techniques. Such variability is manifestly more complex than simple additive noise, and therefore a more realistic, stable series simulation is needed to more accurately test the specificity of these algorithms. Stability in any clinical series of data is uncertain, whereas in a modeled series it is certain.
In conclusion, the methodology outlined in this work has definite use in producing benchmark data on which the false-positive rates of statistical methods for detecting HRT progression can be examined. This method could also improve the confidence in detection of change by these techniques in clinical studies. Simulating series of progressive change in HRT series awaits further investigation. Still, the results of this study are presented as a proof of principle toward a solution of the seemingly intractable problem of developing reliable methods of detecting disease progression in glaucoma.
Supported by the Special Trustees of Moorfields Eye Hospital and by an unrestricted grant from Heidelberg Engineering. DGH received a proportion of his funding from the Department of Health's NIHR Biomedical Research Centre at Moorfields Eye Hospital and the UCL Institute of Ophthalmology. The views expressed in this publication are those of the authors and not necessarily those of the UK Department of Health.
Disclosure:
N. O'Leary, None;
D.P. Crabb, None;
D.F. Garway-Heath, Heidelberg Engineering (F), Carl Zeiss Meditec (F, R), Optovue (F)
The authors thank Steve Gruppetta (City University London), Ed White (Moorfields Eye Hospital), and Michael Reutter (Heidelberg Engineering) for advice and invaluable technical support.