Abstract
Purpose.
To automatically segment retinal spectral domain optical coherence tomography (SD-OCT) images of eyes with age-related macular degeneration (AMD) and various levels of image quality to advance the study of retinal pigment epithelium (RPE)+drusen complex (RPEDC) volume changes indicative of AMD progression.
Methods.
A general segmentation framework based on graph theory and dynamic programming was used to segment three retinal boundaries in SD-OCT images of eyes with drusen and geographic atrophy (GA). A validation study for eyes with nonneovascular AMD was conducted, forming subgroups based on scan quality and presence of GA. To test for accuracy, the layer thickness results from two certified graders were compared against automatic segmentation results for 220 B-scans across 20 patients. To test for reproducibility, automatic layer volumes generated from 0° versus 90° scans were compared in five volumes with drusen.
Results.
The mean differences in the measured thicknesses of the total retina and RPEDC layers were 4.2 ± 2.8 and 3.2 ± 2.6 μm for automatic versus manual segmentation. When the 0° and 90° datasets were compared, the mean differences in the calculated total retina and RPEDC volumes were 0.28% ± 0.28% and 1.60% ± 1.57%, respectively. The average segmentation time per image was 1.7 seconds automatically versus 3.5 minutes manually.
Conclusions.
The automatic algorithm accurately and reproducibly segmented three retinal boundaries in images containing drusen and GA. This automatic approach can reduce time and labor costs and yield objective measurements that potentially reveal quantitative RPE changes in longitudinal clinical AMD studies. (ClinicalTrials.gov number, NCT00734487.)
Age-related macular degeneration (AMD) is a leading cause of irreversible blindness in Americans older than 60 years [1]. There are many unanswered questions regarding the pathogenesis of AMD, which can be investigated in longitudinal studies using in vivo, high-resolution, cross-sectional imaging rather than color fundus photographs. The noninvasive, cross-sectional view of the retina from spectral domain optical coherence tomography (SD-OCT) imaging has been used to characterize the vitreoretinal interface, retina, RPE, and drusen complexes in the presence of AMD [2–4]. For quantitative AMD studies, segmentation of the retina into layers and measurement of drusen volume are crucial. However, manual segmentation is time- and labor-intensive, limiting its use in large-scale studies. Researchers have therefore turned toward automatic segmentation techniques to process clinical data more efficiently.
For nonneovascular AMD, disease severity can be determined by quantifying drusen [5,6] and geographic atrophy (GA) [7,8]. Traditionally, methods for drusen quantification rely on the evaluation of 2-D fundus photographs, where many algorithms have been developed to accelerate segmentation [9–13]. With the advent of OCT, a third (axial) dimension of data has proven to be advantageous for drusen detection [3]. While many have demonstrated quantitative accuracy in drusen volume quantification with either manual or semiautomatic techniques [4,14], most fully automatic methods only show proof-of-concept results [15–19] and very few have been validated for accuracy [20]. Furthermore, drusen identification by commercial software integrated into several SD-OCT systems has shown distinct limitations [21]. Such shortcomings have raised interest in the utilization of polarization-sensitive OCT (PS-OCT) systems [22] to directly segment the retinal pigment epithelium (RPE) structure [23]. Last, while several techniques for neovascular AMD segmentation have also been recently proposed [24–27], we target imaging intermediate AMD before advanced disease.
In addition to the complexities associated with developing fully automatic segmentation algorithms, uncertainties over the true boundary locations of evolving pathologic structures in retinal SD-OCT images pose yet another challenge. Reaching a consensus on these boundaries is often not a trivial task. For example, when the RPE is assessed in SD-OCT images of eyes with AMD pathology, the presence of drusen and GA significantly complicates the RPE structure, especially in instances of subretinal drusenoid deposits [28–30] and irregular structures such as hyperreflective foci [4] or drusen remnants over GA [31]. This results in an often subjective or arbitrary delineation of the RPE layer.
In this article, we propose guidelines for identifying the retinal layers that are indicative of AMD progression, including the total retina and the RPE+drusen complex (RPEDC). To isolate these layers, we define boundaries at the inner aspect of the inner limiting membrane (ILM), inner aspect of the RPEDC, and outer aspect of Bruch's membrane (Fig. 1). We then propose an algorithm that automatically segments these layer boundaries. In parallel with other graph theory-based algorithms [19,32,33], this algorithm is in part based on our previously proposed graph theory and dynamic programming technique used to segment eight retinal layer boundaries, which has been verified to be accurate and reliable in normal adult eyes [34]. In this article, we also present the additional algorithmic steps that are required to apply this segmentation framework to eyes with nonneovascular AMD and subsequently validate it for accuracy and reproducibility.
Before manual segmentation and algorithm development, we constructed a set of qualitative guidelines based on previous literature, expertise from the Duke OCT Reading Center, and representative images, to trace layer boundaries on images with nonneovascular AMD pathology. Guidelines and example images were used as a reference for manual segmentation to maintain a consistent and unbiased interpretation between certified graders. Practice sessions for manual segmentation were also performed on training data sets based on the guidelines. These guidelines are listed as follows:
I. We isolate the RPE and drusen complex (denoted RPEDC) by delineating the inner aspect of the RPE plus drusen material and the outer aspect of Bruch's membrane.
Sarks et al. [35] have shown progression in AMD by correlating basal linear and basal laminar deposits of the RPE to greater amounts of membranous debris associated with clinically evident drusen and pigmentary changes on color funduscopic measurements. More recently, Zweifel et al. [36] have shown subretinal deposits in reticular drusen. Thus, in particular for macular SD-OCT datasets with nonneovascular AMD, we believe that a measure of the RPEDC volume containing all drusen material, whether above (Fig. 2B) or below the RPE (Fig. 2A), would be a more useful measure of disease. Such a metric, which includes the RPE and small deposits of drusen material rather than only large collections of debris, should therefore differentiate normal aging from pathologic AMD processes. This hypothesis will be tested in the longitudinal AREDS2 Ancillary Spectral Domain Optical Coherence Tomography (A2A SD-OCT) study with age-matched controls. Our hope is to show that RPEDC volume can be a useful metric for assessing earlier states of AMD by differentiating the earliest stages of disease from normal aging of the RPE.
II. We include all hyperreflective material contiguous with the RPE as part of the RPEDC, excluding the following.
- Material over a nearly absent RPE with a width narrower than the azimuthal pixel resolution (Fig. 3B).
- Indistinguishable dim or shadowy features over a nearly absent RPE (Fig. 3C).
We include all forms of drusen, such as sub-RPE drusen (Fig. 2A) and subretinal drusenoid deposits (Fig. 2B), in the RPEDC due to the implications outlined in guideline I. While hyperreflective foci have been suggested to indicate disease progression [4], we chose not to include these foci as part of the RPEDC because they represent cells that have migrated away from (and are not contiguous with) the RPE. The inner border of the RPEDC was distinguished from the overlying hyperreflective IS-OS band when present, as demonstrated in Figure 1C.
We do not include narrow particulate (Fig. 3B) or dim material (Fig. 3C) over regions where the RPE is nearly absent (Fig. 3A) since they may represent residual drusen material or degenerated neurosensory cells [31]. To determine whether the RPE is nearly absent, we qualitatively assess the thickness of the RPE and use hyperreflectivity in the underlying choroid as a supporting indicator of geographic atrophy (GA) [8,37].
For small, particulate material, we set the minimum resolution equal to the azimuthal pixel resolution (the distance between B-scans) to attain isotropic resolution, because in our experiments the azimuthal pixel resolution was coarser than both the lateral pixel resolution (the distance between A-scans) and the axial pixel resolution (depth resolution). In this study, 67 μm was used as the minimum resolution.
Image Downsampling.
NFL-OPL and IS-RPE Separation.
There are two distinct hyperreflective regions in a filtered SD-OCT image of the retina: the region bounded by the nerve fiber layer (NFL) and outer plexiform layer (OPL), denoted the NFL-OPL complex, and the region containing the inner segment–outer segment junction (IS-OS), RPE, and drusen, denoted the IS-RPE complex. For retinal images with AMD, the pathology may result in a merging of the NFL-OPL and IS-RPE complexes. If these two regions are not separated before segmenting, then the ILM boundary and the inner boundary of the RPEDC can be mistaken for each other because of similarities in their characteristics.
We therefore generate a binary mask of the image to isolate the NFL-OPL and IS-RPE hyperreflective complexes. The steps are: (1) smooth the image with an 11-pixel Gaussian filter with a standard deviation of 11 pixels; (2) extract the edges with a [−1;1] high-pass filter (using MATLAB notation; The MathWorks, Natick, MA); (3) normalize the image to range from 0 to 1; (4) generate a binary mask using a threshold of 0.5 on the normalized image; (5) open any gaps in the clusters using a 3 × 3 pixel structuring element; (6) remove connected clusters smaller than 200 pixels; and (7) close any remaining gaps using the same structuring element.
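The mask-generation steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation. Several details are assumptions that the text does not specify: the Gaussian is applied as a separable 11 × 11 filter with edge replication, the [−1;1] filter is taken as "pixel below minus current pixel", the threshold test is strictly greater than 0.5, clusters are 8-connected, and all function names are hypothetical.

```python
import math

def gaussian_kernel(size=11, sigma=11.0):
    """Normalized 1-D Gaussian taps (used along rows, then along columns)."""
    half = size // 2
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def smooth(img, kernel):
    """Separable blur; samples beyond the border repeat the nearest edge pixel."""
    rows, cols, half = len(img), len(img[0]), len(kernel) // 2
    def clamp(v, n):
        return max(0, min(n - 1, v))
    horiz = [[sum(w * img[r][clamp(c + j - half, cols)] for j, w in enumerate(kernel))
              for c in range(cols)] for r in range(rows)]
    return [[sum(w * horiz[clamp(r + j - half, rows)][c] for j, w in enumerate(kernel))
             for c in range(cols)] for r in range(rows)]

def morph(mask, combine):
    """3 x 3 erosion (combine=all) or dilation (combine=any) over in-bounds pixels."""
    rows, cols = len(mask), len(mask[0])
    return [[combine(mask[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2)))
             for c in range(cols)] for r in range(rows)]

def drop_small_clusters(mask, min_pixels):
    """Keep only 8-connected clusters containing at least min_pixels pixels."""
    rows, cols = len(mask), len(mask[0])
    keep = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            stack, cluster = [(r0, c0)], []
            while stack:  # iterative flood fill
                r, c = stack.pop()
                cluster.append((r, c))
                for rr in range(max(0, r - 1), min(rows, r + 2)):
                    for cc in range(max(0, c - 1), min(cols, c + 2)):
                        if mask[rr][cc] and not seen[rr][cc]:
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            if len(cluster) >= min_pixels:
                for r, c in cluster:
                    keep[r][c] = True
    return keep

def hyperreflective_mask(img, threshold=0.5, min_pixels=200):
    """Steps 1 to 7 of the mask generation, under the assumptions stated above."""
    rows, cols = len(img), len(img[0])
    blurred = smooth(img, gaussian_kernel())
    # Assumed sign convention for [-1; 1]: pixel below minus current pixel.
    grad = [[blurred[min(r + 1, rows - 1)][c] - blurred[r][c] for c in range(cols)]
            for r in range(rows)]
    lo = min(min(row) for row in grad)
    hi = max(max(row) for row in grad)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    mask = [[(g - lo) / span > threshold for g in row] for row in grad]
    mask = morph(morph(mask, all), any)      # opening: erosion, then dilation
    mask = drop_small_clusters(mask, min_pixels)
    return morph(morph(mask, any), all)      # closing: dilation, then erosion
```

On a full-size B-scan this pure-Python version would be slow; it is meant only to make the order of operations concrete.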
Once the mask is generated, we delineate the boundaries of the two white bands corresponding to the NFL-OPL and IS-RPE complexes using graph theory and dynamic programming. We generate two vertical gradient adjacency matrices (a black-to-white and a white-to-black matrix) using the [−1;1] and [1;−1] edge filters and set all negative values to 0. After automatic endpoint initialization, we segment the four boundaries in the image. We achieve this by twice searching for a black-to-white edge to locate the upper boundaries of the two white bands and twice searching for a white-to-black edge to locate the two lower boundaries of the white bands. To ensure the same edge is not cut again, we exclude already delineated nodes from the graph when cutting subsequent edges. The result is a pilot estimate of the ILM, inner RPEDC, and Bruch's membrane boundaries.
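The repeated edge search with node exclusion can be illustrated with a small sketch. It rests on assumptions that the text does not state: a column-by-column dynamic program stands in for the graph search, each step may move at most one row up or down, the node cost is 1 minus the gradient plus a small constant, and endpoint initialization is omitted. The names `edge_maps` and `trace_boundary` are hypothetical.

```python
def edge_maps(mask):
    """Return (black_to_white, white_to_black) vertical edge maps of a binary mask.

    Negative responses are set to 0, as described in the text.
    """
    rows, cols = len(mask), len(mask[0])
    diff = [[int(mask[min(r + 1, rows - 1)][c]) - int(mask[r][c]) for c in range(cols)]
            for r in range(rows)]
    black_to_white = [[max(d, 0) for d in row] for row in diff]
    white_to_black = [[max(-d, 0) for d in row] for row in diff]
    return black_to_white, white_to_black

def trace_boundary(gradient, blocked=None):
    """Minimum-cost left-to-right path; returns one row index per column.

    Nodes listed in `blocked` (a set of (row, col) pairs) are excluded so
    that an already delineated edge cannot be selected a second time.
    """
    rows, cols = len(gradient), len(gradient[0])
    blocked = blocked or set()
    inf = float("inf")

    def cost(r, c):
        # Assumed cost: strong edges are cheap; the constant keeps costs positive.
        return inf if (r, c) in blocked else 1.0 - gradient[r][c] + 1e-5

    total = [[inf] * cols for _ in range(rows)]
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        total[r][0] = cost(r, 0)
    for c in range(1, cols):
        for r in range(rows):
            candidates = range(max(0, r - 1), min(rows, r + 2))
            prev = min(candidates, key=lambda p: total[p][c - 1])
            total[r][c] = total[prev][c - 1] + cost(r, c)
            back[r][c] = prev
    r = min(range(rows), key=lambda p: total[p][cols - 1])
    path = [r]
    for c in range(cols - 1, 0, -1):  # follow back-pointers to the first column
        r = back[r][c]
        path.append(r)
    return path[::-1]

# Two white bands (rows 2-3 and rows 6-7) in a 10 x 6 mask.
mask = [[r in (2, 3, 6, 7) for _ in range(6)] for r in range(10)]
b2w, w2b = edge_maps(mask)
upper_1 = trace_boundary(b2w)
upper_2 = trace_boundary(b2w, blocked={(r, c) for c, r in enumerate(upper_1)})
```

Searching the same edge map twice, with the first path blocked, yields the upper boundaries of both bands; the same two-pass search on `w2b` would give the two lower boundaries.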
Image Flattening.
Calculating Graph Weights.
Limiting the Search Region and Finding the Shortest Path.
Repeat for Subsequent Layer Boundaries.
Unflattening and Upsampling the Layer Boundaries.
For this study, we considered rectangular volumes with nonneovascular AMD under the A2A SD-OCT study, which was registered at clinicaltrials.gov and approved by the institutional review boards (IRBs) of the four A2A SD-OCT clinics (Devers Eye Institute, Duke Eye Center, Emory Eye Center, and the National Eye Institute). The study complied with the Declaration of Helsinki, and informed consent was obtained from all participants.
In the A2A SD-OCT study, volumetric scans were acquired using the SD-OCT imaging systems from Bioptigen, Inc. (Research Triangle Park, NC) located at the four clinic sites. For each patient across all sites, 0° and 90° rectangular volumes centered at the fovea with 1000 A-scans and 100 B-scans were captured for one eye. The scan sizes and the axial, lateral, and azimuthal resolutions varied slightly by site, and are specified in Table 1. The eye length was not measured. For this study, we included volumes from all four clinical sites to validate algorithm performance for images acquired at slightly varying axial resolutions and by different clinical operators.
Table 1. Study Dataset Resolutions
Study Site | Devers | Duke | Emory | NEI |
Axial FWHM resolution in retina, μm | 4.54 | 4.38 | 4.56 | 4.56 |
Axial pixel resolution in retina, μm/pixel | 3.21 | 3.23 | 3.06 | 3.24 |
Lateral pixel resolution, μm/pixel | 6.60 | 6.54 | 6.58 | 6.50 |
Azimuthal pixel resolution, μm/pixel | 68.2 | 67.0 | 69.8 | 65.0 |
Scan width, mm | 6.60 | 6.54 | 6.58 | 6.50 |
Scan length, mm | 6.82 | 6.70 | 6.98 | 6.50 |
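The scan dimensions in Table 1 follow from the pixel spacings and the stated sampling of 1000 A-scans per B-scan and 100 B-scans per volume. The short check below reproduces them; the dictionary layout and function name are illustrative only.

```python
# Pixel spacings copied from Table 1 (micrometers per pixel).
SITES = {
    "Devers": {"lateral": 6.60, "azimuthal": 68.2},
    "Duke": {"lateral": 6.54, "azimuthal": 67.0},
    "Emory": {"lateral": 6.58, "azimuthal": 69.8},
    "NEI": {"lateral": 6.50, "azimuthal": 65.0},
}
A_SCANS_PER_BSCAN = 1000
BSCANS_PER_VOLUME = 100

def scan_extent_mm(site):
    """Return (scan width, scan length) in millimeters for one study site."""
    spacing = SITES[site]
    width = spacing["lateral"] * A_SCANS_PER_BSCAN / 1000.0
    length = spacing["azimuthal"] * BSCANS_PER_VOLUME / 1000.0
    return round(width, 2), round(length, 2)

for name in SITES:
    print(name, scan_extent_mm(name))
```

For example, the Duke volumes span 6.54 μm × 1000 = 6.54 mm laterally and 67.0 μm × 100 = 6.70 mm azimuthally, which matches the last two rows of the table.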
As part of the A2A SD-OCT study, each volume was graded for quality by graders certified by the Duke Advanced Research in Spectral Domain OCT Imaging (DARSI) group. In addition to an overall scoring of good, fair, or poor, they assessed these volumes for the following characteristics: (1) foveal centration (a fovea located approximately at the center of the volume); (2) presence of low resolution or saturation; (3) presence of artifacts produced by subject blinking; (4) presence of artifacts produced by eye motion or loss of fixation; (5) presence of complex conjugate artifacts; (6) scan artifacts arising from the imaging system; (7) tilt, clipping, or blank frames; and (8) ungradable. We used these existing scores in our study to classify the volumes as high quality, low quality, or excluded from the study based on the criteria in Table 2. Volumes with motion or loss of fixation artifacts, for example, could not be categorized as high-quality, because they result in inaccurate retinal layer volume measurements. Likewise, we excluded volumes with blinking or complex conjugate artifacts in the region of interest, to avoid validating B-scans with missing retinal data.
Allowable Characteristics | High Quality | Low Quality | Excluded (from Validation) |
Pregraded volume quality | Good | Good, fair | Good, fair, poor |
Low resolution or saturation | | ✓ | ✓ |
Blinking artifacts within frames 20–60 | | | ✓ |
Motion or loss of fixation | | ✓ | ✓ |
Complex conjugate artifact within frames 20–60 | | | ✓ |
Imaging system scan artifact | ✓ | ✓ | ✓ |
Tilt, clipping, blank frames | ✓ | ✓ | ✓ |
Ungradable | | | ✓ |
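One way to read the table above as a decision rule is sketched below. The precedence of the checks is an interpretation of the table rather than a rule stated by the authors, and the function and parameter names are hypothetical. Imaging system artifacts and tilt, clipping, or blank frames are allowed in every category, so they do not appear as inputs.

```python
def classify_volume(pregrade, low_resolution=False, blink_in_roi=False,
                    motion=False, conjugate_in_roi=False, ungradable=False):
    """Assign 'high', 'low', or 'excluded' from the pregraded quality flags.

    `blink_in_roi` and `conjugate_in_roi` refer to artifacts within
    frames 20-60, the region of interest named in the table.
    """
    if pregrade not in ("good", "fair", "poor"):
        raise ValueError("pregrade must be 'good', 'fair', or 'poor'")
    # Characteristics permitted only in the excluded column.
    if ungradable or blink_in_roi or conjugate_in_roi or pregrade == "poor":
        return "excluded"
    # High quality requires a 'good' pregrade with no resolution or motion problems.
    if pregrade == "good" and not (low_resolution or motion):
        return "high"
    return "low"

print(classify_volume("good"))               # high
print(classify_volume("good", motion=True))  # low
print(classify_volume("fair"))               # low
```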
Based on the criteria from Table 3, we randomly selected a total of 25 volumes to validate the segmentation algorithm. The goal of the A2A SD-OCT study is to examine intermediate AMD; thus, we considered only volumes that were designated by the coordinating center to have level 3 (intermediate) AMD based on color fundus photography. Moreover, any volumes designated as level 3 by fundus photography that exhibited level 4 (advanced) pathology as seen on SD-OCT were excluded from the study. These included volumes with advanced AMD pathology such as choroidal neovascularization, serous pigment epithelial detachment, subretinal fluid, or GA at the foveal center. Vitelliform lesions were also excluded from the study, because they represent subretinal material that is not drusenoid. Of the 20 patients represented in the 25 selected volumes, seven were imaged at the Devers Eye Institute, three at the Duke Eye Center, six at the Emory Eye Center, and four at the National Eye Institute (NEI). All the images used in the study and their corresponding manual and automatic segmentation data are available at http://www.duke.edu/~sf59/Chiu_IOVS_2011_dataset.htm.
Table 3. Validation Study Volume Selection Criteria
| Group 1 | Group 2 | Group 3 | Group 4 |
Patients, n | 5 | 5 | 5 | 5 |
Volumes per patient, n | 2 | 1 | 1 | 1 |
Total volumes, n | 10 | 5 | 5 | 5 |
Pathology | Drusen | Drusen | Drusen + GA | Drusen + GA |
Volume quality | High | Low | High | Low |
Scan direction (0°/90°) | Both | Either | Either | Either |
A total of 220 B-scans from 20 volumes were selected for this analysis. Five of these 20 volumes comprised one randomly selected volume from each patient in group 1, and the remaining 15 volumes were those selected from groups 2 to 4 (defined in Table 3). The 11 B-scans from each volume were chosen as follows, with F denoting the B-scan number containing the foveal center: F, F ± 2, F ± 5, F ± 10, F ± 15, and F ± 20.
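The B-scan selection rule can be written out as a short helper. The function name and the example foveal frame number are hypothetical.

```python
OFFSETS = (0, 2, 5, 10, 15, 20)

def validation_bscans(fovea_frame):
    """B-scan numbers F, F ± 2, F ± 5, F ± 10, F ± 15, and F ± 20."""
    frames = {fovea_frame + sign * offset for offset in OFFSETS for sign in (-1, 1)}
    return sorted(frames)

frames = validation_bscans(50)  # example: foveal center on B-scan 50
print(frames)
print(len(frames) * 20)         # 11 B-scans x 20 volumes
```

With the foveal center on B-scan 50, this selects frames 30, 35, 40, 45, 48, 50, 52, 55, 60, 65, and 70. Eleven B-scans from each of the 20 volumes gives the 220 B-scans used in the analysis.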
Two DARSI-certified graders performed manual segmentation of the retina by drawing three layer boundaries (inner aspect of the ILM, inner aspect of the RPEDC, and outer aspect of Bruch's membrane) using customized software with a graphical user interface (GUI). During manual segmentation, no outside consultation or communication between graders was allowed. We then performed automatic segmentation using the algorithm described earlier, which was implemented in MATLAB (The MathWorks).
After segmentation, B-scans were cropped by 20% on each side to achieve equal axial and azimuthal lengths in the segmented volume. The mean thickness difference between the automatic segmentation and the manual segmentation of a predetermined (the more senior) grader was calculated for each B-scan. The absolute mean difference and standard deviation across all B-scans were then computed. We also determined the maximum error and the percentage of A-scans with an error >5 pixels (because the axial pixel resolution varied by site, this 5-pixel threshold was not converted to micrometers; it corresponds to 15.3–16.2 μm across sites). The same comparison was then conducted between the two manual graders to estimate intergrader variability.
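The accuracy metrics described above can be sketched as follows. The exact order of averaging and the use of the sample standard deviation are assumptions made for illustration, and all names are hypothetical. Thickness values are in pixels, one per A-scan.

```python
from statistics import mean, stdev

def crop(values, fraction=0.20):
    """Drop the given fraction of A-scans from each side of a B-scan."""
    n = int(round(len(values) * fraction))
    return values[n:len(values) - n]

def bscan_mean_difference(auto_thickness, manual_thickness):
    """Signed mean thickness difference (automatic minus manual) for one B-scan."""
    auto, manual = crop(auto_thickness), crop(manual_thickness)
    return mean(a - m for a, m in zip(auto, manual))

def summarize(auto_bscans, manual_bscans, axial_um_per_pixel, tolerance_px=5):
    """Aggregate per-B-scan differences and per-A-scan errors."""
    diffs = [abs(bscan_mean_difference(a, m))
             for a, m in zip(auto_bscans, manual_bscans)]
    errors = [abs(x - y)
              for a, m in zip(auto_bscans, manual_bscans)
              for x, y in zip(crop(a), crop(m))]
    return {
        "mean_abs_difference_um": mean(diffs) * axial_um_per_pixel,
        "sd_um": stdev(diffs) * axial_um_per_pixel,  # sample SD (assumed)
        "max_error_px": max(errors),
        "percent_over_tolerance": 100.0 * sum(e > tolerance_px for e in errors) / len(errors),
    }

# Toy data: two B-scans of 10 A-scans each.
auto = [[10] * 10, [10] * 10]
manual = [[9] * 10, [10, 10, 10, 10, 17, 10, 10, 10, 10, 10]]
result = summarize(auto, manual, axial_um_per_pixel=3.0)
```

The 5-pixel tolerance maps to the 15.3–16.2 μm range through the axial pixel resolutions in Table 1: 5 × 3.06 = 15.3 μm and 5 × 3.24 = 16.2 μm.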
We implemented the algorithm in MATLAB (The MathWorks), which yielded an average computation time of 1.7 seconds per image (512 × 1000 pixels) on a laptop computer with a 64-bit operating system, a CPU at 1.73 GHz (Core i7; Intel, Santa Clara, CA), a 7200 rpm hard drive, and 16 GB of RAM. This time includes the overhead required for reading and writing operations. Manual segmentation took an average time of 3.5 minutes per image.
Despite the establishment of predefined segmentation guidelines and practice sessions for manual segmentation on training data sets, two certified graders did not achieve perfect agreement when delineating the layer boundaries (Table 4, column 1). Implementing even more explicit guidelines for manual segmentation may improve agreement, but this will not eliminate the inherent interobserver variability and differences between manual tracings. Also note that although we excluded RPEDC material over a nearly absent RPE with a minimum lateral width equal to the azimuthal pixel resolution (67 μm in this study), future investigators may employ a fixed width to improve uniformity across clinical studies.
Results show that our algorithm automatically segmented the total retina and RPEDC in eyes with intermediate AMD with accuracy comparable to that of a second human grader (Table 4, column 1 versus 2). A low-quality volume did not significantly reduce the segmentation accuracy (Table 4, volume groups 1 vs. 2 and 3 vs. 4), illustrating the algorithm's robustness for images of various levels of quality. Future study across a dataset of several hundred eyes with intermediate AMD may reveal new segmentation challenges that occur infrequently and thus may not have been identified in this series. We currently do not know the range of changes in RPEDC volume associated with disease progression or how these compare to color fundus photographs, and therefore we cannot be certain of the accuracy required for predictive volume measurements. RPEDC volume measurements from SD-OCT imaging will hopefully provide greater accuracy in assessing drusen load compared to the common technique of mentally summing the area of drusen visible on color fundus photographs [40].
Our measurement of the RPEDC builds from the known pathophysiology and morphology of AMD and should be useful in testing hypotheses of disease progression. The term drusen has historically referred to yellow spots visible on ophthalmoscopy and recorded with color fundus photographs. Drusen contain a wide range of materials, including lipids, lipoproteins, amyloid, collagen, proteins associated with inflammation, and degradation products [41–43]. Although drusen can be composed of basal laminar deposits (internal to the RPE), basal linear deposits (external to the basal lamina of the RPE), and apical or subretinal deposits (reticular drusen), the difference between aging processes and the onset of AMD remains controversial [29,44–46]. Each of these deposits has been implicated in the pathogenesis of AMD, and it would appear clinically relevant to identify the early onset of changes in the RPE associated with AMD. Although large drusen can be readily segmented from the RPE, small drusen deposits in the early stages of disease, depending on the pattern of reflectivity, would likely initially produce a change in RPE volume followed by a subsequent appearance of distinct drusen as the deposits enlarge. Thus, because of our interest in identifying RPE and drusen pathology associated with early AMD, we pursued RPEDC measurement to capture the full extent of early disease and chose to compare this to an aged non-AMD control population. This will be important when paired with measurements of the neurosensory retina to investigate the timing of RPE versus photoreceptor [4,47,48] morphologic changes in early AMD.
Because noncentral GA may be a component of intermediate AMD, we included eyes with GA in our algorithm testing. The algorithm was marginally less accurate for volumes containing both GA and drusen versus solely drusen, largely because of the different morphology of the RPEDC in these two types of pathology (Fig. 6D). Furthermore, the algorithm exhibited a tendency to segment the RPE rather than the RPEDC in the presence of some subretinal drusenoid deposits (Fig. 6B). Using an integrated algorithm to segment these types of pathology resulted in a tradeoff between extending functionality and compromising accuracy. To fully disclose these errors and any other limitations of our algorithm, we have made the complete validation dataset available online. A drawback of this or any automated segmentation system may be the need for human review of the automated segmentation results to assess for unexpected errors such as the ones shown in Figure 6.
Even with these limitations, our algorithm segmented drusen of various shapes and sizes (Fig. 5B), images of significantly low quality (Fig. 5D), RPE and drusen in the presence of GA (Fig. 5F), and retina with irregular curvatures (Fig. 5H). Furthermore, the <5% difference in measured layer volume, when comparing 0° and 90° scans of the same eye (Table 5), attests to the reproducibility of the automatic measurements. Differences in the measured layer volume may partially be attributable to the fact that the volumes were unregistered.
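A reproducibility comparison of this kind could be computed along the following lines. The text does not state how layer volume or the percentage difference was calculated, so both formulas here are assumptions: volume is the summed layer thickness times the voxel size, and the difference is normalized by the mean of the two volumes. The function names are hypothetical.

```python
def layer_volume_mm3(thickness_px, axial_um, lateral_um, azimuthal_um):
    """Layer volume from a thickness map (rows = B-scans, columns = A-scans).

    Thickness is given in pixels; pixel spacings are in micrometers.
    """
    voxel_mm3 = axial_um * lateral_um * azimuthal_um * 1e-9  # cubic um to cubic mm
    return sum(sum(row) for row in thickness_px) * voxel_mm3

def percent_difference(volume_0_deg, volume_90_deg):
    """Absolute difference relative to the mean of the two volumes (assumed)."""
    mean_volume = (volume_0_deg + volume_90_deg) / 2.0
    return 100.0 * abs(volume_0_deg - volume_90_deg) / mean_volume

# Toy example with 1 mm voxels so the arithmetic is easy to follow.
v = layer_volume_mm3([[1, 2], [3, 4]], 1000.0, 1000.0, 1000.0)
print(v, percent_difference(100.0, 102.0))
```

Because the 0° and 90° volumes were unregistered, some of any measured difference would reflect slightly different retinal regions rather than segmentation error.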
The algorithm segmented these images not only accurately and reproducibly but also efficiently. On average, a certified grader could draw three boundaries on a single B-scan in 3.5 minutes. This long segmentation time was largely attributable to the difficulty in segmenting the irregularly shaped inner border of the RPEDC and in distinguishing the RPE and drusen from extraneous material, such as hyperreflective foci and drusenoid remnants over GA. Future studies will include a more in-depth analysis on a larger pool of data and will identify common automated drusen segmentation errors similar to the identifications made in other studies [20,21].
The clinical implications of these results are encouraging for large-scale ophthalmic studies, since they suggest that this automatic segmentation algorithm can efficiently and reproducibly segment the total retina and RPEDC. Furthermore, for clinical studies with a wide range of image quality, our algorithm is capable of accurately segmenting images of lower quality. Last, automatic segmentation of the RPEDC contributes to the progress in drusen quantification, which is especially important in AMD studies. However, note that the algorithm segments all drusen types, including soft drusen, cuticular drusen, and subretinal drusenoid deposits. While soft drusen and subretinal drusenoid deposits have been shown to be significant indicators of AMD progression [29,35,36,49], cuticular drusen are considered by some as not being associated with AMD [50,51]. Our future studies will include the development of automated drusen classification techniques to segment drusen types that are specific to a particular disease.
Validation of our proposed algorithm was limited to intermediate AMD and was not tested for disease processes such as neovascular AMD, vitreoretinal pathologies, or proliferative diabetic retinopathy. Algorithmic modification, extension of application, and assessment of the performance in eyes exhibiting pathologies outside of nonneovascular AMD are part of our ongoing work. Furthermore, while only volumes with high or low quality were considered in our validation study, this does not imply that the algorithm necessarily errs for volumes excluded from the study. These volumes were excluded due to missing retinal data. All such volumes will be included in our future studies identifying common segmentation and acquisition errors on a broader pool of data.
In summary, we developed a fully automatic algorithm to segment three retinal boundaries with a performance comparable to that of manual graders. The algorithm performed reliably for images containing drusen and GA and for images of various levels of quality and yielded reproducible measurements of layer volumes for the same eye. Our automatic approach can reduce time and labor costs and yield an objective evaluation for the study of AMD in future clinical studies.
Supported in part by the American Health Assistance Foundation. The A2A SD-OCT Study was funded in part by Genentech Grant IST-4400S, with clinical imaging equipment support from Bioptigen and Alcon Laboratories.
Disclosure: S.J. Chiu, P; J.A. Izatt, P; R.V. O'Connell, None; K.P. Winter, None; C.A. Toth, Alcon (C, F), Genentech (C, F), Bioptigen (F), Physical Sciences Inc. (C), P; S. Farsiu, P
The authors thank Stefanie G. Schuman (Director of Grading for the A2A SD-OCT study) for her contribution in developing the segmentation guidelines for AMD pathology, and Ramiro Maldonado, Michelle McCall, and Neeru Sarin for their contributions to the validation studies.