Purpose
To investigate the inter-rater reliability of retinal segmentation between two experts and an automated algorithm.
Methods
Optical coherence tomography (OCT) imaging and segmentation of the macula enable in-vivo quantification of retinal tissues. Before clinical use, such algorithms must be validated. In this study, 24 Spectralis OCT volumes were automatically segmented using custom software (rater “A1”). The resulting seven layer interfaces were independently presented to two experts (raters “R1” and “R2”), who then manually corrected all layers in all B-scans. Within the ETDRS grid, Bland-Altman analysis assessed overall differences, and intraclass correlation coefficients (ICC) quantified rater agreement. Layers analyzed were the retinal nerve fiber layer (RNFL), ganglion cell and inner plexiform complex (GCIPL), outer plexiform layer (OPL), inner and outer nuclear layers (INL, ONL), and the photoreceptor complex from the inner segment to Bruch’s membrane (PR).
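The two agreement statistics named above can be sketched in a few lines. This is an illustrative implementation, not the study’s actual analysis code: the data are hypothetical, and a two-way random-effects, absolute-agreement, single-rater model (ICC(2,1)) is assumed, since the abstract does not specify which ICC form was used.

```python
import numpy as np

def bland_altman(a, b):
    """Mean difference (bias) and 95% limits of agreement between two raters."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)  # 95% LOA half-width
    return bias, bias - half_width, bias + half_width

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x: (n_subjects, k_raters) matrix of measurements (e.g. layer thicknesses
    per ETDRS sector, one column per rater).
    """
    x = np.asarray(x, float)
    n, k = x.shape
    grand = x.mean()
    # Between-subject and between-rater mean squares
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    # Residual (error) mean square from the two-way decomposition
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For example, `bland_altman(thickness_A1, thickness_R1)` would yield the bias and LOA for one rater pairing, while `icc_2_1` takes all raters’ measurements at once.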
Results
Mean differences and ranges across raters were all within the inherent axial resolution (~5 µm) of the device (Table 1). The limits of agreement (LOA) were widest for the GCIPL between the algorithm and the experts (16.15 µm, 10.65 µm) and between the experts (8.82 µm), and narrowest for the OPL (3.3 µm, 1.1 µm and 3.12 µm, respectively), as might be expected based on average layer thickness. The ONL showed very good LOA (8.0, 1.79 and 3.12 µm), especially given its overall thickness. Excellent inter-rater reproducibility was observed for most layers, although overall agreement was lower for the INL and the PR (Table 2). No individual layer differed significantly between the algorithm and at least one of the experts.
Conclusions
Automated segmentation of the macula can accurately identify seven retinal layer interfaces. Observed differences between any rater pairing could reflect either an algorithm limitation or the readers’ differing interpretations of the boundaries in the data. Given the laborious task of manually interpreting OCT data, however, an automated algorithm is preferable and typically more repeatable. Additionally, the algorithm used is device-independent and, in this study, its results could not be differentiated from manual ones. The ability to reliably compare results across different OCT systems would be of great interest to the research community.
Keywords: 549 image processing • 550 imaging/image analysis: clinical • 552 imaging methods (CT, FA, ICG, MRI, OCT, RTA, SLO, ultrasound)