Abstract
Purpose :
Deep learning approaches have the potential to automatically delineate retinal layers and thereby provide quantitative assessments of retinal features. In this study, we examined the impact of training dataset size on the performance of a deep learning model (DLM) for measuring photoreceptor ellipsoid zone (EZ) area and outer segment (OS) volume in retinitis pigmentosa (RP).
Methods :
A DLM reported previously (Wang et al., TVST 2021) was trained on four datasets composed of midline B-scans from 140, 240, 340, and 480 patients with RP, respectively. Test datasets consisted of 96 high-speed 9-mm 31-line volume scans obtained from a separate group of 48 patients with RPGR-associated X-linked RP who had an EZ band within the scan window and a detectable EZ in at least 3 B-scans of each volume scan. EZ area and OS volume were measured from the EZ-RPE layer segmented using 3 methods: (1) the trained DLMs (RP140, RP240, RP340, and RP480, respectively); (2) manual correction of DLM segmentation (DLM-MC) by two experienced human graders; and (3) conventional gold-standard manual segmentation by the same two graders (MG). Dice similarity and Bland-Altman analyses were conducted to assess agreement between the segmentation methods for EZ area and OS volume measurements.
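For illustration only, the Dice similarity between two segmentations can be sketched as follows. This is a minimal sketch, not the study's implementation: it assumes each segmentation is available as a binary mask of the same shape, and the function name `dice_score` and the toy masks are hypothetical.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice similarity between two binary masks of identical shape.

    Hypothetical helper: Dice = 2 * |A and B| / (|A| + |B|).
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy masks (not study data): one overlapping pixel out of two in each mask.
dlm_mask = [[1, 1, 0], [0, 0, 0]]
mg_mask = [[0, 1, 1], [0, 0, 0]]
print(dice_score(dlm_mask, mg_mask))
```

A score of 1 indicates identical masks and 0 indicates no overlap; how the masks are derived from the EZ-RPE layer boundaries is not specified here and would follow the segmentation method used.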
Results :
As shown in the Table, as the training dataset size increased, the median Dice score between DLM and MG for EZ band segmentation increased initially and then plateaued. Similarly, for both EZ area and OS volume, the Bland-Altman coefficient of repeatability (CoR) decreased rapidly at first and then much more slowly as the dataset size increased. In comparison, between DLM-MC and MG, the median Dice score was 0.8674 for EZ band segmentation, and the CoR was 1.83 mm² for EZ area and 0.0381 mm³ for OS volume, comparable to the agreement between the two graders' manual segmentations (MG1 and MG2).
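As a rough illustration of the Bland-Altman quantities referred to above, the sketch below computes the mean difference and a CoR from paired measurements. It assumes the common convention CoR = 1.96 × SD of the paired differences; the abstract does not state which definition was used, and the function name `bland_altman` and the example values are hypothetical.

```python
import numpy as np


def bland_altman(method_a, method_b):
    """Bland-Altman summary for paired measurements from two methods.

    Assumption: CoR is taken as 1.96 * sample SD of the paired differences.
    """
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must be paired and of equal length")
    diff = a - b
    bias = diff.mean()          # mean difference between methods
    sd = diff.std(ddof=1)       # sample standard deviation of differences
    cor = 1.96 * sd
    return {
        "bias": float(bias),
        "cor": float(cor),
        "limits_of_agreement": (float(bias - cor), float(bias + cor)),
    }


# Made-up paired values for illustration only (not study data).
result = bland_altman([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 3.0])
print(result)
```

A smaller CoR indicates tighter agreement between the two methods being compared, for example DLM versus MG measurements of EZ area.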
Conclusions :
The results suggest that further increasing the number of midline B-scans in the training dataset may not be beneficial. Including off-center B-scans in the training datasets may help improve DLM performance toward the level of manual grading. Manual correction of DLM segmentation can generate EZ area and OS volume measurements in close agreement with those from conventional manual grading, providing quantitative biomarker measurements for assessing disease progression and treatment outcomes in RP.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.