June 2025
Volume 66, Issue 6
Open Access
Retina  |   June 2025
Identifying Retinal Features Using a Self-Configuring CNN for Clinical Intervention
Author Affiliations & Notes
  • Daniel S. Kermany
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    Department of Biomedical Engineering, Texas A&M University, College Station, Texas, United States
    College of Medicine, Texas A&M Health Science Center, Bryan, Texas, United States
  • Wesley Poon
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    College of Medicine, Texas A&M Health Science Center, Bryan, Texas, United States
  • Anaya Bawiskar
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    Department of Biomedical Engineering, Texas A&M University, College Station, Texas, United States
  • Natasha Nehra
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    School of Engineering Medicine, Texas A&M University, Houston, Texas, United States
  • Orhun Davarci
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    School of Engineering Medicine, Texas A&M University, Houston, Texas, United States
  • Glori Das
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    Department of Biomedical Engineering, Texas A&M University, College Station, Texas, United States
    College of Medicine, Texas A&M Health Science Center, Bryan, Texas, United States
  • Matthew Vasquez
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
  • Shlomit Schaal
    Department of Ophthalmology, Houston Methodist Academic Institute, Houston, Texas, United States
    Department of Ophthalmology, Weill Cornell College of Medicine, Houston, Texas, United States
  • Raksha Raghunathan
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
  • Stephen T. C. Wong
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    T. T. & W. F. Chao Center for BRAIN, Department of Systems Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas, United States
    Department of Biomedical Engineering, Texas A&M University, College Station, Texas, United States
    College of Medicine, Texas A&M Health Science Center, Bryan, Texas, United States
  • Correspondence: Stephen T. C. Wong, Department of Systems Medicine and Bioengineering, Houston Methodist Research Institute, 6565 Fannin Street, Houston, TX 77030, USA; [email protected]
Investigative Ophthalmology & Visual Science June 2025, Vol.66, 55. doi:https://doi.org/10.1167/iovs.66.6.55
Abstract

Purpose: Retinal diseases are leading causes of blindness worldwide, necessitating accurate diagnosis and timely treatment. Optical coherence tomography (OCT) has become a universal imaging modality of the retina in the past 2 decades, aiding in the diagnosis of various retinal conditions. However, the scarcity of comprehensive, annotated OCT datasets, which are labor-intensive to assemble, has hindered the advancement of artificial intelligence (AI)-based diagnostic tools.

Methods: To address the lack of annotated OCT segmentation datasets, we introduce OCTAVE, an extensive 3D OCT dataset with high-quality, pixel-level annotations for anatomic and pathological structures. Additionally, we provide similar annotations for four independent public 3D OCT datasets, enabling their use as external validation sets. To demonstrate the potential of this resource, we train a deep learning segmentation model using the self-configuring no-new-U-Net (nnU-Net) framework and evaluate its performance across all four external validation sets.

Results: The collected OCTAVE dataset consists of 198 OCT volumes (3762 B-scans) used for training and 221 OCT volumes (4109 B-scans) used for external validation. The trained deep learning model demonstrates clinically significant performance across all retinal structures and pathological features.

Conclusions: We demonstrate robust segmentation performance and generalizability across independently collected datasets. OCTAVE bridges the gap in publicly available datasets, supporting the development of AI tools for precise disease detection, monitoring, and treatment guidance. This resource has the potential to improve clinical outcomes and advance AI-driven retinal disease management.

Retinal diseases like diabetic retinopathy (DR), age-related macular degeneration (AMD), epiretinal membrane (ERM), and vitreomacular adhesion (VMA) are major causes of vision impairment worldwide, impacting millions and straining healthcare systems.1–4 Early detection is essential to prevent irreversible vision loss.5,6 Optical coherence tomography (OCT), a noninvasive imaging technology, provides high-resolution cross-sectional images of the retina and has transformed ophthalmology.7–9 OCT has traditionally been used in tabletop setups, but recent advances have produced microscope-integrated systems that enable real-time intraoperative OCT (iOCT) for procedures like membrane peeling and macular hole repair.10 OCT is now the gold standard for diagnosing, monitoring, and guiding retinal disease treatment globally. 
The rapid expansion of medical imaging has outpaced the availability of specialists, creating a bottleneck in OCT interpretation.11 Despite its widespread clinical use, the shortage of qualified experts hampers timely diagnoses.12 This challenge is exacerbated by the rising prevalence of serious eye diseases for which OCT is the primary evaluation tool.13 
Artificial intelligence (AI) offers a promising solution to address this gap.14 Deep learning models, particularly convolutional neural networks (CNNs), have shown exceptional performance in image analysis, including retinal disease detection and segmentation.15–18 However, many existing models are limited by their reliance on specific datasets and lack generalizability across various OCT imaging systems and patient populations.19 Additionally, the lack of large, publicly available OCT datasets—especially full 3D volumes with detailed segmentation annotations—hinders the development of robust AI models. 
To address this critical gap, we introduce the optical coherence tomography annotated volume experiment (OCTAVE) dataset, a comprehensive 3D OCT dataset with pixel-level segmentation labels. Expert labelers ensured high-quality, consistent annotations, extending segmentation labels to four existing 3D OCT datasets for broader validation. This significantly enhances their utility and aids in developing generalizable AI models. 
In this study, we leverage this extensive, meticulously annotated dataset to develop a robust 3D self-configuring semantic segmentation deep learning model, which automates network design and optimization.20 Our model, trained on this 3D OCT data and validated across multiple datasets, demonstrates strong generalizability across imaging systems and patient populations (Figs. 1a, 1b). This work helps bridge the gap in OCT interpretation and sets a new benchmark for retinal imaging research. 
Figure 1.
 
Pipeline of OCT volume processing, labeling, model training, and evaluation. (a) OCTAVE dataset of 198 OCT volumes used for model training and internal cross-validation. (b) External validation sets used for model performance testing and not included in the training process. These validation sets consist of 13 volumes from Kafieh et al. 2013, 10 volumes from Tian et al. 2015, 148 volumes from Rasti et al. 2018, and 50 volumes from Stankiewicz et al. 2021. (c) All volumes were downsampled to 19 b-scans to keep model inputs consistent and reduce labor required for manual labeling. (d) Empty 3D Slicer template files containing all necessary metadata required for labeling were generated using a script to reduce start-up time and user error during the manual labeling process. (e) Manual labeling was conducted under a three-tier grading procedure in which (1) trained and supervised students label straightforward features and normal anatomy, (2) experienced senior students confirm accuracy of these labels and label pathological features, and (3) senior students consult with ophthalmologists to reconcile any ambiguous features and verify accurate labeling. (f) Developed tool to identify any unlabeled pixels within volumes that had undergone the three-tiered process. (g) Automated method to convert from the 3D Slicer NRRD format segmentation labels to the TIFF format required by the nnU-Net library. (h) The external validation datasets were reshaped to match the height and width of the OCTAVE training set. (i) Data augmentation methods randomly applied to each volume during training. (j) Model training was conducted over 5 distinct 80:20 training/validation splits of the OCTAVE dataset using the nnU-Net self-configuring deep learning architecture. (k) During inference and evaluation, an input volume is fed through the five distinct trained models. (l) The model outputs are ensembled to generate a final segmentation, which is used to calculate performance metrics. OCT, optical coherence tomography.
Furthermore, accurate differentiation of anatomic and pathological structures in real-time OCT scans is crucial for integrating iOCT with robotic-assisted surgical platforms.10 By providing a large-scale, high-quality labeled dataset, we accelerate AI-driven advancements in ophthalmology. Our initiative underscores the need for collaborative data sharing and annotation, fostering innovation and improving patient outcomes. 
Methods
Data Collection
We collected OCT volumes using the Heidelberg Spectralis OCT system, sourced from the Kermany et al. dataset, encompassing retinal conditions such as DR, AMD, ERM, and VMA.18 For external validation, we labeled and utilized several public 3D OCT datasets: the Tian dataset, the Stankiewicz (CAVRI-A) dataset, the Rasti dataset, and the Kafieh dataset.21–24 All aspects of this research were conducted in full compliance with the Declaration of Helsinki. We obtained all necessary permissions and approvals for the use of the data. 
OCT Labeling Process
To standardize input data, all external volumes were normalized to 19 B-scans by selecting a subset of evenly spaced, unmodified slices, balancing the need for consistent training dataset dimensions with the practical constraints of the manual annotation workload (Fig. 1c). To streamline this annotation process, we developed an automated tool that generated a preconfigured template for the 3D Slicer (version 4.10, www.slicer.org), an open-source software platform for biomedical image analysis. This template incorporated all necessary labels and settings for each volume, substantially streamlining the configuration process and reducing the time required to prepare each volume for labeling (Fig. 1d). A meticulous, multi-tiered manual labeling process was implemented to generate high-quality pixel-level segmentation labels for both anatomic and pathological features within the OCT volumes (Fig. 1e). An additional automated tool was developed to identify unlabeled pixels within volumes that had undergone the three-tiered annotation process (Fig. 1f), ensuring completeness and accuracy of the segmentation labels. We also streamlined the technical workflow by automating the conversion of segmentation labels from the 3D Slicer NRRD format to the TIFF format required by the no-new-U-Net (nnU-Net) library (Fig. 1g). Additionally, the external validation datasets were reshaped with bilinear interpolation to match the height and width of the OCTAVE training set (496 × 1024), ensuring consistency across datasets and facilitating effective validation (Fig. 1h). 
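For illustration, the slice-selection and resizing steps described above can be sketched as follows. This is a minimal example, assuming volumes are stored as (slices, height, width) NumPy arrays and using OpenCV for bilinear interpolation; the helper names are hypothetical and are not functions from the released repository.

import numpy as np
import cv2  # OpenCV, assumed here only for bilinear resizing

def select_even_bscans(volume, n_slices=19):
    # Pick n_slices evenly spaced, unmodified B-scans from a (slices, H, W) volume.
    depth = volume.shape[0]
    idx = np.linspace(0, depth - 1, n_slices).round().astype(int)
    return volume[idx]

def resize_to_octave(volume, height=496, width=1024):
    # Bilinearly resize every B-scan to the OCTAVE training resolution (496 x 1024).
    return np.stack([
        cv2.resize(bscan, (width, height), interpolation=cv2.INTER_LINEAR)
        for bscan in volume
    ])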
Detailed labeling protocols were established to guide the annotators across the three tiers, ensuring consistency in the identification and delineation of retinal features across the dataset. Using 3D Slicer, the annotators manually segmented normal anatomical structures and any observable pathological features. The first tier of labeling was performed by a team of graduate and medical students who received specialized training in segmenting key anatomy, following detailed protocols to ensure consistency. The second tier consisted of senior medical students experienced in retinal labeling who reviewed and refined the initial annotations, addressing any errors or inconsistencies. The third tier included board-certified ophthalmologists and retina specialists who provided expert opinions on complex and ambiguous cases and conducted random sampling of approximately 5% of all scans to verify accuracy. This tiered approach enhances precision, standardization, and reliability, making the dataset a robust resource for AI training in retinal imaging. 
Deep Learning Model
We used the nnU-Net framework (https://github.com/MIC-DKFZ/nnUNet) to develop our 3D semantic segmentation model, which self-optimizes network design based on dataset properties.20 Both 2D and 3D U-Net architectures were tested, but final analyses utilized 2D models due to superior performance. The encoder-decoder structure, featuring skip connections, enabled precise retinal structure localization. The framework automatically optimized hyperparameters, including network depth, kernel sizes, and feature maps. 
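To make the encoder-decoder idea concrete, a minimal PyTorch sketch of a 2D U-Net with skip connections is shown below. This is an illustration only, not the architecture nnU-Net configures automatically for this dataset; the class name, depth, and channel counts are assumptions chosen for brevity.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with instance normalization, as is typical for U-Nets.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.InstanceNorm2d(out_ch), nn.LeakyReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.InstanceNorm2d(out_ch), nn.LeakyReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    # Three-stage encoder-decoder; channel counts, depth, and class count are illustrative only.
    def __init__(self, in_channels=1, n_classes=14, base=32):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)  # concatenated skip doubles the channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, 1)   # per-pixel class logits

    def forward(self, x):
        e1 = self.enc1(x)                  # full-resolution features
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))      # bottleneck
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))  # skip connection from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from e1
        return self.head(d1)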
A custom loss function incorporated class weights derived via median frequency balancing to address class imbalance and improve segmentation of under-represented structures. The 2D nnU-Net was configured with a patch size of 512 × 1024, batch size of 4, and Z-score normalization. Training used a nine-stage U-Net with progressively increasing feature maps (32–512) and optimized convolutional layers (Supplementary Table S1). 
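As a concrete illustration of median frequency balancing, the following sketch derives per-class weights from label pixel counts. The exact counting convention and how the weights enter the loss are not specified in the paper, so this is one common formulation (weight = median class frequency divided by class frequency).

import numpy as np

def median_frequency_weights(class_pixels, image_pixels):
    # class_pixels[c]: total pixels labeled c across the training labels.
    # image_pixels[c]: total pixels of all images in which class c appears.
    freq = np.asarray(class_pixels, dtype=float) / np.asarray(image_pixels, dtype=float)
    return np.median(freq) / freq  # rarer classes receive larger weights

# The resulting weight vector could then scale a per-class cross-entropy or Dice term.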
To further enhance model robustness, we introduced various augmentation techniques. We implemented custom rotation and scaling augmentations that replaced background labels from nnU-Net transformations with artifact (ART) labels. To enhance artifact diversity across datasets, another transformation was introduced to include black, white, and various gray shades, as seen in the Rasti, Kafieh, and Stankiewicz datasets. Additionally, a new augmentation class allowed dynamic adjustments to image intensity properties, such as window and level thresholds, complementing standard brightness and contrast transforms. To replicate the conditions of some external datasets, such as the Rasti dataset, we implemented a custom augmentation that simulated low-resolution and lower-quality images. This involved introducing controlled downsampling and compression artifacts into the training volumes. These strategies ensured the nnU-Net was rigorously optimized for both segmentation accuracy and generalizability by incorporating transformations tailored to enhance its performance on OCT data, addressing challenges posed by heterogeneous imaging devices and protocols (Fig. 1i). 
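One of the augmentations described above, simulating low-resolution, compression-degraded scans, could be sketched as follows. The parameter ranges, use of Pillow, and function name are illustrative assumptions rather than the study's implementation.

import io
import random
import numpy as np
from PIL import Image

def degrade_bscan(bscan, scale_range=(0.3, 0.8), quality_range=(20, 60)):
    # bscan: a uint8 grayscale B-scan of shape (H, W).
    h, w = bscan.shape
    img = Image.fromarray(bscan, mode="L")
    # Controlled downsampling followed by restoration to the original size.
    scale = random.uniform(*scale_range)
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))), Image.BILINEAR)
    img = img.resize((w, h), Image.BILINEAR)
    # Re-encode as a low-quality JPEG to introduce compression artifacts.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(*quality_range))
    buf.seek(0)
    return np.array(Image.open(buf))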
Training was conducted on a Linux server equipped with two NVIDIA A5000 GPUs (24 GB VRAM). The model was trained for a maximum of 1000 epochs, with early stopping criteria based on validation loss to prevent overfitting. A five-fold cross-validation strategy was used to assess internal model performance and generalizability. 
Deep Learning Model Training Using the OCTAVE Dataset
Training was conducted using a cross-validation approach with an 80:20 split, generating five distinct folds (Fig. 1j). For each fold, a separate model was trained, resulting in five distinct models. During inference, the outputs from all five models were ensembled to produce the final segmentation result (Fig. 1k). We used several standard metrics to evaluate the segmentation performance, including the Dice score and pixel accuracy. Pixel accuracy offers a straightforward measure of performance that allows category-specific calculation of sensitivity and specificity but can be misleading when used as a global metric in cases of class imbalance. Dice scores, by contrast, are aggregate overlap metrics derived from the entire segmentation region rather than individual pixel classifications, accounting for both false positives and false negatives within a single score (Fig. 1l). The Dice score complements pixel accuracy by emphasizing the degree of overlap between predicted and ground-truth labels. It is often used to provide a more balanced assessment of segmentation quality, especially for small or infrequent classes that might otherwise appear well-classified using pixel-based metrics alone. The total Dice score is computed by considering performance from all categories together, so under-represented classes contribute less to that overall metric. One advantage of this approach is that it provides a single, comprehensive measurement that reflects the combined segmentation accuracy across the entire dataset. In contrast, the mean Dice score is derived by averaging the individual Dice scores from each class equally, which ensures that less frequent categories receive the same emphasis as the more common ones. However, this equal weighting can over-represent the impact of rare classes relative to their actual prevalence in the dataset. 
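For clarity, the ensembling and evaluation steps can be summarized in the following sketch, which averages the class-probability maps from the five fold models and then computes pixel accuracy, per-class Dice, an unweighted mean Dice, and a prevalence-weighted "total" Dice. The paper describes its total Dice only qualitatively, so the prevalence-weighted average here is one plausible reading, and fold_probs is a hypothetical list of per-class probability maps.

import numpy as np

def ensemble_prediction(fold_probs):
    # fold_probs: list of five softmax maps, each of shape (n_classes, H, W).
    return np.mean(np.stack(fold_probs), axis=0).argmax(axis=0)

def dice_per_class(pred, truth, n_classes):
    scores = []
    for c in range(n_classes):
        p, t = pred == c, truth == c
        denom = p.sum() + t.sum()
        scores.append(2.0 * np.logical_and(p, t).sum() / denom if denom else np.nan)
    return np.array(scores)

def summarize(pred, truth, n_classes):
    dice = dice_per_class(pred, truth, n_classes)
    prevalence = np.array([(truth == c).sum() for c in range(n_classes)], dtype=float)
    return {
        "pixel_accuracy": float((pred == truth).mean()),
        "mean_dice": float(np.nanmean(dice)),                                   # every class weighted equally
        "total_dice": float(np.nansum(dice * prevalence) / prevalence.sum()),   # prevalence-weighted pooling
    }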
Data and Code Availability
We are providing the largest public annotated 3D OCT repository to date, consisting of 198 volumes in the OCTAVE dataset, with comprehensive pixel-level segmentation labels for both anatomic and pathological features. This repository additionally includes annotations for 221 3D OCT volumes from the four external datasets described. The OCT volumes and corresponding segmentation annotations have been deposited at Zenodo at https://doi.org/10.5281/zenodo.14580071 and are publicly available as of the date of publication. All original code can be found on GitHub at https://github.com/Translational-Biophotonics-Laboratory/octvision3d. Researchers accessing the dataset agree to cite and credit this manuscript and the authors of the original datasets. 
Results
In the OCTAVE dataset, 198 OCT volumes (3762 B-scans) were annotated and used for training; an additional 221 OCT volumes (4109 B-scans) were collected and labeled from the four external validation sets. Each volume was meticulously annotated with pixel-level segmentation labels, capturing both anatomic and pathological features, such as the retina (RET), choroid and sclera (CHO), vitreous (VIT), retinal pigment epithelium (RPE), posterior hyaloid (HYA), retrohyaloid space (RHS), ERM, sub-ERM space (SES), artifact (ART), hyper-reflective material (HRM), fluid (FLU), subretinal material (SRM), and hypertransmission defect (HTD). The pixel-level distributions of these categories are provided in Supplementary Figure S1.
Cross-Validation of the Trained Model on the OCTAVE Dataset
Our segmentation model demonstrated strong performance on cross-validation, accurately delineating both anatomic structures and pathological features (Fig. 2). Cross-validation pixel-level accuracy of the OCTAVE training dataset is summarized in a normalized confusion matrix (Fig. 3). Notably, CHO, VIT, RET, RPE, and ART achieved the highest sensitivities, at 0.99, 0.98, 0.97, 0.91, and 1.00, respectively. Sensitivities for HYA and RHS were 0.69 and 0.51, respectively, reflecting lower performance on thinner and less prominent structures. Similarly, among pathological classes, ERM and SES achieved sensitivities of 0.60 and 0.63, respectively. For other pathological classes with more prominent appearances, the segmentation of HRM, SRM, HTD, and FLU yielded sensitivities of 0.70, 0.53, 0.64, and 0.71, respectively, whereas ART exhibited a sensitivity > 0.99. The cross-validation performance was robust, with a total Dice score of 0.977 and a mean Dice score of 0.730. CHO achieved a Dice score of 0.993, VIT 0.982, and RET 0.975, whereas RPE reached 0.835. For more challenging structures and features, HYA and RHS achieved Dice scores of 0.598 and 0.565, respectively, whereas ERM and SES yielded 0.604 and 0.611, respectively. FLU achieved 0.693, HRM 0.677, SRM 0.516, HTD 0.449, and ART 0.989 (Table 1). 
Figure 2.
 
The various OCT presentations represented within the OCTAVE dataset, including normal, PVD, VMA, ERM, ME, SRM, and GA. The left column depicts an OCT b-scan, the middle column depicts the corresponding ground-truth label, and the right column depicts the prediction using the trained model ensemble. ERM, epiretinal membrane; GA, geographic atrophy; ME, macular edema; OCT, optical coherence tomography; PVD, posterior vitreous detachment; SRM, subretinal material; VMA, vitreomacular adhesion.
Figure 3.
 
Normalized confusion tables depict pixel-level accuracy in predicting segmentation labels within internal cross-validation. In cross-validation, each case contributed to validation once and training in the remaining four folds, allowing for comprehensive evaluation across all volumes. Each square within this table represents the proportion of the cases in each row (true labels) that have been predicted as the label corresponding with that column (predictions). For each row, the diagonal values represent the sensitivity for each label and the sum of all values per row, excluding the diagonal element, represents the false negative rate for that label. Labels that did not have any instances within a specific dataset were excluded from that dataset's confusion matrix. This normalized confusion matrix depicts the internal OCTAVE cross-validation set. ART, artifact; CHO, choroid and sclera; ERM, epiretinal membrane; FLU, intra-/sub-retinal fluid; HRM, hyper-reflective material; HTD, hypertransmission defect; HYA, hyaloid membrane; SES, sub-epiretinal membrane space; SRM, subretinal material; RET, retina; RHS, retrohyaloid space; RPE, retinal pigment epithelium; VIT, vitreous.
External Evaluation
To evaluate the generalizability of our segmentation model, we utilized several well-known public OCT datasets for external validation, including datasets from Tian et al., Stankiewicz et al., Rasti et al., and Kafieh et al. Each of these datasets offers unique characteristics in terms of imaging protocols, patient demographics, and retinal pathologies. The Rasti dataset is large and diverse, comprising 148 volumes with 18 to 60 B-scans per volume: 50 diabetic macular edema (DME) volumes, 48 AMD volumes, and 50 volumes without signs of either disease. We also noted that 73 and 62 of these volumes contain features consistent with ERM and VMA, respectively (Table 2). The Rasti dataset had the largest variation in image quality and resolution (Fig. 4). The Kafieh dataset provides 13 OCT volumes of normal retinas, each with 128 B-scans, with mild features of ERM and VMA found in 3 volumes (Fig. 5). The Tian dataset contains 10 volumes of 10 B-scans each, showing normal retinas without significant pathology aside from partial vitreous detachment within the perifoveal region, without changes to retinal contour or morphology (Fig. 6). The Stankiewicz dataset, also known as the CAVRI-A dataset, contains 50 OCT volumes (141 B-scans per volume), evenly split between cases of vitreomacular adhesion, in which there are no changes to retinal morphology, and vitreomacular traction, which ranges from minimal contour and morphology changes to severe traction causing retinal edema and partial macular hole formation (Fig. 7). 
Figure 4.
 
OCT represented within the Rasti dataset, including normal, PVD, VMA, ERM, ME, PED, and GA. The left column depicts an OCT b-scan, the middle column depicts the corresponding ground-truth label, and the right column depicts the prediction using the trained model ensemble. ERM, epiretinal membrane; GA, geographic atrophy; OCT, optical coherence tomography; ME, macular edema; PED, pigment epithelial detachment; PVD, posterior vitreous detachment; VMA, vitreomacular adhesion.
Figure 5.
 
OCT represented within the Kafieh dataset, including normal, VMA, and ERM. The left column depicts an OCT b-scan, the middle column depicts the corresponding ground-truth label, and the right column depicts the prediction using the trained model ensemble. ERM, epiretinal membrane; OCT, optical coherence tomography; VMA, vitreomacular adhesion.
Figure 6.
 
OCT represented within the Tian dataset, including normal and VMA. The left column depicts an OCT b-scan, the middle column depicts the corresponding ground-truth label, and the right column depicts the prediction using the trained model ensemble. OCT, optical coherence tomography; VMA, vitreomacular adhesion.
Figure 7.
 
OCT represented within the Stankiewicz dataset, including normal, VMA, and VMT. The left column depicts an OCT b-scan, the middle column depicts the corresponding ground-truth label, and the right column depicts the prediction using the trained model ensemble. OCT, optical coherence tomography; VMA, vitreomacular adhesion; VMT, vitreomacular traction.
To evaluate the robustness and generalizability of the features the model learned from training on the OCTAVE dataset, we assessed model performance across the four external validation datasets by calculating pixel-level accuracies and Dice scores. The model consistently achieved high segmentation performance for major anatomic structures, particularly CHO, VIT, RET, and RPE, across all datasets, with pixel-level sensitivities generally exceeding 0.95. Dice scores remained strong, averaging above 0.96 for CHO, VIT, and RET and approximately 0.72 to 0.83 for RPE, confirming reliable segmentation of these key structures. Pathological and finer structures, such as HYA, RHS, ERM, SES, FLU, HRM, SRM, HTD, and ART, showed greater variability, with some scores indicating moderate to lower overlap compared to the anatomic categories, particularly in datasets with greater imaging variability. Full Dice score results for each dataset are detailed in Table 1, and pixel-level accuracies can be found in the normalized confusion matrices in Figure 8.
Table 1.
 
Summary of Model Dice Scores on the Various Test OCT Datasets Across the Label Categories
Table 2.
 
Description of Public Datasets Utilized for External Validation
Figure 8.
 
Normalized confusion tables depict pixel-level accuracy of the trained model ensemble in predicting segmentation labels within the external validation datasets. Each square within this table represents the proportion of the cases in each row (true labels) that have been predicted as the label corresponding with that column (predictions). For each row, the diagonal values represent the sensitivity for each label and the sum of all values per row, excluding the diagonal element, represents the false negative rate for that label. Labels that did not have any instances within a specific dataset were excluded from that dataset's confusion matrix. The datasets featured include (a) the Kafieh dataset, (b) the Tian dataset, (c) the Rasti dataset, and (d) the Stankiewicz dataset. ART, artifact; CHO, choroid and sclera; ERM, epiretinal membrane; FLU, intra-/sub-retinal fluid; HTD, hypertransmission defect; HRM, hyper-reflective material; HYA, hyaloid membrane; SES, sub-epiretinal membrane space; SRM, subretinal material; RET, retina; RHS, retrohyaloid space; RPE, retinal pigment epithelium; VIT, vitreous.
The model demonstrated strong performance on the Kafieh dataset. HYA and RHS registered sensitivities of 0.83 and 0.80, whereas ERM attained 0.66 (see Fig. 8a). The pathological features present in the Kafieh dataset included HYA, RHS, and ERM, which showed Dice scores of 0.490, 0.674, and 0.399, respectively. The total Dice score was 0.981, with a mean Dice score of 0.748. 
Pixel-level sensitivities in the Tian dataset were similar to those in the Kafieh dataset, with 0.89 for HYA, whereas RHS, ERM, and ART recorded 0.78, 0.57, and 0.66, respectively (see Fig. 8b). Dice scores for HYA, RHS, ERM, and ART were 0.596, 0.360, 0.284, and 0.766, respectively. The total Dice score for this dataset was 0.969, and the mean Dice score was 0.716. 
The Rasti dataset contained much more variability in the types of pathological features present as well as in the presentation of those features. Despite this variability, the pixel sensitivity of HYA was 0.75, whereas RHS, ERM, SES, and ART were 0.65, 0.60, 0.26, and 0.76, respectively, reflecting performance similar to the Kafieh and Tian datasets. The categories of FLU, HRM, SRM, and HTD, which were not present in either the Kafieh or Tian datasets, attained pixel sensitivities of 0.68, 0.50, 0.39, and 0.47, respectively (see Fig. 8c). When measuring Dice scores, the pathological categories demonstrated scores similar to their pixel sensitivities, except for ERM and ART, which yielded Dice scores of 0.289 and 0.287, respectively, suggesting a higher false positive rate compared with the other categories. The total Dice score was 0.911, with a mean of 0.599. 
The Stankiewicz dataset presented greater challenges due to image variability compared with the other datasets and predominantly contained cases of vitreoretinal interface disorders. Unlike in the previous datasets, the VIT category attained a significantly lower pixel sensitivity of 0.64, with a similarly lower Dice score of 0.756, due to a high rate of incorrect predictions of ART, despite no artifacts being present in this dataset. The pixel sensitivities among the other pathological categories were similar to, if not better than, those of the other datasets (see Fig. 8d). However, several categories demonstrated significantly lower Dice scores, including ERM, SES, SRM, and HTD, indicating high false positive rates despite their high pixel sensitivities. The total Dice score was 0.816, with a mean Dice score of 0.546. 
Discussion
Significance of OCT Annotations for Medical AI
To our knowledge, the OCTAVE dataset is the largest open-source 3D OCT dataset with pixel-level annotations which, along with the pixel-level annotations provided for the four well-known public OCT datasets, addresses a critical shortage of high-quality annotated data in ophthalmology. By providing detailed segmentation labels for multiple public datasets, OCTAVE enhances model development, benchmarking, and validation, fostering global collaboration and innovation in retinal imaging research. 
Existing public OCT datasets are limited in size and dimensionality (often consisting of single 2D slices per eye rather than full 3D volumes) and lack pixel-level segmentation labels for retinal features.18 In contrast, OCTAVE consists entirely of 3D volumes with detailed labels for both anatomic and pathological features, improving the interpretability of AI outputs for greater utility in research and clinical applications. 
Segmentation Performance in OCTAVE Cross-Validation
High segmentation accuracy in OCTAVE cross-validation has clinical relevance, as pixel-level sensitivities confirm the model's ability to distinguish diverse tissues and pathologies. Major anatomic layers (CHO, VIT, RET, and RPE) were consistently segmented, supporting clinical tasks like retinal thickness measurement for macular degeneration and diabetic retinopathy diagnosis. However, lower sensitivities in thinner regions (HYA and RHS) reflect the difficulty of delineating overlapping structures. Pathological classes (ERM, SES, HRM, SRM, HTD, and FLU) were detected at meaningful rates, indicating strong model performance in identifying abnormalities. The model's ability to differentiate true OCT signals from ART ensures clean outputs, crucial for clinical workflows. 
Generalizability Across Diverse Public Datasets
Performance on four external public datasets demonstrated the model's robustness and adaptability to diverse imaging conditions and patient populations. Across the Kafieh, Tian, Rasti, and Stankiewicz datasets, high accuracy for major anatomic layers mirrored internal cross-validation performance, confirming reliability. However, discrepancies between sensitivity and Dice scores in pathological classes suggest a tendency for over-segmentation. ART label performance varied due to differences in ART representation: OCTAVE used white padding, whereas external datasets contained black padding, complicating the distinction from darker retinal structures. Although data augmentation partially mitigated this, residual mismatches persisted, especially in darker or noisier images. For example, in Figure 4, ART regions along the edge of some scans were misclassified as background (black regions), likely because the extreme darkness of this region mimics the black padding artifacts created by the rotation and scaling data augmentations. In the Stankiewicz dataset, vitreous regions were notably misclassified due to lower brightness and unique noise patterns, highlighting the impact of imaging variability on segmentation. Several characteristic and notable segmentation errors are depicted in Supplementary Figure S2. The relatively high total Dice scores reflect robust overall segmentation performance, whereas the mean Dice scores underscore how certain categories, particularly those that are under-represented or appear visually subtle, can still pose segmentation challenges. 
Clinical Applications of Automated OCT Segmentation
The model's accuracy and generalizability have major clinical implications, particularly in expanding access to retinal disease screening. Automated segmentation reduces reliance on specialist interpretation, improving efficiency in clinical workflows and reducing the time and specialized expertise required for interpreting OCT scans. The model’s strong performance across multiple datasets, sourced from diverse regions and imaging devices, demonstrates its potential applicability in varied clinical settings, expanding access to retinal disease screening and monitoring in regions with limited availability of eye specialists. 
Beyond diagnostic applications, the model's high-resolution, accurate segmentation of retinal features has potential applications in iOCT procedures, automated surgical devices, and robotic-assisted interventions.10 Precise identification of anatomic landmarks and pathological areas is critical for guiding instruments during delicate retinal surgeries. The integration of such segmentation models into robotic navigation systems could enhance the precision and safety of these interventions. 
Limitations of the Study
Despite the promising results, several limitations should be acknowledged. Manual annotations, although rigorously reviewed, remain subject to human error. Although our multi-tiered labeling system ensured accuracy, variability in complex features remains a challenge. Expanding the dataset to include more diverse pathologies will improve generalizability. Imaging differences across devices also affect performance, necessitating further optimization. 
Our work demonstrates the potential of combining deep learning with large, well-annotated datasets to advance AI-driven retinal disease analysis. OCTAVE sets a new standard in OCT segmentation, promoting accuracy, accessibility, and clinical integration. Continued research and dataset expansion will drive further improvements, enhancing AI applications in ophthalmology and patient care. 
Acknowledgments
The authors thank the GPU supercomputer facility at the Laboratory for Artificial Intelligence in Medical Innovation, the Systems Medicine and Bioengineering Department, Houston Methodist Hospital, for their support. 
Supported by the National Eye Institute F31EY037177 (D.S.K.); National Cancer Institute R01CA288613 (S.T.C.W.); National Cancer Institute R01NS140292 (S.T.C.W.); T.T. and W.F. Chao Foundation (S.T.C.W.); John S. Dunn Research Foundation (S.T.C.W.); and Johnsson Estate (S.T.C.W.). 
Disclosure: D.S. Kermany, None; W. Poon, None; A. Bawiskar, None; N. Nehra, None; O. Davarci, None; G. Das, None; M. Vasquez, None; S. Schaal, None; R. Raghunathan, None; S.T.C. Wong, None 
References
1. Duh EJ, Sun JK, Stitt AW. Diabetic retinopathy: current understanding, mechanisms, and treatment strategies. JCI Insight. 2017; 2(14): e93751.
2. Fung AT, Galvin J, Tran T. Epiretinal membrane: a review. Clin Exp Ophthalmol. 2021; 49(3): 289–308.
3. Phillips JD, Hwang ES, Morgan DJ, Creveling CJ, Coats B. Structure and mechanics of the vitreoretinal interface. J Mech Behav Biomed Mater. 2022; 134: 105399.
4. Wong WL, Su X, Li X, et al. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. Lancet Glob Health. 2014; 2(2): e106–e116.
5. Liu L, Swanson M. Improving patient outcomes: role of the primary care optometrist in the early diagnosis and management of age-related macular degeneration. Clin Optom. 2013; 5: 1–12.
6. Ho AC, Albini TA, Brown DM, Boyer DS, Regillo CD, Heier JS. The potential importance of detection of neovascular age-related macular degeneration when visual acuity is relatively good. JAMA Ophthalmol. 2017; 135(3): 268–273.
7. Barak Y, Ihnen MA, Schaal S. Spectral domain optical coherence tomography in the diagnosis and management of vitreoretinal interface pathologies. J Ophthalmol. 2012; 2012(1): 876472.
8. Drexler W, Fujimoto JG. State-of-the-art retinal optical coherence tomography. Prog Retin Eye Res. 2008; 27(1): 45–88.
9. Huang D, Swanson EA, Lin CP, et al. Optical coherence tomography. Science. 1991; 254(5035): 1178–1181.
10. Ciarmatori N, Pellegrini M, Nasini F, Talli PM, Sarti L, Mura M. The state of intraoperative OCT in vitreoretinal surgery: recent advances and future challenges. Tomography. 2023; 9(5): 1649–1659.
11. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018; 18(8): 500–510.
12. Feng PW, Ahluwalia A, Feng H, Adelman RA. National trends in the United States eye care workforce from 1995 to 2017. Am J Ophthalmol. 2020; 218: 128–135.
13. Flaxman SR, Bourne RRA, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet Glob Health. 2017; 5(12): e1221–e1234.
14. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019; 25(1): 24–29.
15. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017; 42: 60–88.
16. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553): 436–444.
17. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018; 24(9): 1342–1350.
18. Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018; 172(5): 1122–1131.e9.
19. Yanagihara RT, Lee CS, Ting DSW, Lee AY. Methodological challenges of deep learning in optical coherence tomography for retinal diseases: a review. Transl Vis Sci Technol. 2020; 9(2): 11.
20. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021; 18(2): 203–211.
21. Kafieh R, Rabbani H, Abramoff MD, Sonka M. Intra-retinal layer segmentation of 3D optical coherence tomography using coarse grained diffusion map. Med Image Anal. 2013; 17(8): 907–928.
22. Rasti R, Rabbani H, Mehridehnavi A, Hajizadeh F. Macular OCT classification using a multi-scale convolutional neural network ensemble. IEEE Trans Med Imaging. 2018; 37(4): 1024–1034.
23. Stankiewicz A, Marciniak T, Dabrowski A, Stopa M, Marciniak E, Obara B. Segmentation of preretinal space in optical coherence tomography images using deep neural networks. Sensors. 2021; 21(22): 7521.
24. Tian J, Varga B, Somfai GM, Lee W-H, Smiddy WE, Cabrera DeBuc D. Real-time automatic segmentation of optical coherence tomography volume data of the macular region. PLoS One. 2015; 10(8): e0133908.