Hasan Cetin, Jon Whitney, Duriye Damla Sevgi, Jenna Hach, Sunil Srivastava, Jamie Reese, Justis P Ehlers; Impact of Varying Dataset Composition Ratios on the Machine Learning Model Segmentation Performance for Subretinal Hyperreflective Material: A Quantitative and Qualitative Evaluation. Invest. Ophthalmol. Vis. Sci. 2021;62(8):2164.
© ARVO (1962-2015); The Authors (2016-present)
Detection of specific features of interest on OCT is strongly linked to training data composition. For many targeted features, large, comprehensive, ground-truth annotated datasets are not available. Smaller datasets may be more susceptible to class imbalance, potentially affecting machine learning (ML) performance. The purpose of this study was to evaluate the impact of varying ratios of positive to negative training data on ML performance in segmenting subretinal material (SRM), using quantitative and qualitative parameters.
A U-Net architecture convolutional model was executed and evaluated on training datasets with varying ratios of annotated OCT images containing (positive) and not containing (negative) SRM in neovascular age-related macular degeneration. ML performance was assessed for 5 different ratios of positive (P) to negative (N) data: 30P-70N, 40P-60N, 50P-50N, 60P-40N, and 70P-30N. Quantitative performance was evaluated using F-scores. Qualitative performance was evaluated through multiple experts’ reviews of the model outputs, presented in a tiled configuration, for assessment of optimal segmentation (Figure 1).
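As an illustration of how training sets with the five P:N ratios could be assembled, the following is a minimal sketch only. The abstract does not report the dataset size or the sampling procedure, so the function name `compose_training_set`, the image identifiers, the pool sizes, and the fixed total of 100 images are all hypothetical.

```python
import random


def compose_training_set(positives, negatives, pos_fraction, total, seed=0):
    """Draw `total` images so that about `pos_fraction` of them are positive.

    Hypothetical helper: the study's actual sampling procedure is not
    described in the abstract.
    """
    n_pos = round(total * pos_fraction)
    n_neg = total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough images to reach the requested ratio")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    subset = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(subset)
    return subset


# Hypothetical image identifiers (placeholders, not study data).
positives = [f"pos_{i:03d}" for i in range(100)]
negatives = [f"neg_{i:03d}" for i in range(100)]

# The five ratios named in the abstract, as fractions of positive images.
RATIOS = {
    "30P-70N": 0.30,
    "40P-60N": 0.40,
    "50P-50N": 0.50,
    "60P-40N": 0.60,
    "70P-30N": 0.70,
}

training_sets = {
    name: compose_training_set(positives, negatives, frac, total=100)
    for name, frac in RATIOS.items()
}
```

Holding the total fixed while varying only the ratio is one way to keep dataset size from confounding the comparison; whether the study did so is not stated.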
The results demonstrated variable model performance related to the training dataset ratio. Based on quantitative model performance, the F-scores ranged from 0.59 to 0.72. The highest performing model based on F-score was the 70P-30N training set. However, qualitative model performance assessment demonstrated that the 30P-70N (F-score = 0.61) was the preferred training set. In qualitative review, the 70P-30N model demonstrated excellent detection of subretinal material with few false negatives, but with an excess of false positives that was clinically impactful (Figure 1). Conversely, the 30P-70N model demonstrated a more conservative segmentation with a dramatic reduction in false positives while maintaining minimal false negatives.
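The gap between F-score ranking and expert preference can be illustrated with a small sketch. It assumes a pixel-wise F1 score, F1 = 2TP / (2TP + FP + FN); the abstract does not state how its F-scores were computed. The masks below are made-up toy data, not study results, and the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    between two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN).

    Returning 1.0 when no positives exist in either mask is one
    common convention, assumed here.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 8-pixel masks (1 = SRM, 0 = background); illustrative only.
truth        = [0, 1, 1, 1, 1, 0, 0, 0]
aggressive   = [1, 1, 1, 1, 1, 0, 0, 0]  # finds all SRM, adds a false positive
conservative = [0, 1, 1, 0, 0, 0, 0, 0]  # no false positives, misses some SRM

f1_aggressive = f1_score(*confusion_counts(aggressive, truth))      # 8/9
f1_conservative = f1_score(*confusion_counts(conservative, truth))  # 4/6
```

In this toy case the aggressive mask scores higher even though it is the only one containing a false positive. F1 weights false positives and false negatives equally, so it cannot express a clinical preference for avoiding one type of error over the other.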
This study demonstrates the importance of dataset composition and positive/negative sampling ratios in datasets of limited size. In addition, this analysis identifies the potential disconnect between qualitative/practical model performance and quantitative performance metrics.
This is a 2021 ARVO Annual Meeting abstract.