Abstract
Purpose:
Deep learning models require extensive labeled data, which costs resources and strains many groups' ability to produce results. The appropriate amount of data is usually stated as "more is better," while efforts to quantify the effect of the number of training samples have been few. We propose to vary the training set size of a deep learning model and examine the changes in the final metric, in an effort to quantify the effect that training sample size has on the final outcome.
Methods:
Fundus autofluorescence (FAF) images from the AREDS2 dataset from eyes with geographic atrophy (GA) were included (n = 2016). The area of GA was recorded by a human grader. The dataset was split by participant into training and test sets of 1515 and 511 images, respectively. A deep learning convolutional neural network (CNN) was trained on the 1515 training images to predict the area of GA and tested on the 511 test images to give a baseline mean squared error (MSE) between the target and predicted area. The training set was then subsampled to 75%, 50%, 25%, 10%, 5%, and 2% of the images, and a new model was trained at each percentage. These models were tested on the same test set (N = 511) and their MSE loss compared to that of the baseline model trained on the full training set. Each percentage was run 10 times to capture variation due to the sampling.
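The abstract itself contains no code; the following minimal Python sketch only illustrates the subsampling-and-retraining protocol described above. The data arrays, the `train_and_predict` callable, and the stand-in mean predictor in the usage example are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def subsampling_experiment(train_x, train_y, test_x, test_y, train_and_predict,
                           fractions=(0.75, 0.50, 0.25, 0.10, 0.05, 0.02),
                           n_repeats=10, seed=0):
    """Train on random subsets of the training set and report test MSE per fraction."""
    rng = np.random.default_rng(seed)
    n_train = len(train_x)
    results = {}
    for frac in fractions:
        mses = []
        for _ in range(n_repeats):
            # Draw a random subset of the training images at this fraction.
            size = max(1, int(round(frac * n_train)))
            idx = rng.choice(n_train, size=size, replace=False)
            preds = train_and_predict(train_x[idx], train_y[idx], test_x)
            # MSE between predicted and grader-measured GA area on the fixed test set.
            mses.append(float(np.mean((preds - test_y) ** 2)))
        results[frac] = mses
    return results

# Usage with a trivial stand-in "model" that predicts the mean training area;
# the study itself trained a CNN on FAF images.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train_x, train_y = rng.normal(size=(1515, 8)), rng.gamma(2.0, 2.0, size=1515)
    test_x, test_y = rng.normal(size=(511, 8)), rng.gamma(2.0, 2.0, size=511)
    mean_predictor = lambda X, y, X_test: np.full(len(X_test), y.mean())
    print(subsampling_experiment(train_x, train_y, test_x, test_y,
                                 mean_predictor, n_repeats=3))
```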
Results:
The baseline MSE for a model trained on 100% of the training set was 0.731 (95% confidence interval [CI]: 0.636, 0.833). When 75% and 50% of the training data were used, the MSE loss averaged 0.779 and 0.841, respectively, over 10 repeated runs. Only when the training set was limited to 25% of the data did the final MSE loss rise to 1.04 (95% CI: 0.854, 1.205), beyond the initial range of [0.603, 0.857]. Limiting the training set to 10%, 5%, and 2% of the data gave final MSE losses of 1.705, 2.096, and 4.231, respectively. Additionally, as the training set shrank, the final minimum loss was reached in fewer and fewer steps (batch size was fixed). Subsampling also increased the range of results across repeated runs.
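The abstract does not state how the 95% confidence intervals were computed; a percentile bootstrap over per-image squared errors, sketched below, is one common way to obtain such an interval for a fixed test set. The function name and defaults here are illustrative, not taken from the study.

```python
import numpy as np

def bootstrap_mse_ci(y_true, y_pred, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for test-set MSE.

    Resamples per-image squared errors with replacement and recomputes the
    mean for each replicate; returns the observed MSE and the (lo, hi) bounds.
    """
    rng = np.random.default_rng(seed)
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    # One bootstrap replicate per row: resample errors, then average them.
    boot = rng.choice(sq_err, size=(n_boot, len(sq_err)), replace=True).mean(axis=1)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(sq_err.mean()), (float(lo), float(hi))
```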
Conclusions:
These results show that, with a training set of N = 1511 images, the MSE loss did not rise beyond the range of the initial test runs until the training subset was reduced to 25% of the original. As the sample size was reduced further, an exponential rise in the final MSE loss was observed. Repeating such experiments on a multitude of datasets will help establish the true range of training sample sizes required for AI models.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.