The sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC) were calculated for the CAD system. Accuracy was defined as the number of true-positive plus true-negative cases divided by the total number of cases. The attending physician's final diagnosis by clinical funduscopic examination was considered the gold standard and was used to calculate the efficacy of the proposed system. The system was tested using a leave-one-subject-out (LOSO) method: the system is trained on the images from n-1 eyes and then tested on the images of the sole eye left out (hence LOSO); this process is then repeated n times.
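As an illustrative sketch only (not the authors' implementation), the LOSO protocol can be expressed as a scikit-learn-style loop in which each eye forms its own group; the feature matrix, labels, and classifier below are placeholders for the actual CAD pipeline:

    # Leave-one-subject-out evaluation: train on n - 1 eyes, test on the one left out.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 12))    # placeholder: one feature vector per eye
    y = rng.integers(0, 2, size=80)  # placeholder labels: 0 = normal, 1 = DR
    groups = np.arange(80)           # one group per subject, so the split reduces to LOSO

    predictions = np.empty_like(y)
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = RandomForestClassifier(random_state=0)
        clf.fit(X[train_idx], y[train_idx])               # train on n - 1 eyes
        predictions[test_idx] = clf.predict(X[test_idx])  # test on the held-out eye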
The system also was tested by 2- and 4-fold cross-validation. In 2-fold cross-validation, each fold contained 40 subjects (20 normal, 10 subclinical, and 10 mild/moderate DR subjects). One fold was used for training and the other for validation; the operation was then repeated with the folds exchanged to evaluate the accuracy. In 4-fold cross-validation, each fold contained 20 subjects (10 normal, five subclinical, and five mild/moderate DR subjects). Three folds were used for training and one fold for validation; this operation was repeated, changing the validation fold each time, so that each fold served once for validation. Then, 95% confidence intervals (CIs) were calculated using the bootstrapping technique.15
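A compact sketch of how the reported metrics and bootstrap CIs might be computed from the pooled cross-validated predictions is given below; the function names and the choice of 2000 resamples are illustrative assumptions, not details taken from the study:

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    def summary_metrics(y_true, y_pred, y_score):
        # Sensitivity, specificity, accuracy (as defined above), and AUC,
        # from NumPy arrays of binary labels and continuous scores.
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "auc": roc_auc_score(y_true, y_score),
        }

    def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, seed=0):
        # Resample cases with replacement; take the 2.5th and 97.5th percentiles
        # of the resampled accuracies as an approximate 95% CI.
        rng = np.random.default_rng(seed)
        n = len(y_true)
        accs = [np.mean(y_true[idx] == y_pred[idx])
                for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
        return np.percentile(accs, [2.5, 97.5])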
To evaluate its accuracy, the proposed CAD system was compared with three established machine learning classifiers. These are state-of-the-art classifiers available in the public domain that can serve as a benchmark for novel classifiers such as the system described herein. The three classifiers used for comparison were taken from the Weka collection16 of the University of Waikato (New Zealand): K* (KStar), k-nearest neighbor (kNN), and random forest (RF). The Dice (Sørensen-Dice) similarity coefficient, a measure of the similarity of two sample sets, was used to compare the system's segmentation performance with expert segmentation.
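For reference, the Dice coefficient between a computed segmentation A and an expert reference B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch over binary masks follows (the array names are illustrative):

    import numpy as np

    def dice_coefficient(mask_a, mask_b):
        # 2 * |A intersect B| / (|A| + |B|) over boolean segmentation masks.
        a = np.asarray(mask_a, dtype=bool)
        b = np.asarray(mask_b, dtype=bool)
        total = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0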