Abstract
Purpose :
Artificial intelligence (AI) may help introduce diabetic retinopathy (DR) screening to low-income countries where manpower and expertise are limited. Depending on the referral threshold, the volume of referrals may outstrip the capacity to treat. After screening is introduced, disease prevalence is expected to fall. We have developed a novel AI model that is trained using pairwise comparison and can rank images in order of severity. Its referral threshold can be adjusted to suit local conditions. We tested the performance of our algorithm at different levels of disease prevalence.
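If a model outputs a severity score for each image, one way to make the threshold "locally appropriate" is to choose the cut-off so that referrals do not exceed what local services can treat. The Python sketch below is illustrative only. The function `threshold_for_capacity`, the score scale and the capacity figure are hypothetical assumptions, not part of the published method.

```python
from typing import Sequence


def threshold_for_capacity(scores: Sequence[float], capacity: int) -> float:
    """Return a score cut-off that refers at most `capacity` images.

    Hypothetical helper: an image is referred when its score >= the cut-off.
    If tied scores straddle the capacity boundary, the cut-off is raised above
    the tied value so that referrals never exceed capacity.
    """
    ranked = sorted(scores, reverse=True)  # most severe first
    if capacity <= 0 or not ranked:
        return float("inf")  # refer nobody
    if capacity >= len(ranked):
        return ranked[-1]  # capacity covers every image
    cutoff = ranked[capacity - 1]
    if ranked[capacity] < cutoff:
        return cutoff
    # Tie at the boundary: move the cut-off to the next higher distinct score.
    above = [s for s in ranked if s > cutoff]
    return min(above) if above else float("inf")
```

For example, with scores `[0.95, 0.10, 0.72, 0.40, 0.88]` and capacity for two referrals, the cut-off is 0.88 and exactly two images are referred. Excluding ties, rather than splitting them arbitrarily, is a conservative design choice for a capacity-limited service.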
Methods :
We developed and tested an AI approach that we term “comparative AI”, inspired by the concept of “adaptive comparative judgment”. AI models were trained on publicly available datasets (Messidor2, APTOS, DR-2015, IDRID) using pairwise comparison to judge which of two images showed more severe retinopathy. Images were then ranked by the likelihood that each image was worse or better than a specified grade boundary. The model was refined using Bayesian statistics. We tested two populations derived from the DDR dataset, with disease prevalence set according to published literature: a ‘naive’ population with a higher prevalence of the more severe grades and a ‘mature’ population with a lower prevalence (Table). We report results at two referral thresholds: mild vs. moderate non-proliferative DR (NPDR) and moderate vs. severe NPDR.
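The ranking step can be illustrated with a minimal Python sketch. It assumes that a trained pairwise model is available as a function returning the probability that one image is worse than another. It also assumes a small set of reference images that sit at the boundary grade. The names `pairwise`, `boundary_refs`, `prob_worse_than_boundary` and `rank_images` are hypothetical. The Beta-Bernoulli pooling of comparisons is a swapped-in stand-in, because the abstract does not specify how the Bayesian refinement was done.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: pairwise(a, b) returns P(image a shows worse retinopathy than image b).
PairwiseModel = Callable[[str, str], float]


def prob_worse_than_boundary(
    image: str,
    boundary_refs: Sequence[str],
    pairwise: PairwiseModel,
    prior_a: float = 1.0,
    prior_b: float = 1.0,
) -> float:
    """Posterior mean probability that `image` lies beyond the grade boundary.

    Each comparison with a boundary reference image is treated as a soft
    Bernoulli observation added to a Beta(prior_a, prior_b) prior. This is an
    illustrative Beta-Bernoulli update, not the authors' stated model.
    """
    a, b = prior_a, prior_b
    for ref in boundary_refs:
        p = pairwise(image, ref)
        a += p        # pseudo-count for "worse than the boundary"
        b += 1.0 - p  # pseudo-count for "better than the boundary"
    return a / (a + b)


def rank_images(
    images: Sequence[str],
    boundary_refs: Sequence[str],
    pairwise: PairwiseModel,
) -> Tuple[List[str], Dict[str, float]]:
    """Rank images from most to least likely to exceed the boundary."""
    scores = {img: prob_worse_than_boundary(img, boundary_refs, pairwise) for img in images}
    order = sorted(images, key=lambda img: scores[img], reverse=True)
    return order, scores
```

Under this sketch, referral means a score above a chosen cut-off. Moving that cut-off changes the referral threshold without retraining the pairwise model, which is the flexibility the abstract attributes to comparative AI.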
Results :
For the naive population, the sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV) and number of false-positive referrals (FP) were 0.84, 0.89, 63.7%, 96.1% and 107 at the mild vs. moderate threshold, and 0.94, 0.89, 35%, 99.7% and 123 at the moderate vs. severe threshold. For the mature population, the corresponding values were 0.86, 0.82, 29.8%, 98.5% and 203 (mild vs. moderate) and 1.00, 0.89, 18.3%, 100% and 125 (moderate vs. severe). Between the two populations, SEN and PPV moved in opposite directions, as did SPE and NPV.
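The five figures reported at each threshold all derive from the confusion matrix at that operating point. The short Python sketch below shows the standard definitions. The counts in the usage example are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float   # SEN = TP / (TP + FN)
    specificity: float   # SPE = TN / (TN + FP)
    ppv: float           # PPV = TP / (TP + FP)
    npv: float           # NPV = TN / (TN + FN)
    false_positives: int


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    """Compute referral metrics from confusion-matrix counts at one threshold."""
    return ScreeningMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        false_positives=fp,
    )


# Hypothetical counts for illustration only.
example = screening_metrics(tp=84, fp=11, tn=89, fn=16)
```

SEN and SPE are conditioned on the true grade, so they do not depend on prevalence. PPV and NPV are conditioned on the referral decision, so they do, which is why they shift between the naive and mature populations.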
Conclusions :
On introduction of a DR screening programme, case referral and treatment reduce the prevalence of severe retinopathy. Irrespective of the referral threshold, lower disease prevalence is associated with a higher proportion of false-positive referrals and hence a lower PPV. This is an inevitable statistical consequence: at a fixed sensitivity and specificity, PPV falls as prevalence falls. Comparative AI offers an adjustable referral threshold that could deliver acceptable PPV across a range of health settings and as DR prevalence changes, without extensive and costly retraining and re-regulation.
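The statistical point follows from Bayes' theorem. With prevalence p, PPV = SEN·p / (SEN·p + (1 − SPE)·(1 − p)) and NPV = SPE·(1 − p) / (SPE·(1 − p) + (1 − SEN)·p). The Python sketch below evaluates both. It holds SEN and SPE at the naive-population mild vs. moderate values (0.84, 0.89) purely to isolate the effect of prevalence. The prevalence values looped over are illustrative and are not those of the study populations.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Fixed operating point; only the prevalence of referable disease changes.
    for prev in (0.30, 0.20, 0.10, 0.05):
        print(
            f"prevalence {prev:.2f}: "
            f"PPV {ppv(0.84, 0.89, prev):.3f}, NPV {npv(0.84, 0.89, prev):.3f}"
        )
```

As prevalence drops, PPV falls and NPV rises even though the model is unchanged. This mirrors the opposite movements reported in the Results.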
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.