Abstract
Purpose :
Adjudication among experienced readers is a recognized reference standard for resolving divergent interpretations in diabetic retinopathy (DR) diagnosis. Traditional adjudication involves in-person discussions, but limited expert availability and high coordination costs make this challenging. We present and evaluate an "asynchronous" system for adjudicating images that is suitable for remote grading and removes the need for in-person sessions.
Methods :
Our system allows readers to review images remotely and asynchronously (at different times). In case of disagreements, images and their previously assigned grades and comments are reviewed by all the readers in a round-robin fashion until all disagreements are resolved. In this study, we allowed the discussions to continue for up to 5 reviews per reader per image.
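The round-robin procedure described above can be sketched as follows. This is a minimal illustration, not the system evaluated in the study: the `adjudicate` function, the reader callables, and the choice to count the initial independent grade as each reader's first review are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: a reader is a function (image_id, history) -> grade,
# where history is the list of (reader_name, grade) pairs recorded so far.
History = List[Tuple[str, int]]
Reader = Callable[[str, History], int]


def adjudicate(
    image_id: str,
    readers: Dict[str, Reader],
    max_reviews_per_reader: int = 5,
) -> Tuple[Optional[int], History]:
    """Return (consensus grade or None, full review history)."""
    # Independent grading: each reader grades without seeing other grades.
    current = {name: reader(image_id, []) for name, reader in readers.items()}
    history: History = list(current.items())
    reviews_done = 1  # assumption: the independent grade counts as review 1

    # Round-robin: readers take turns re-reviewing with the history visible,
    # until all current grades agree or the per-reader review cap is reached.
    while len(set(current.values())) > 1 and reviews_done < max_reviews_per_reader:
        for name, reader in readers.items():
            grade = reader(image_id, list(history))
            current[name] = grade
            history.append((name, grade))
            if len(set(current.values())) == 1:
                break  # consensus reached mid-round
        reviews_done += 1

    grades = set(current.values())
    consensus = grades.pop() if len(grades) == 1 else None
    return consensus, history


def fixed(grade: int) -> Reader:
    """Toy reader that never changes its grade."""
    return lambda image_id, history: grade


def follows_majority(initial: int) -> Reader:
    """Toy reader that adopts the most frequent grade in the history."""
    def reader(image_id: str, history: History) -> int:
        if not history:
            return initial
        grades = [g for _, g in history]
        return max(set(grades), key=grades.count)
    return reader
```

With two toy readers fixed at grade 2 and a third that starts at grade 1 but follows the majority, the image is resolved during the first round-robin pass. Three readers who never change their differing grades exhaust the cap and return no consensus.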
To investigate ways of reducing repeated reviews per image, we tested a grading rubric for DR in which readers assessed the presence of specific features (microaneurysms, hemorrhages, neovascularization, etc.) in addition to overall DR severity. Five panels of three retina specialists each adjudicated a set of 499 retinal fundus images: one panel in person, two panels asynchronously using the feature-based rubric, and two panels asynchronously assessing DR severity alone.
Results :
The adjudicated grades showed high agreement with the reference standard of in-person adjudication, with Cohen’s kappa scores of 0.920 and 0.963 for the 2 panels using the feature-based rubric, and 0.941 and 0.949 for the 2 panels without the feature-based rubric.
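For readers unfamiliar with the agreement metric, Cohen's kappa compares observed agreement with the agreement expected by chance. The sketch below assumes the unweighted form; the abstract does not state whether a weighted variant was used, and the function name and toy grades are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1.0 - expected)


# Toy grades (made up for illustration, not study data).
asynchronous = [0, 0, 1, 1]
in_person = [0, 0, 1, 0]
kappa = cohens_kappa(asynchronous, in_person)
```

Here observed agreement is 3/4 and chance agreement is 1/2, so kappa is 0.5.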
Cases adjudicated using the feature-based rubric were resolved in fewer rounds than cases adjudicated on DR severity alone (p < 0.001; paired t-test per image and panel pair). During independent grading, readers were in agreement for 71% of all cases using the feature-based rubric, compared to 66% without. Using the feature-based rubric, only 3% of the cases required more than one full round of reviews, compared to 10% of the cases in the absence of the feature-based rubric (Fig 1).
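A paired comparison of this kind works on the per-image difference in round counts between the two conditions. The sketch below computes only the paired t statistic and its degrees of freedom with the Python standard library; the abstract does not describe the computation beyond naming the test, and the function name and round counts are made up for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    standard_error = sd_diff / math.sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1


# Hypothetical rounds needed for the same four images in each condition.
rounds_severity_only = [2, 1, 3, 2]
rounds_with_rubric = [1, 1, 2, 1]
t_stat, dof = paired_t_statistic(rounds_severity_only, rounds_with_rubric)
```

For these toy values the differences are 1, 0, 1, 1, giving t = 3.0 with 3 degrees of freedom. A p-value would then be read from the t distribution, for example with a statistics package.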
Conclusions :
Asynchronous, remote adjudication presents a flexible and reliable alternative to in-person adjudication for DR diagnosis. Feature-based rubrics can help accelerate consensus formation for asynchronous adjudication of DR without compromising label quality.
This abstract was presented at the 2019 ARVO Annual Meeting, held in Vancouver, Canada, April 28 - May 2, 2019.