Abstract
Purpose:
Currently, most retinal imaging data is poorly annotated, yet accurately annotated data is required to develop and test automatic screening algorithms. We demonstrate how to combine deep learning and crowdsourcing for fast annotation of large-scale datasets.
Methods:
Initial ground-truth data comes from existing annotations in small public datasets. Each user session presents a mixture of “Control Images” from the “Ground Truth” set (GTS) and new images from the “Temporal Ground Truth” set (TGTS). After sessions with a high level of agreement, the corresponding images are updated in the TGTS and eventually moved to the GTS. Images in the GTS with the fewest votes are moved back to the TGTS for reconsideration. Simulations validate the approach and the accuracy of the annotations, but show a slow convergence rate: to accurately annotate 1K images, more than 10K sessions are needed.
We incorporate an automatic classification algorithm into this scheme. We developed three different deep learning classification networks. Each time the GTS changes significantly, the classifiers are retrained. All images in the TGTS are then classified automatically, and the results are fed directly into the TGTS as votes. The confidence of each classification determines the number of votes assigned to that image in the TGTS; classifications with very low confidence are discarded.
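One plausible way to map classifier confidence to a vote count is a thresholded linear scaling; the abstract does not specify the mapping, so the cutoff `min_conf` and the cap `max_votes` below are assumptions for illustration only.

```python
def classifier_votes(confidence, min_conf=0.6, max_votes=5):
    """Convert a classifier's confidence into a number of synthetic TGTS votes.

    Hypothetical mapping: predictions below `min_conf` are discarded (0 votes);
    otherwise confidence is scaled linearly to between 1 and `max_votes` votes.
    """
    if confidence < min_conf:
        return 0  # very low confidence: discard the prediction
    frac = (confidence - min_conf) / (1.0 - min_conf)
    return 1 + round(frac * (max_votes - 1))
```

The returned count would then be added to the image's TGTS tally alongside the crowdworker votes, so confident automatic predictions accelerate promotion to the GTS without overriding the crowd.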
Results:
The accuracy of the deep learning classifiers alone is relatively high, almost reaching 90%. The accuracy of the crowd is higher still, at 93%. The hybrid approach reaches almost 95% accuracy.
The main advantage, however, is not the 2% improvement in accuracy but the convergence rate: annotating 1K images via crowdsourcing alone requires more than 10K sessions, whereas our hybrid approach needs only 1250.
Conclusions:
We propose a framework for retinal image annotation based on crowdsourcing and deep learning. We introduce a novel approach for data validation that includes validation of the ground-truth and the annotation input.
We demonstrate that our validation scheme successfully copes with noisy ground-truth data and with inconsistent input from crowdworkers.
Finally, we show that incorporating deep learning classification into our scheme both improves accuracy and speeds up the annotation process.
This is an abstract that was submitted for the 2016 ARVO Annual Meeting, held in Seattle, Wash., May 1-5, 2016.