Abstract
Purpose:
One of the factors limiting broad adoption of medical computer vision is the high cost of annotation. In this work, we introduce the Expert In The Loop (EITL) framework to reduce annotator time requirements with minimal sacrifice of deep model performance.
Methods:
We built a cloud-native platform based on CVAT (an open-source image annotation tool) that allows clinicians (i.e., the experts) to annotate medical images (step 1). Once a subset of images is annotated, a crude deep learning model can be trained and used to run inference on the remaining unlabeled images (step 2). We then ask the expert to accept or reject each network prediction (step 3), and retrain the network on the original expert-labeled images plus the accepted network-labeled images (step 4). Steps 2-4 can be repeated until model accuracy reaches a suitable level.
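The accept/reject loop in steps 1-4 can be sketched as a simple self-training routine. The toy threshold "model", the oracle expert, and all function names below are illustrative stand-ins for demonstration only, not the authors' actual implementation:

```python
import random

random.seed(0)

# Toy stand-in: an "image" is a number in (0, 1), its true mask is
# (image > 0.5), and the "model" is a learned decision threshold.

def train(labeled):
    """Fit the threshold model on (image, mask) pairs (steps 1 and 4)."""
    pos = [x for x, m in labeled if m]
    neg = [x for x, m in labeled if not m]
    if not pos or not neg:
        return 0.5
    return (min(pos) + max(neg)) / 2

def predict(model, image):
    """Run inference on an unlabeled image (step 2)."""
    return image > model

def expert_accepts(image, pred):
    """Expert reviews a prediction, accepting only correct masks (step 3)."""
    return pred == (image > 0.5)

def eitl_loop(seed_labeled, unlabeled, rounds=3):
    """Repeat steps 2-4: predict, expert-filter, retrain on the grown set."""
    labeled = list(seed_labeled)
    for _ in range(rounds):
        model = train(labeled)                # steps 1/4: (re)train
        remaining = []
        for image in unlabeled:
            pred = predict(model, image)      # step 2: inference
            if expert_accepts(image, pred):   # step 3: accept/reject
                labeled.append((image, pred))
            else:
                remaining.append(image)
        unlabeled = remaining
    return train(labeled), labeled
```

Starting from a two-image expert-labeled seed, each round grows the training set with expert-vetted pseudo-labels, mirroring how EITL amortizes a small amount of expert time across the whole dataset.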
To validate this pipeline, we tackle optic disc (OD) segmentation and vessel segmentation on fundus photographs. We use the publicly available Drishti-GS dataset for OD segmentation and the High-Resolution Fundus (HRF) dataset for vessel segmentation. For the baseline network we train on the supplied ground-truth annotations; for the EITL experiments we manually label only two images. Labeling time for the full dataset was extrapolated from the time taken to annotate these images. All reported IoU values are on the Drishti-GS test set and on HRF images 11-15 from each of the normal, diabetic retinopathy (DR), and glaucoma subsets (fifteen images total). To expedite labeling, the expert in this case is not a clinician but a computer science researcher. We train a standard U-Net with a combination of cross-entropy and IoU loss.
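The combined loss can be sketched as a weighted sum of per-pixel binary cross-entropy and a soft (differentiable) IoU term. The weighting `alpha` and these pure-Python helpers are illustrative assumptions, not the paper's exact formulation:

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft Jaccard index, a differentiable surrogate for IoU."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (inter + eps) / (union + eps)

def combined_loss(probs, targets, alpha=0.5):
    """Weighted CE + IoU loss; alpha is a hypothetical mixing weight."""
    return alpha * bce_loss(probs, targets) + (1 - alpha) * soft_iou_loss(probs, targets)
```

The IoU term directly optimizes the overlap metric reported in Table 1, while cross-entropy stabilizes early training; combining the two is a common choice for segmentation.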
Results:
A full list of network performance can be found in Table 1. For OD segmentation, annotating the two images took 79 seconds and saved the expert 32 minutes; for vessel segmentation, annotating the two images took 52 minutes and saved the expert over 12 hours. Our network was able to match the performance of one trained on the exhaustively annotated dataset after collecting labels for only 2 images.
Conclusions:
This study shows that we can employ expert time more efficiently in medical computer vision. By keeping an expert in the loop during training, we can expand our datasets across more domains without sacrificing model performance. In the future, we hope to extend our experiments to measure model performance when the expert also annotates the most problematic cases.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.