Abstract
Purpose:
Deep learning provides a powerful approach to analyze surgical videos and assess surgical skills objectively. We aim to build a model that automatically identifies the locations of cataract surgical tools and eye landmarks, which can be used to grade surgical performance.
Methods:
We sampled 1156 frames from 9 core steps of 268 cataract surgical videos and annotated the regions of 8 different surgical tools as well as the pupil border and limbus. We pretrained YOLACT, a real-time object detection and segmentation model, on the CaDIS dataset, a public dataset for semantic segmentation of cataract surgical videos, and then fine-tuned the pretrained model on our dataset. Object detection was evaluated by average precision (AP), calculated by averaging the precision of the predicted bounding boxes along the precision-recall curve, and segmentation was evaluated by intersection-over-union (IoU), calculated as the area of the intersection of the predicted and true masks divided by the area of their union. Tooltip positions were estimated by identifying the edge point of the predicted mask closest to the screen center. Pupil centers were estimated by fitting an ellipse to the outer edges of the pupil mask and taking the ellipse center. For further validation, the estimated tip positions were compared with the ground-truth tip positions in 46,620 frames from 4 phacoemulsification video clips.
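As a minimal sketch of the segmentation metric described above (our illustration, not the study's code; `mask_iou` is a hypothetical helper name), IoU for a pair of binary masks can be computed with NumPy:

```python
import numpy as np

def mask_iou(pred_mask, true_mask):
    """IoU of two binary segmentation masks: |pred & true| / |pred | true|."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    # Convention: define IoU as 0 when both masks are empty.
    return float(intersection) / float(union) if union > 0 else 0.0
```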
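The tooltip heuristic, taking the mask edge point nearest the screen center, might be sketched with OpenCV contours as below; `estimate_tooltip` is our hypothetical name, and the study's exact implementation may differ:

```python
import cv2
import numpy as np

def estimate_tooltip(instrument_mask):
    """Return the (x, y) boundary point of the mask closest to the frame center."""
    h, w = instrument_mask.shape
    center = np.array([w / 2.0, h / 2.0])
    contours, _ = cv2.findContours(
        instrument_mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE
    )
    if not contours:
        return None  # no instrument detected in this frame
    edge_points = np.vstack([c.reshape(-1, 2) for c in contours])  # (N, 2) as (x, y)
    nearest = edge_points[np.argmin(np.linalg.norm(edge_points - center, axis=1))]
    return int(nearest[0]), int(nearest[1])
```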
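Similarly, a sketch of the pupil-center step, assuming the pupil mask's largest outer contour approximates the pupil border (`cv2.fitEllipse` is standard OpenCV; `estimate_pupil_center` is our illustrative wrapper):

```python
import cv2
import numpy as np

def estimate_pupil_center(pupil_mask):
    """Fit an ellipse to the outer edge of the pupil mask; return its center."""
    contours, _ = cv2.findContours(
        pupil_mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE
    )
    if not contours:
        return None
    outer = max(contours, key=cv2.contourArea)  # largest region = pupil border
    if len(outer) < 5:  # cv2.fitEllipse requires at least 5 points
        return None
    (cx, cy), _axes, _angle = cv2.fitEllipse(outer)
    return cx, cy
```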
Results:
The mean AP and IoU across object classes were 0.78 and 0.82, respectively. Segmentation performed best for the blade, Weck sponge, and phaco instruments, and worst for the needle/cannula class of instruments (Table). The average deviation of estimated phaco tip positions from the ground-truth positions was 6.13 pixels; examples are shown in the Figure. When predictions within 10 pixels of the true position were counted as true positives, the average sensitivity and precision were 81% and 100%, respectively.
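One plausible counting scheme behind these sensitivity/precision figures (an assumption on our part; the abstract does not spell out how missed and off-target detections were tallied): a predicted tip within 10 pixels of the ground truth is a true positive, a prediction farther away is a false positive, and a frame with no prediction is a false negative.

```python
import numpy as np

def tip_detection_metrics(predicted_tips, true_tips, tol_px=10.0):
    """Sensitivity and precision of per-frame tip predictions.

    predicted_tips: list of (x, y) tuples, or None for frames with no detection.
    true_tips: list of ground-truth (x, y) tuples, one per frame.
    """
    tp = fp = fn = 0
    for pred, true in zip(predicted_tips, true_tips):
        if pred is None:
            fn += 1  # tip present but not detected
        elif np.linalg.norm(np.subtract(pred, true)) <= tol_px:
            tp += 1  # detection within tolerance
        else:
            fp += 1  # detection too far from the true tip
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision
```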
Conclusions:
We trained a deep learning model to perform real-time surgical instrument and tooltip detection with good accuracy. The model could be used to develop an automated feedback system that rates surgical performance from cataract surgical videos.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.