Hsu-Hang Yeh, Anjal Jain, Olivia Fox, Sophia Y Wang; PhacoTrainer: Deep Learning for Activity Recognition in Cataract Surgical Videos. Invest. Ophthalmol. Vis. Sci. 2021;62(8):583.
© ARVO (1962-2015); The Authors (2016-present)
The use of deep learning in surgical training is promising, but applications in ophthalmology are scant. The purpose of this study was to train a deep neural network to recognize cataract surgical steps, including both routine steps and complex ones such as the use of trypan blue or iris expansion devices.
We collected 268 resident cataract surgical videos routinely recorded during the residency training of 12 surgeons across 6 sites. Videos were sampled at 1 frame/second, and frames were downsampled and cropped to 256x256 pixels. Trained annotators labeled 13 steps of surgery: create wound, injection into the eye, capsulorrhexis, hydrodissection, phacoemulsification, irrigation/aspiration, place lens, remove viscoelastic, close wound, stain with trypan blue, manipulate iris (e.g., Malyugin ring/iris hooks), subconjunctival/sub-Tenon's injections, and other (e.g., anterior vitrectomy, placement of capsular support devices). A deep learning model based on the VGG16 architecture was customized and trained to predict, for each frame, the probability of each step class. The model was evaluated on a held-out test set using frame-by-frame top-N accuracy, defined as the proportion of frames for which the true class was among the N classes with the highest predicted probabilities. Per-class and micro-averaged areas under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC) were determined. To evaluate which frame areas were most important for model predictions, class activation maps were visualized using gradient-weighted class activation mapping.
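The top-N accuracy metric defined above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `top_n_accuracy`, the pure-Python implementation, and the toy probabilities (4 frames, 4 hypothetical classes rather than the study's 13) are all assumptions made for illustration.

```python
from typing import Sequence


def top_n_accuracy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    n: int,
) -> float:
    """Proportion of frames whose true class is among the n classes
    with the highest predicted probabilities.

    Hypothetical helper for illustration; not taken from the abstract.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one frame is required")

    hits = 0
    for frame_probs, true_class in zip(probs, labels):
        # Class indices ordered by predicted probability, highest first.
        ranked = sorted(
            range(len(frame_probs)),
            key=lambda c: frame_probs[c],
            reverse=True,
        )
        if true_class in ranked[:n]:
            hits += 1
    return hits / len(labels)


# Toy data (assumed values): one row of class probabilities per frame.
toy_probs = [
    [0.70, 0.20, 0.05, 0.05],  # true class 0 is ranked first
    [0.10, 0.30, 0.50, 0.10],  # true class 1 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
    [0.25, 0.15, 0.50, 0.10],  # true class 2 is ranked first
]
toy_labels = [0, 1, 3, 2]

top1 = top_n_accuracy(toy_probs, toy_labels, n=1)
top3 = top_n_accuracy(toy_probs, toy_labels, n=3)
```

In this toy example two of the four frames are correct at top-1 and three of the four at top-3, which shows why top-3 accuracy is never lower than top-1 accuracy on the same predictions.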
Overall top-1 prediction accuracy was 77.4%, and top-3 accuracy was 93.2%. The overall AUROC was 0.97 and the AUPRC was 0.85. Evaluation of class activation maps revealed that the model based its predictions on appropriate regions, focusing on the instrumentation used in each step. Challenges remain in predicting rare steps or steps with diverse appearances, including subconjunctival/sub-Tenon's injections, iris manipulation, and anterior vitrectomy, for which recall was poor.
Deep learning models can classify cataract surgical activities on a frame-by-frame basis with high accuracy, especially for routine surgical steps. An automated system for recognition of cataract surgical steps could have broad applications, including providing automated feedback metrics to residents on their surgical videos.
This is a 2021 ARVO Annual Meeting abstract.
Per-class receiver operating characteristic curves with areas under the curve.
Per-class gradient-weighted class activation mapping.