June 2021
Volume 62, Issue 8
Open Access
ARVO Annual Meeting Abstract  |   June 2021
PhacoTrainer: Deep Learning for Activity Recognition in Cataract Surgical Videos
Author Affiliations & Notes
  • Hsu-Hang Yeh
    Biomedical Data Science, Stanford University School of Medicine, Stanford, California, United States
  • Anjal Jain
    Stanford University, Stanford, California, United States
  • Olivia Fox
    Stanford University, Stanford, California, United States
  • Sophia Y Wang
    Ophthalmology, Stanford University, Stanford, California, United States
  • Commercial Relationships: Hsu-Hang Yeh, None; Anjal Jain, None; Olivia Fox, None; Sophia Wang, None
  • Support: None
Investigative Ophthalmology & Visual Science June 2021, Vol.62, 583. doi:
  • Citation: Hsu-Hang Yeh, Anjal Jain, Olivia Fox, Sophia Y Wang; PhacoTrainer: Deep Learning for Activity Recognition in Cataract Surgical Videos. Invest. Ophthalmol. Vis. Sci. 2021;62(8):583.
      © ARVO (1962-2015); The Authors (2016-present)


Purpose : The use of deep learning in surgical training is promising, but applications in ophthalmology are scant. The purpose of this study was to train a deep neural network to recognize cataract surgical steps, including both routine steps and complex steps such as the use of trypan blue or iris expansion devices.

Methods : We collected 268 cataract surgical videos routinely recorded during the residency training of 12 surgeons across 6 sites. Videos were sampled at 1 frame/second, and frames were downsampled and cropped to 256×256 pixels. Trained annotators labeled 13 steps of surgery: create wound, injection into the eye, capsulorrhexis, hydrodissection, phacoemulsification, irrigation/aspiration, place lens, remove viscoelastic, close wound, stain with trypan blue, iris manipulation (e.g. Malyugin ring/iris hooks), subconjunctival/sub-Tenon's injections, and other (e.g. anterior vitrectomy, placement of capsular support devices). A deep learning model based on the VGG16 architecture was customized and trained to predict, for each frame, the probability of each step class. The model was evaluated on a held-out test set using frame-by-frame top-N accuracy, defined as the proportion of frames where the true class was among the N highest predicted class probabilities. Per-class and micro-averaged areas under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC) were determined. To evaluate which frame regions were most important for model predictions, class activation maps were visualized using gradient-weighted class activation mapping.
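The top-N accuracy defined above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `top_n_accuracy`, the use of NumPy, and the toy 3-class probabilities are all assumptions made for illustration (the study used 13 classes).

```python
import numpy as np

def top_n_accuracy(probs, labels, n=1):
    """Fraction of frames whose true class is among the n highest-probability classes.

    probs  : (num_frames, num_classes) array of predicted class probabilities
    labels : (num_frames,) array of integer true-class indices
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Column indices of the n largest probabilities in each row
    # (the order inside the top n does not matter for this metric).
    top_n = np.argsort(probs, axis=1)[:, -n:]
    # A frame counts as a hit if its true label appears among those indices.
    hits = (top_n == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical toy data: 4 frames, 3 classes.
probs = np.array([
    [0.7, 0.2, 0.1],  # true class 0: hit at top-1
    [0.1, 0.3, 0.6],  # true class 1: miss at top-1, hit at top-2
    [0.5, 0.4, 0.1],  # true class 2: miss at top-1 and top-2
    [0.2, 0.5, 0.3],  # true class 1: hit at top-1
])
labels = np.array([0, 1, 2, 1])

top1 = top_n_accuracy(probs, labels, n=1)  # 2 of 4 frames
top2 = top_n_accuracy(probs, labels, n=2)  # 3 of 4 frames
print(top1, top2)
```

Top-N accuracy can only stay the same or rise as N increases, because the set of accepted classes per frame only grows.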

Results : Overall top-1 prediction accuracy was 77.4% (93.2% for top-3 accuracy). The overall AUROC was 0.97 and the AUPRC was 0.85. Class activation maps showed that the model appropriately focused on the instrumentation used in each step when making predictions. Challenges remain in predicting rare steps or steps with diverse appearances, including subconjunctival/sub-Tenon's injections, iris manipulation, and anterior vitrectomy, for which recall was poor.
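To see why overall accuracy can look good while recall on rare steps is poor, consider the following sketch. The labels are invented toy data, not results from the study, and the helper `per_class_recall` is a hypothetical name introduced here for illustration.

```python
import numpy as np

def per_class_recall(y_true, y_pred, num_classes):
    """Recall per class: of the frames truly in class c, the fraction predicted as c.

    Classes with no true frames get NaN, since recall is undefined for them.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recall = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            recall[c] = (y_pred[mask] == c).mean()
    return recall

# Hypothetical toy data: class 0 is common, classes 1 and 2 are rare.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

recall = per_class_recall(y_true, y_pred, num_classes=3)
accuracy = float((y_true == y_pred).mean())
print("accuracy:", accuracy)
print("per-class recall:", recall)
```

In this toy case the frame-level accuracy is dominated by the common class, so the rare classes contribute little to it even when most of their frames are misclassified.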

Conclusions : Deep learning models can classify cataract surgical activities on a frame-by-frame basis with high accuracy (77.4% top-1 and 93.2% top-3), especially for routine surgical steps. An automated system for recognition of cataract surgical steps could have broad applications, including providing automated feedback metrics to residents on their surgical videos.

This is a 2021 ARVO Annual Meeting abstract.


Per-class receiver-operating curves with area under curves.


Per-class gradient-weighted class activation mapping
