Abstract
Purpose:
While the Ophthalmic Knowledge Assessment Program (OKAP) examination provides a uniform measure of the medical knowledge of resident physicians, there is no analogous system for assessing surgical skill. Crowdsourcing, which employs large numbers of lay people via the Internet to perform a task, represents a novel approach to standardizing skill evaluations across residency programs. In a cross-sectional study using real, non-simulated cataract surgery videos, we tested the hypothesis that crowdsourced lay raters can accurately assess surgical skill relative to expert graders.
Methods:
Fifty phacoemulsification video segments from 15 physicians, who ranged in training level from first-year resident to attending physician, were graded by 5 blinded experts and 347 lay raters via the CrowdSourcing Assessment of Technical Skill platform. Grading was performed using a modified Objective Structured Assessment of Technical Skills (OSATS) tool, which examines 5 skill domains: microscope centration, economy of movement, respect for tissue, flow of operation, and instrument handling. To adjust for score clustering by crowd raters, crowd mean scores were generated with a linear mixed-effects model. Crowd and expert scores were compared via correlation and t-tests using SAS statistical software.
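To make the analysis concrete, the sketch below shows one way the rater-adjusted crowd mean scores and the crowd-versus-expert comparisons could be computed. It is a minimal illustration in Python (statsmodels and SciPy) rather than the SAS code used in the study, and the file names and column names (video_id, rater_id, score, expert_score) are assumptions made for the example.

    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical long-format crowd ratings: one row per (video, crowd rater) rating.
    crowd = pd.read_csv("crowd_ratings.csv")    # assumed columns: video_id, rater_id, score
    expert = pd.read_csv("expert_scores.csv")   # assumed columns: video_id, expert_score

    # Linear mixed-effects model: a fixed effect per video and a random intercept per
    # crowd rater, so per-video means are adjusted for score clustering by rater.
    mlm = smf.mixedlm("score ~ 0 + C(video_id)", data=crowd, groups=crowd["rater_id"]).fit()

    # The fixed-effect coefficients are the rater-adjusted crowd mean scores per video.
    crowd_means = mlm.fe_params.rename_axis("term").reset_index(name="crowd_score")
    crowd_means["video_id"] = crowd_means["term"].str.extract(r"\[(.+)\]", expand=False)

    merged = crowd_means.merge(expert.assign(video_id=expert["video_id"].astype(str)),
                               on="video_id")

    # Agreement and systematic offset between crowd and expert scores.
    rho, p_rho = stats.spearmanr(merged["crowd_score"], merged["expert_score"])
    t, p_t = stats.ttest_rel(merged["crowd_score"], merged["expert_score"])
    print(f"Spearman rS = {rho:.2f} (p = {p_rho:.3g}); paired t-test p = {p_t:.3g}")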
Results:
The expert scores demonstrated high interrater reliability (intraclass correlation coefficient 0.976) and accurately predicted level of training, establishing content validity for the modified OSATS. Crowd and expert scores were highly correlated (Spearman rS = 0.88), but crowd scores were consistently higher than expert scores for first-, second-, and third-year residents (p < 0.0001, paired t-test). Longer surgery duration was strongly correlated with lower training level (rS = -0.88) and lower expert score (rS = -0.92). A regression equation converting crowd score and video length into expert score was derived (R² = 0.92).
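The sketch below illustrates how the duration correlations and the crowd-to-expert conversion equation could be reproduced from a per-video summary table. Again, this is a Python illustration under stated assumptions, not the study's SAS code, and the column names (crowd_score, expert_score, duration_min, training_year) are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical per-video summary table with assumed column names:
    # crowd_score, expert_score, duration_min, training_year (1 = first-year resident, ...).
    videos = pd.read_csv("video_summary.csv")

    # Surgery duration versus training level and versus expert score (Spearman).
    print(stats.spearmanr(videos["duration_min"], videos["training_year"]))
    print(stats.spearmanr(videos["duration_min"], videos["expert_score"]))

    # Regression converting crowd score plus video length into a predicted expert score.
    ols = smf.ols("expert_score ~ crowd_score + duration_min", data=videos).fit()
    print(ols.params)      # intercept and slopes of the conversion equation
    print(ols.rsquared)    # coefficient of determination (R^2)

    # Applying the fitted equation to a new crowd-rated video.
    new_video = pd.DataFrame({"crowd_score": [3.8], "duration_min": [22.0]})
    print(ols.predict(new_video))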
Conclusions:
This is the first study to examine the feasibility and validity of crowdsourcing evaluations of cataract surgery videos. While crowdsourced rankings of cataract surgery videos correlated with expert scores, crowd scores overestimated technical competency, especially for novice surgeons. Validation with additional datasets including longitudinal studies on the same surgeons is ongoing.
This abstract was presented at the 2019 ARVO Annual Meeting, held in Vancouver, Canada, April 28 - May 2, 2019.