Abstract
Purpose :
To compare the predictive reliability and accuracy of different types of machine-learning models for individual myopia prediction in a long-term follow-up cohort of Chinese children.
Methods :
Data from the first-born twins in the Guangzhou Twins Eye Study, who were followed annually between 2006 and 2015, were analyzed. Five types of machine-learning algorithms, namely a random forest model, a support vector machine (SVM), a gradient boosting decision tree (GBDT), a neural network, and multivariate adaptive regression splines (MARS), were fitted with bootstrap sampling and validation. The main outcomes were the R² and mean absolute error (MAE) in predicting spherical equivalent (SE) at the last visit, and the area under the curve (AUC) for predicting the presence of high myopia (SE < -6.0 D) at the age of 18 years. Refraction data and age at examination were used to develop the algorithms.
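As an illustration of the three outcome measures named above, the following minimal Python sketch computes R², MAE, and AUC for a set of SE predictions and estimates their variability by bootstrap resampling. It is not the study's code: the function names, the toy SE values, the number of resamples, and the choice to resample prediction pairs (rather than refit each model on every bootstrap sample) are all assumptions made for illustration. High myopia is labeled with the abstract's threshold (SE < -6.0 D), and the negated predicted SE is used as the risk score.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted SE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(labels, scores):
    """Probability that a random positive case outscores a random negative
    case (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_metrics(y_true, y_pred, n_boot=200, seed=0):
    """Resample (observed, predicted) pairs with replacement and return the
    mean R² and mean MAE across resamples (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    r2s, maes = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # R² is undefined when all observed values are equal
        r2s.append(r2_score(yt, yp))
        maes.append(mean_absolute_error(yt, yp))
    return sum(r2s) / len(r2s), sum(maes) / len(maes)

# Hypothetical SE values in diopters (not study data).
observed_se = [-1.0, -2.5, -6.5, -0.5, -7.25, -3.0]
predicted_se = [-1.25, -2.0, -6.0, -0.75, -7.0, -3.5]

# High myopia defined as SE < -6.0 D; a more negative prediction means higher risk.
high_myopia = [se < -6.0 for se in observed_se]
risk_score = [-se for se in predicted_se]

print("R2:", r2_score(observed_se, predicted_se))
print("MAE:", mean_absolute_error(observed_se, predicted_se))
print("AUC:", auc(high_myopia, risk_score))
print("Bootstrap mean (R2, MAE):", bootstrap_metrics(observed_se, predicted_se))
```

In a full analysis each of the five models would be refitted on every bootstrap sample and scored on the held-out subjects; the resampling here only shows how the metrics themselves are computed and averaged.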
Results :
A total of 1063 subjects were included, with a mean age at baseline of 10.5 ± 2.2 years and a prevalence of high myopia of 7.4% at the last visit. For predicting refractive error at 1-year follow-up, all five models showed similar R² values (0.9446 to 0.9578). For predicting refraction at 7 years from baseline, the random forest model showed a lower R² (0.7163) and a greater MAE (1.1044) than the other four models, whereas the SVM model achieved an R² of 0.7454 and the lowest MAE (1.0161). The AUC for predicting high myopia onset at the age of 18 years was highest for MARS (0.9763) and lowest for the random forest model (0.9699).
Conclusions :
Our results suggest that the random forest model is not the optimal machine-learning method for predicting long-term refraction development in children, whereas SVM and MARS showed high reliability and accuracy and may provide a more precise individual risk assessment.
This is a 2020 ARVO Annual Meeting abstract.