The mean number of hours per week spent in each activity in the third grade (age, 8–9 years) was estimated for future myopes and for nonmyopes, and differences between the group means were tested for statistical significance with t-tests. The relationship between the number of myopic parents and the onset of myopia was assessed with a χ² test. Each near-activity and parental history variable was entered as the predictor in a univariate logistic regression to estimate its odds ratio for future myopia. The number of myopic parents was modeled as a discrete variable with three levels (no, one, or two myopic parents). Variables that were statistically significant in the univariate models were then entered into multiple logistic regression models. Ideally, relative risk would be used to characterize the risk of myopia, but because the analysis is based on logistic regression, we present odds ratios (ORs) with 95% confidence intervals (CIs).
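The univariate step can be illustrated with a minimal Python sketch (standard library only): it fits a logistic regression with a single predictor by Newton-Raphson and converts the slope into an odds ratio with a 95% Wald confidence interval. The function names, the 0/1 outcome coding, and the example data are hypothetical; the paper does not state which software or interval method was used, so the Wald interval here is an assumption.

```python
import math
from statistics import NormalDist


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def univariate_logistic_or(x, y, ci=0.95, tol=1e-10, max_iter=100):
    """Odds ratio per one-unit increase in x, with a Wald confidence interval.

    x: predictor values (hypothetical, e.g. hours per week of a near activity)
    y: outcome coded 1 = became myopic, 0 = remained nonmyopic
    Assumes the data are not perfectly separated (otherwise the MLE diverges).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: inverse information matrix times score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimate
    _, (h00, h01, h11) = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = NormalDist().inv_cdf(0.5 + ci / 2.0)
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)


if __name__ == "__main__":
    # Hypothetical binary predictor: 6 of 10 exposed and 3 of 10 unexposed
    # children became myopic
    x = [1] * 10 + [0] * 10
    y = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
    or_, lo, hi = univariate_logistic_or(x, y)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With a single binary predictor the fitted OR reduces to the 2×2 cross-product ratio, here (6 × 7)/(4 × 3) = 3.5, and the Wald interval coincides with Woolf's interval for the log odds ratio. A multiple logistic regression uses the same iteration with a larger information matrix.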
Based on our prior findings and on considerations of model assumptions, we chose logistic regression to build predictive models. The assumptions of the logistic model were met, whereas the categorical parental history variables were incompatible with the other models (canonical discriminant analysis and quadratic discriminant analysis), which assume approximately normally distributed predictors. As in our previous paper, we used the receiver operating characteristic (ROC) curve of each logistic model as the measure of its predictive ability. The area under an ROC curve ($\theta$) is the probability that, for a randomly selected pair consisting of one child who became myopic and one who remained nonmyopic, the predictive model correctly ranks the two children by their likelihood of future myopia.[26,27]
For example, suppose that a higher score from a predictive model means that a child is more likely to become myopic. If $x$ is the model's value for a child who remains nonmyopic and $y$ is its value for a child who becomes myopic, then the area under the ROC curve estimates the probability that $x < y$. If the area under the curve is 0.75, then for a randomly selected pair of one remained-nonmyopic child and one future-myopic child, the nonmyopic child's predictive value $x$ is smaller than the future myope's value $y$ with probability 0.75. The area under the empirical ROC curve is an unbiased estimate of $P(x < y)$ and is equivalent to the Mann-Whitney form of Wilcoxon's two-sample rank-sum statistic, $U/(n_x n_y)$, where $n_x$ and $n_y$ are the sizes of the two groups.[28,29]
The area under each curve was compared statistically with 0.50, the value that corresponds to chance discrimination between myopic and nonmyopic children. Hsu's method of multiple comparisons with the best[30] was applied to compare the area under the curve of each predictive model with that of the best of the other models.
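The equivalence between the empirical ROC area and the Mann-Whitney statistic can be checked with a short Python sketch (standard library only). It computes $P(x < y)$ by counting pairs, computes it again from Wilcoxon's rank sum ($U = R_y - n_y(n_y + 1)/2$), and then tests the area against 0.50 with the normal approximation to the Mann-Whitney null distribution. The function names and scores are hypothetical, and the paper does not say which test against 0.50 was used, so the z-test shown is an assumption.

```python
import math
from statistics import NormalDist


def empirical_auc(nonmyopic_scores, myopic_scores):
    """Fraction of (nonmyopic, future-myopic) pairs with x < y; ties count 1/2."""
    wins = 0.0
    for x in nonmyopic_scores:
        for y in myopic_scores:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(nonmyopic_scores) * len(myopic_scores))


def rank_sum_auc(nonmyopic_scores, myopic_scores):
    """Same quantity via Wilcoxon's rank sum: U = R_y - n_y(n_y + 1)/2.

    Inputs are lists; tied values receive their average (mid) rank.
    """
    pooled = sorted(nonmyopic_scores + myopic_scores)
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        i = j
    n_x, n_y = len(nonmyopic_scores), len(myopic_scores)
    r_y = sum(midrank[v] for v in myopic_scores)
    u = r_y - n_y * (n_y + 1) / 2.0
    return u / (n_x * n_y)


def auc_vs_chance(nonmyopic_scores, myopic_scores):
    """Assumed test: two-sided z-test of AUC = 0.50 using the null variance of
    the Mann-Whitney statistic, (n_x + n_y + 1) / (12 n_x n_y), no tie correction."""
    n_x, n_y = len(nonmyopic_scores), len(myopic_scores)
    auc = empirical_auc(nonmyopic_scores, myopic_scores)
    se0 = math.sqrt((n_x + n_y + 1) / (12.0 * n_x * n_y))
    z = (auc - 0.5) / se0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return auc, z, p


if __name__ == "__main__":
    # Hypothetical model scores (higher = more likely to become myopic)
    remained_nonmyopic = [0.10, 0.20, 0.30, 0.40]
    future_myopic = [0.35, 0.50, 0.60]
    print(empirical_auc(remained_nonmyopic, future_myopic))  # 11 of 12 pairs
    print(rank_sum_auc(remained_nonmyopic, future_myopic))
    print(auc_vs_chance(remained_nonmyopic, future_myopic))
```

Both functions return the same area, which is the sense in which the ROC area "is" the Mann-Whitney statistic. Comparing several correlated ROC areas, as in Hsu's procedure, requires a covariance estimate for the areas and is not shown here.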