A computerized database was established to facilitate data management and statistical analysis. Subsequent analyses were performed with commercial software (Excel 2007, Microsoft Corp., Redmond, WA; SPSS 12.0, SPSS Inc., Chicago, IL; and SAS 9.1.3, SAS Institute Inc., Cary, NC). A two-sided P ≤ 0.05 was considered statistically significant. Two-sample t-tests were used to compare mean values of continuous variables between the study group and the age-matched control group. The χ² test or Fisher's exact test was used to examine associations between categorical variables. Comparisons of refractive errors and optical components among the three refractive groups (myopia, emmetropia, and hyperopia) were performed using one-way analysis of variance (ANOVA).
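The univariate comparisons described above can be sketched as follows. This is an illustrative example only: it uses SciPy rather than the commercial packages cited, and all data, group sizes, and contingency counts are synthetic.

```python
# Illustrative sketch of the univariate tests described in the text,
# on synthetic data (all values below are invented for demonstration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two-sample t-test: a hypothetical continuous variable in study vs. control
study = rng.normal(23.5, 1.0, 60)
control = rng.normal(24.0, 1.0, 60)
t_stat, p_t = stats.ttest_ind(study, control)  # two-sided by default

# Chi-square test (or Fisher's exact test) on a 2x2 contingency table
table = np.array([[20, 40], [35, 25]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds, p_fisher = stats.fisher_exact(table)  # preferred when expected counts are small

# One-way ANOVA across the three refractive groups
myopia = rng.normal(-2.0, 1.5, 40)
emmetropia = rng.normal(0.0, 0.5, 40)
hyperopia = rng.normal(2.0, 1.0, 40)
F_stat, p_anova = stats.f_oneway(myopia, emmetropia, hyperopia)
```

Fisher's exact test is shown alongside the χ² test because the former is the usual fallback when any expected cell count is small.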
A multivariate statistical analysis was performed to identify factors that predicted the severity of myopia, hyperopia, and astigmatism. The factors analyzed included the patients' neonatal history (GA, BW, ROP stage, and major neurologic and circulatory diseases) and ocular components (optical components, visual acuity, and eye alignment). A marginal linear regression model was fitted with generalized estimating equations (GEEs) to control for between-eye correlations. In the GEE analysis, if the exchangeable correlation structure fit our clustered data well, the model-based estimates of SE were used; otherwise, the empirical (robust) estimates of SE were reported instead, on the assumption that the sample size of 108 subjects was large enough. If the observations were uncorrelated, the GEE analysis would reduce to standard regression analysis.
The goal of the regression analysis was to find one or a few parsimonious regression models that fit the observed data well for outcome prediction or effect estimation. To ensure the quality of the results, we applied basic model-fitting techniques for variable selection, goodness-of-fit (GOF) assessment, and regression diagnostics. Specifically, a stepwise variable selection procedure (iterating between forward and backward steps) was applied to obtain the candidate final regression model. All relevant variables, regardless of their significance in the univariate analyses, were included in the list of variables to be selected, and the significance levels for entry (SLE) and for stay (SLS) were set at 0.15. Then, with the aid of substantive knowledge, the best final regression model was identified manually in a backward fashion (i.e., removing one statistically nonsignificant covariate at a time) at the chosen α = 0.05 level. Any discrepancy between the results of the univariate and multivariate analyses was probably due to the confounding effects of covariates left uncontrolled in the univariate analysis. The coefficient of determination, R², which is the square of the Pearson correlation between the observed and predicted values of the continuous response variable, was used to assess the GOF of the fitted linear regression model. Statistical tools for regression diagnostics, such as residual analysis, detection of influential cases, and examination of multicollinearity, were used to discover any problems in the data or the model.