We evaluated whether fitting the Term, Preterm, and ROP subjects' data with separate curves (Equation 1) provided a significantly better fit than fitting all three groups' data with a single curve. Specifically, we tested the null hypothesis (i.e., “one curve for all data sets”) that the additional fitting accuracy (i.e., lower total sum of squared errors, SS) obtained by using more than one curve is no greater than would be expected from the additional free parameters, and hence the loss of residual degrees of freedom (df), that more curves require. With individual eye data fit by each curve, this hypothesis can be tested using the F ratio of Equation 2,
where $SS_1$ is the total sum of the squared deviations of the individual points from the single fit, $SS_2$ is the total sum of the squared deviations from the three respective fits, $n$ is the total number of subjects in the analysis (so $2n$ is the number of eyes), $k$ is the number of groups being tested, and $p$ is the number of parameters in Equation 1 that were allowed to vary freely (typically 3, as $y_0$ was usually fixed at 0).³⁴ The main difference between this calculation and a typical ANOVA is that the error terms are based on the mean square (MS) deviation from the fitted group curve (Equation 1) rather than from the group mean. However, because the two eyes of one subject are likely to be more similar to each other than two eyes from different subjects, the within-subject variability had to be accounted for; we therefore modified the $F$ ratio to obtain Equation 3.
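For reference, a conventional extra-sum-of-squares F ratio consistent with the definitions above can be written as follows. This is a hedged reconstruction, not the published equations. The first form assumes each eye contributes one data point, so the residual df of the multiple-curve fit is $2n - kp$; the second assumes the modified ratio substitutes the intraocular term $SS_R$ (defined in the text that follows) and the subject-based df $n - k$ into the denominator.

$$F = \frac{(SS_1 - SS_2)/(kp - p)}{SS_2/(2n - kp)}$$

$$F = \frac{(SS_1 - SS_2)/(kp - p)}{SS_R/(n - k)}$$

With $k = 3$ and $p = 3$, the numerator df is $kp - p = 9 - 3 = 6$ in both forms; only the denominator changes, from eyes to subjects.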
In Equation 3, $SS_R$ is the sum of the squared intraocular (i.e., repeated-measures) differences from the group mean difference to the three respective fits. Including $SS_R$ in the modified formula (Equation 3) means that the $F$ ratio now incorporates the ratio of the MS between treatments to the MS for subjects by treatments (Hays,³⁵ formula 13.21.4). We evaluated $F$ using the difference between the numbers of free parameters in the multiple- and single-curve scenarios as $df_{\text{numerator}}$ (e.g., multiple-curve scenario = 3 curves × 3 free parameters = 9; single-curve scenario = 3 free parameters; $df_{\text{numerator}} = 9 - 3 = 6$) and the number of subjects minus the number of groups as $df_{\text{denominator}}$ (e.g., $df_{\text{denominator}} = n - k = 129 - 3 = 126$). Using Equation 3 to calculate $F$, together with counting subjects rather than eyes in $df_{\text{denominator}}$, appropriately raises the threshold for statistical significance and offsets the putatively decreased variability inherent in our repeated-measures design.³⁵ Where statistical significance was attained, we concluded that a “different curve for each data set” was appropriate. In those cases, we performed post hoc pairwise comparisons, following the same procedure, to determine which of the three curves differed from which others, using a more stringent threshold for statistical significance ($\alpha = 0.01$) for these pairwise tests.
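The test procedure above can be sketched in Python as a minimal, standard-library-only illustration. It rests on stated assumptions: Equation 3 is taken to have the form $F = [(SS_1 - SS_2)/(kp - p)] / [SS_R/(n - k)]$, which matches the degrees of freedom given in the text but is not confirmed against the published equation, and the function names `extra_ss_f_test` and `f_sf` are hypothetical. The p-value uses the standard identity between the upper tail of the F distribution and the regularized incomplete beta function, evaluated with a continued fraction.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Use the continued fraction in the region where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df_num, df_den):
    """Upper-tail probability P(F > f) for an F(df_num, df_den) distribution."""
    if f <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f)
    return _betai(df_den / 2.0, df_num / 2.0, x)


def extra_ss_f_test(ss_single, ss_multi, ss_r, n_subjects, k_groups, p_free, alpha):
    """Hypothetical helper for the repeated-measures extra-sum-of-squares F test.

    ASSUMED form of Equation 3 (not the published equation):
        F = [(SS_1 - SS_2) / (k*p - p)] / [SS_R / (n - k)]
    """
    df_num = k_groups * p_free - p_free  # e.g., 3 * 3 - 3 = 6
    df_den = n_subjects - k_groups       # subjects, not eyes: e.g., 129 - 3 = 126
    f = ((ss_single - ss_multi) / df_num) / (ss_r / df_den)
    p_value = f_sf(f, df_num, df_den)
    return {"F": f, "df": (df_num, df_den), "p": p_value, "significant": p_value < alpha}


# Made-up SS values for illustration; n, k, and p match the example in the text.
result = extra_ss_f_test(12.0, 10.0, 8.0, n_subjects=129, k_groups=3, p_free=3, alpha=0.05)
print(result["F"], result["df"], result["p"])
```

For a pairwise post hoc comparison, the same helper would be called with `k_groups=2`, `n_subjects` set to the number of subjects in the two groups being compared, SS values from the corresponding two-group fits, and `alpha=0.01`. The omnibus `alpha` is left to the caller because the text states a threshold only for the pairwise tests.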