The one-way analysis of variance can be used to compare two or more means. Assume that there are \(k\) groups (for our illustration, \(k = 3\)) with observations \(y_{ij}\) for \(i = 1, 2, \ldots, k\) and \(j = 1, 2, \ldots, n_i\), where \(n_i\) is the number of observations in the \(i\)th group. The ANOVA table partitions the sum of squared deviations of the \(n = \sum\nolimits_{i = 1}^k n_i\) observations from their overall mean, \(\bar y\), into two components: the between-group (or treatment) sum of squares, \(SSB = \sum\nolimits_{i = 1}^k n_i (\bar y_i - \bar y)^2\), expressing the variability of the group means \(\bar y_i\) around the overall mean \(\bar y\), and the within-group (or residual) sum of squares, \(SSW = \sum\nolimits_{i = 1}^k \sum\nolimits_{j = 1}^{n_i} (y_{ij} - \bar y_i)^2 = \sum\nolimits_{i = 1}^k (n_i - 1) s_i^2\), which adds up the within-group variances \(s_i^2\), each weighted by its degrees of freedom \(n_i - 1\). The ratio of the resulting mean squares (where mean squares are obtained by dividing sums of squares by their degrees of freedom), \(F = \frac{SSB/(k - 1)}{SSW/(n - k)}\), serves as the statistic for testing the null hypothesis that all group means are equal. The probability value for testing this hypothesis can be obtained from the F-distribution with \(k - 1\) and \(n - k\) degrees of freedom. Small probability values (smaller than 0.05 or 0.10) indicate that the null hypothesis should be rejected.
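The partition described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `one_way_anova` and the three small groups of numbers are made up for the example and are not the retinal thickness data of this article.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of observations."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    all_obs = [y for g in groups for y in g]
    grand_mean = fmean(all_obs)
    group_means = [fmean(g) for g in groups]

    # Between-group sum of squares: n_i * (group mean - overall mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group sum of squares: deviations from each group's own mean
    ssw = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)

    f_stat = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, f_stat


# Hypothetical data, chosen so that the arithmetic is easy to check by hand.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ssb, ssw, f_stat = one_way_anova(example_groups)

# Total sum of squares around the overall mean; it should equal SSB + SSW.
overall = fmean([y for g in example_groups for y in g])
sst = sum((y - overall) ** 2 for g in example_groups for y in g)

print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, F = {f_stat:.3f}")
# SSB = 26.000, SSW = 6.000, F = 13.000
```

Converting the F statistic into a probability value requires the upper tail of the F-distribution with k − 1 and n − k degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.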
The ANOVA assumes that all measurements are independent. This is the case here, as we have different subjects in the three groups. Note that independence could not be assumed if both right and left eyes were included, as right and left eye observations from the same subject are most likely correlated; we will discuss later how to handle this situation.
The ANOVA assumes that the variances of the treatment groups are the same. Its conclusions may be misleading if the variances are different. Box (ref. 3) showed that the F-test is sensitive to violations of the equal variance assumption, especially if the sample sizes in the groups are different. The F-test is less affected by unequal variances if the sample sizes are equal. Although the F-test assumes normality, it is robust to non-normality as long as the sample sizes are reasonably large (e.g., 30 samples per group).
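The sensitivity to unequal variances can be illustrated with a small simulation. The sketch below is not taken from Box's paper; it assumes three normally distributed groups with equal true means, hypothetical group sizes of 5, 10 and 15, and a hard-coded critical value of about 3.354 for the upper 5% point of the F-distribution with 2 and 27 degrees of freedom (a tabulated value, used here to avoid external libraries). The number of simulations and the seed are arbitrary choices.

```python
import random
from statistics import fmean

# Assumed tabulated value: approximate upper 5% point of F(2, 27).
F_CRIT = 3.354


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([y for g in groups for y in g])
    means = [fmean(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


def rejection_rate(sizes, sds, n_sim=4000, seed=1):
    """Fraction of simulated data sets, all with equal true means,
    in which the F-test rejects at the nominal 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)]
                  for n, sd in zip(sizes, sds)]
        if f_statistic(groups) > F_CRIT:
            rejections += 1
    return rejections / n_sim


sizes = (5, 10, 15)  # hypothetical unequal group sizes (n = 30, k = 3)
equal_rate = rejection_rate(sizes, (1.0, 1.0, 1.0))
unequal_rate = rejection_rate(sizes, (3.0, 1.0, 1.0))  # smallest group most variable

print(f"equal variances:   {equal_rate:.3f}")
print(f"unequal variances: {unequal_rate:.3f}")
```

With equal variances the rejection rate should stay close to the nominal 0.05. When the smallest group has the largest variance, the rate is expected to rise well above it, which is the kind of misleading conclusion the text warns about.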
For only two treatment groups, the ANOVA approach reduces to the two-sample t-test that uses the pooled variance. Earlier we had recommended the Welch approximation, which uses a different standard error calculation for the difference of two sample means, as it does not assume equal variances. Useful tests for the equality of variances are discussed later.
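The relationship between the two approaches can be checked numerically: for two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic, while the Welch statistic uses a different standard error and approximate (Welch–Satterthwaite) degrees of freedom. The sketch below uses only the standard library; the function names and the two small samples are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (fmean(a) - fmean(b)) / se, na + nb - 2


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite df; returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # squared standard errors
    se = sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (fmean(a) - fmean(b)) / se, df


def anova_f(groups):
    """One-way ANOVA F statistic."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([y for g in groups for y in g])
    means = [fmean(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Hypothetical samples with different sizes and spreads.
a = [4.1, 5.0, 5.6, 4.8]
b = [6.2, 7.9, 5.1, 8.4, 6.6, 7.3]

t_pooled, df_pooled = pooled_t(a, b)
t_welch, df_welch = welch_t(a, b)
f_two_groups = anova_f([a, b])

print(f"pooled t^2 = {t_pooled ** 2:.4f}, ANOVA F = {f_two_groups:.4f}")
print(f"Welch t = {t_welch:.4f} on {df_welch:.2f} df (pooled df = {df_pooled})")
```

The Welch degrees of freedom never exceed the pooled value of n1 + n2 − 2, so the Welch test is the more cautious of the two when the variances differ.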
If the null hypothesis of equal group means is rejected when there are more than two treatment groups, follow-up pairwise comparisons are needed to determine which treatment groups differ from one another. For three groups, there are three pairwise (multiple) comparisons and three confidence intervals, one for each pairwise difference of two means. The significance level of the individual pairwise tests needs to be adjusted for the number of comparisons being made. Under the null hypothesis of no treatment effects, we set the probability that one or more of these pairwise comparisons is falsely significant to a given level, such as α = 0.05. To achieve this, one must widen the individual confidence intervals and increase the individual probability values. This is exactly what the Tukey multiple comparison procedure (ref. 4) does (Table 2, Fig. 1). Many other multiple comparison procedures are available (Bonferroni, Scheffé, Šidák, Holm, Dunnett, Benjamini–Hochberg), but their discussion would go beyond this introduction. For a discussion of the general statistical theory of multiple comparisons, see Hsu (ref. 5).
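A short calculation shows why the adjustment matters. The sketch below is not the Tukey procedure, which requires the studentized range distribution; it uses the simpler Bonferroni and Šidák adjustments named above. The formula for the chance of at least one false positive treats the comparisons as independent, which pairwise comparisons of group means are not, so the figure is only a rough guide. The function name is hypothetical.

```python
from math import comb


def familywise_error(alpha, m):
    """Chance of at least one false positive among m tests, each run at
    level alpha, assuming (as a simplification) independent tests."""
    return 1 - (1 - alpha) ** m


k = 3              # number of treatment groups
m = comb(k, 2)     # number of pairwise comparisons: 3
alpha = 0.05

unadjusted_fwer = familywise_error(alpha, m)

# Per-comparison levels that keep the overall error at or below alpha.
bonferroni_alpha = alpha / m
sidak_alpha = 1 - (1 - alpha) ** (1 / m)

print(f"{m} comparisons at {alpha}: overall error about {unadjusted_fwer:.4f}")
print(f"Bonferroni per-comparison level: {bonferroni_alpha:.5f}")
print(f"Sidak per-comparison level:      {sidak_alpha:.5f}")
```

Testing each of the three comparisons at 0.05 gives an overall error of roughly 0.14 under this simplification, which is why the individual tests must be run at a stricter level of about 0.017.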
The ANOVA results in Table 2 show that mean retinal thickness differs significantly across the three treatment groups (P = 0.0001). Tukey pairwise comparisons show differences between the group means of thickness for control and EAE and for control and EAE + treatment. The means of EAE and EAE + treatment are not significantly different.