We appreciate the interest in our article “An Analysis of the Use of Multiple Comparison Corrections in Ophthalmology Research.”1 The comments presented by Huisingh and McGwin2 provide a sound overview of the theory behind multiple comparison corrections and complement the historical and theoretical discussion presented in our study. We agree with the majority of their points; correcting for multiple comparisons is a multifaceted issue, without uniformly accepted practice, that should be considered in any study involving multiple statistical comparisons.
Huisingh and McGwin suggest that we “define abstracts reporting five or more P values as those needing a correction factor.”2 However, in our report we actually champion the opposite viewpoint: those not involved in a study's design (e.g., we, the post hoc researchers) can never determine a posteriori whether a correction factor was needed. As we mentioned in our text, studies reporting five or more P values have a Family-wise Error Rate (FWER) greater than 20% and were chosen as “the best available sample of studies where a correction factor would need to be considered.”1
It is not uncommon to produce multiple P values without requiring a correction factor, as can be seen in our own research.3,4 Nevertheless, if a study has numerous statistical comparisons and the researchers determine that a correction factor is not needed, this does not mean that the inflated FWER does not exist; it only means that the inflation is deemed acceptable. Furthermore, the fact that not all studies with multiple statistical comparisons require the use of a correction factor does not release researchers from the obligation to determine, a priori, whether a correction factor is required.
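For context, the greater-than-20% figure cited above follows from the standard family-wise error rate identity for m independent comparisons, each tested at the conventional α = 0.05 (independence being the usual simplifying assumption):

\[ \mathrm{FWER} = 1 - (1 - \alpha)^{m} = 1 - (0.95)^{5} \approx 0.226 \]

Under a Bonferroni correction, for instance, each of the five comparisons would instead be tested at α/m = 0.01, holding the FWER at or below the nominal 5%.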
We acknowledge the limitations of our study mentioned in the letter and have highlighted and discussed each one in great detail in our original study. The multiple comparisons problem is just one example of the limitations of P values in medical and ophthalmic research.5 P values are neither infallible nor perfectly reliable and must always be used with a sound knowledge of their limitations. Unfortunately, they frequently are interpreted as the opposite: fixed thresholds that unfailingly define the limits of clinical efficacy. Though not perfect, P values often are the best option available for statistical inference. Therefore, recognizing the fallibility of P values and the trust clinicians place in them, it is imperative that researchers do their part in addressing the multiple comparisons problem during study design.