April 2012
Volume 53, Issue 4
Clinical and Epidemiologic Research  |   April 2012
An Analysis of the Use of Multiple Comparison Corrections in Ophthalmology Research
Author Affiliations & Notes
  • Andrew W. Stacey
  • Severin Pouly
  • Craig N. Czyz
  • From the Department of Medical Education, Riverside Methodist Hospital, Columbus, Ohio; The Ohio State University College of Medicine, Columbus, Ohio; The Ohio State University College of Public Health, Columbus, Ohio; and Ohio University/OhioHealth Doctors Hospital, Section of Oculofacial Plastic and Reconstructive Surgery, Columbus, Ohio.
  • Corresponding author: Andrew W. Stacey, Department of Medical Education, Riverside Methodist Hospital, 3535 Olentangy River Road, Columbus, OH 43214; astacey2@ohiohealth.com
Investigative Ophthalmology & Visual Science April 2012, Vol.53, 1830-1834. doi:10.1167/iovs.11-8730
Abstract

Purpose: The probability of a type I error, or false-positive result, increases as the number of statistical comparisons in a study increases. Statisticians have developed numerous corrections to account for this multiple comparison problem. This study reviews recent guidelines on multiple comparison corrections, calculates the prevalence of corrections in ophthalmic research, and estimates the corresponding number of false-positive results reported at a recent international research meeting.

Methods: The 6415 abstracts presented at ARVO 2010 were searched for statistical comparisons (P values) and for the use of multiple comparison corrections. Studies that reported five or more P values without reporting a correction factor were included in a simulation study conducted to estimate the number of false-positive results reported in those studies.

Results: Overall, 36% of abstracts reported P values, and 1.2% of abstracts used some form of correction. Whereas 8% of all abstracts reported at least five P values, only 5% of these used a multiple comparison correction. In these highly statistical studies, simulation produced an average of 185 false-positive outcomes, occurring in 30% of the abstracts.

Conclusions: The paucity of multiple comparison corrections in ophthalmic research results in inflated type I error rates and may produce unwarranted shifts in clinical or surgical care. Researchers must make a conscious effort to decide if and when to use a correction factor to ensure the validity of their data.

Introduction
Inflated type I error due to multiple statistical comparisons is a well-established problem in the medical literature.1–3 Prior to beginning an analysis, researchers must agree on an acceptable type I error rate, or alpha level. When more than one significance test is performed in a study, the type I error rate for each individual test remains equal to the alpha level; however, the probability of obtaining at least one false-positive result in the study as a whole increases. This is known as the multiple comparison problem or the multiple testing problem. To illustrate this phenomenon, consider a standard coin flip. Each time a coin is flipped there is a 50% chance of the coin landing on "heads." Now consider flipping the same coin 10 times. Each individual flip still results in a 50% chance of "heads." However, the probability of obtaining at least one "head" among all 10 coin flips is much larger than 50%. The same phenomenon occurs when 10 separate significance tests are performed using an alpha level of 0.05; the probability of obtaining at least one false-positive result out of all 10 individual tests is larger than 5%. 
The probability of making at least one type I error in a study, referred to as the familywise error rate (FWER), increases with the number of comparisons made. If one assumes an alpha level of 0.05, the FWER is given by FWER = 1 − 0.95^n, where n is the total number of comparisons made in a study. This relationship is represented graphically in Figure 1, which demonstrates how FWER depends on the number of comparisons in a study and the predetermined alpha level. 
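The FWER formula can be checked numerically; the short Python sketch below (an illustration of ours, not part of the study) evaluates 1 − (1 − alpha)^n for independent tests:

```python
def fwer(n, alpha=0.05):
    """Familywise error rate: the probability of at least one type I error
    across n independent significance tests, each conducted at level alpha."""
    return 1 - (1 - alpha) ** n

# A single test keeps the error rate at alpha; it climbs quickly after that.
print(round(fwer(1), 3))   # 0.05
print(round(fwer(10), 3))  # 0.401
print(round(fwer(14), 3))  # 0.512
```

The value for n = 14 matches the roughly 50% figure the article cites from Figure 1.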
Figure 1.
FWER, the probability that at least one type I error will occur in a study, increases as the total number of significance tests performed within the study increases. The solid line represents the FWER at an alpha level of 0.05. The FWER is smaller when the alpha level is decreased (alpha = 0.01, dotted line) and larger when the alpha level is increased (alpha = 0.10, dashed line).
The dramatic rise in FWER with additional significance testing is a serious dilemma for the investigator. As illustrated by Figure 1, if researchers test 14 comparisons in a study at an alpha level of 0.05, at least one false-positive result is expected roughly 50% of the time. In light of this potentially deleterious effect, statisticians have devised a number of multiple comparison corrections to account for an increasing FWER. 
If, how, and when to use multiple comparison corrections is a historically important debate in the peer-reviewed medical literature.4–12 Although there is still ongoing discussion about the more esoteric points of the argument, many researchers and international organizations agree that multiplicity corrections must be used to rein in type I error.13–24 In fact, a number of recent studies have identified the lack of multiple comparison corrections as the underlying cause of unwarranted shifts in clinical care paradigms.13,25,26 Although reported guidelines for use vary, most sources agree that: (1) the multiple comparisons problem should not be ignored or type I error inflation can occur; (2) the best way to address the problem is to limit the number of comparisons; (3) the rationale for and against using a correction factor should be discussed before data analysis is undertaken and should be properly documented; and (4) corrections are strongly encouraged when separate comparisons are related or when a study is confirmatory in nature. The distinction between "exploratory" and "confirmatory" analysis, often described as inductive versus deductive research, is not clear-cut and also must be discussed during study design. 
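The simplest correction factor of the kind these guidelines recommend is the Bonferroni adjustment, which tests each of m comparisons at alpha/m. A minimal illustrative sketch (ours, not drawn from the study):

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: declare significance only when an individual
    P value falls below alpha divided by the number of comparisons."""
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]

# Five comparisons: the per-test threshold drops from 0.05 to 0.01,
# so 0.03 is no longer significant even though it is below 0.05.
print(bonferroni_reject([0.004, 0.03, 0.2, 0.008, 0.6]))
# [True, False, False, True, False]
```

This guarantees the FWER stays at or below alpha, at the cost of reduced power when comparisons are numerous.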
Recent literature has demonstrated the extensive use of statistical analysis and need for multiplicity corrections in ophthalmology research.27–29 However, an analysis of the prevalence of multiple comparison corrections in ophthalmic research and its implications has not been addressed. In this study the prevalence of multiplicity corrections in ophthalmic research is estimated using abstracts at an international research conference. The analysis focuses on studies that report large numbers of statistical comparisons, because these represent research where multiple comparison corrections would need to be considered. Simulation techniques are used to estimate the number of type I errors reported in these statistically rigorous abstracts. 
Methods
In the spring of each year The Association for Research in Vision and Ophthalmology (ARVO) conducts an international research meeting, bringing together researchers in all fields of ophthalmology. Research presentations at ARVO are delivered using both oral and poster methods. Each presentation is submitted in abstract form and is peer-reviewed prior to acceptance. At the conclusion of the meetings, ARVO publishes the abstracts for every poster and oral presentation online in portable document format (PDF). At the time of this study, the most recent abstracts available through ARVO online were from the meeting held in May 2010. Presentations at ARVO 2010 were divided into 16 subspecialty categories (Table 1). 
Table 1.
The Prevalence of P Values Reported at ARVO 2010, by Category

Category | Total # of Abstracts | # Reporting P Values | % Reporting P Values | Max # P Values Reported | Median # of P Values (where reported) | # Reporting ≥5 P Values | # Reporting ≥10 P Values
Anatomy | 225 | 68 | 30% | 50 | 2 | 12 | 2
Biochemistry | 562 | 124 | 22% | 20 | 3 | 33 | 10
Clinical epidemiology | 348 | 169 | 49% | 14 | 3 | 49 | 5
Cornea | 878 | 321 | 37% | 1,000,000 | 3 | 56 | 8
Eye movements | 272 | 97 | 36% | 29 | 3 | 19 | 6
Genetics | 48 | 6 | 13% | 100,000 | 2.5 | 2 | 2
Glaucoma | 752 | 459 | 61% | 100 | 3 | 135 | 31
Immunology | 362 | 88 | 24% | 12 | 2 | 11 | 2
Lens | 240 | 46 | 19% | 12 | 3 | 11 | 2
Multidisciplinary | 177 | 51 | 29% | 21 | 2 | 10 | 1
Nanotechnology | 25 | 6 | 24% | 2 | 1 | 0 | 0
Physiology | 312 | 112 | 36% | 24 | 3 | 25 | 6
Retina | 1076 | 463 | 43% | 20 | 2 | 112 | 14
Retinal cell biology | 600 | 163 | 27% | 100,000 | 2 | 27 | 2
Visual neurology | 265 | 48 | 18% | 10 | 3 | 8 | 1
Visual psychology | 273 | 100 | 37% | 9 | 2.5 | 28 | 0
Total | 6415 | 2321 | 36% | 1,000,000 | 3 | 538 | 92
Every abstract presented at ARVO 2010 was downloaded in PDF form and searched for P values. Each document was searched for the terms "P value," "P <," "P =," and "P >," along with all spatial variations of the same. All abstracts were also searched for the most common multiple comparison correction methods using the terms "Bonferroni," "Scheffe," "Tukey," "Duncan," "Dunnett," "Newman-Keuls," "Sidak," "Least Significant Difference," "False Discovery Rate," as well as the general terms "multiple comparison" and "multiplicity." The search was automated, highlighting all the terms listed above. After the automated search was complete, two of the authors (AS and SP) and two assistants manually reviewed the search results, assessed them for validity, and recorded two variables for each abstract: the number of reported P values and whether a correction factor was used. 
Studies that reported considerable statistical output, in the form of 5 or more reported P values (FWER of 23% or greater) or 10 or more P values (FWER of 40% or greater), were analyzed for their use of a correction factor. If a correction factor was not mentioned, the abstract was included in a simulation study, whose goal was to estimate the number of type I errors expected in these statistically rigorous studies. The criteria for inclusion in the simulation were therefore 5 or more reported P values and no reported correction factor. For each abstract that met the inclusion criteria, a binomial distribution was used to simulate the number of type I errors reported in the abstract, with the number of reported P values as the number-of-trials parameter and an assumed alpha level of 0.05 as the success probability. The simulation model can be written as Y_i ~ Binomial(n_i, p), where n_i is the number of reported P values in the ith abstract, p equals the alpha level (0.05), that is, the probability of a type I error, and Y_i is the resulting number of simulated type I errors in the ith abstract. Because the truth of each null hypothesis was unknown, every null hypothesis was assumed to be true. One simulation replicate was complete when the number of type I errors for each abstract had been drawn from the above distribution; at its end, the total number of type I errors in all studies, the number of simulated studies with at least one type I error, and the number of simulated studies with more than one type I error were recorded. This process was repeated 10,000 times and the results averaged. Separate simulation studies were carried out for abstracts with 5 or more P values and for abstracts with 10 or more P values. 
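The authors implemented this simulation in R. A rough Python re-creation of the same binomial scheme (function and parameter names are ours) might look like:

```python
import random

def simulate_type_i_errors(p_value_counts, alpha=0.05, n_sims=10000, seed=42):
    """For each simulated replicate, draw the number of type I errors in each
    abstract as Binomial(n_i, alpha), assuming every null hypothesis is true.
    Returns per-replicate averages: (total errors, # abstracts with >=1 error,
    # abstracts with >1 error)."""
    rng = random.Random(seed)
    tot_err = tot_any = tot_multi = 0
    for _ in range(n_sims):
        for n in p_value_counts:
            # Each of the n reported P values is a false positive with prob alpha.
            errs = sum(rng.random() < alpha for _ in range(n))
            tot_err += errs
            tot_any += errs >= 1
            tot_multi += errs > 1
    return tot_err / n_sims, tot_any / n_sims, tot_multi / n_sims
```

Fed the actual per-abstract P-value counts of the 511 included studies, a scheme like this yields averages of the kind reported in Table 3.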
The simulation study was completed using the R statistical software (R Foundation for Statistical Computing, Vienna, Austria; available at http://www.r-project.org/).30 
Results
A total of 6415 abstracts in 16 categories were presented at ARVO in 2010. Table 1 reports the overall prevalence of P values in these abstracts, separated by category. A total of 36% of all abstracts (2321) reported statistical comparisons in the form of P values. Researchers in glaucoma registered the highest percentage of abstracts reporting P values (61%), whereas those in genetics reported the lowest percentage (13%). Overall, 23% (538) of the abstracts that reported P values presented at least 5 P values, and 4% (92) reported at least 10. 
Table 2 summarizes the prevalence of multiple comparison corrections. A total of 74 abstracts mentioned some form of correction, representing 1.2% of all abstracts and 3.2% of the abstracts that reported P values. The most common method was the Bonferroni correction, which accounted for 32% of all corrections. Researchers also used Tukey's (28%), False Discovery Rate (7%), Least Significant Difference (5%), Dunnett's (4%), Scheffe's (3%), and Newman-Keuls (3%) methods. A nonspecific multiple comparison test was used in 13 (18%) abstracts. The Duncan and Sidak methods were not used. Abstracts in the genetics section, which had the lowest prevalence of P values, demonstrated the most proficient use of multiple comparison corrections, with 8.3% of all genetics abstracts reporting some form of correction. Of the abstracts that reported at least 5 P values, only 5% (27 of 538) reported a correction factor. In the 511 abstracts with at least 5 P values and no correction factor, there were a total of 3703 reported P values (per-abstract mean = 7.2, median = 6, max = 44). Of the abstracts that reported at least 10 P values, only 13% (12 of 92) reported a correction factor. In the 80 abstracts with at least 10 P values and no correction factor, there were a total of 1054 reported P values (per-abstract mean = 13.2, median = 11, max = 44). 
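Of the methods tallied here, the False Discovery Rate approach is usually implemented as the Benjamini-Hochberg step-up procedure. A brief illustrative sketch (ours, not taken from any of the abstracts):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: sort the P values, find the
    largest rank k with p_(k) <= (k/m) * q, and reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= (rank / m) * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

# Four P values, q = 0.05: step-up thresholds are 0.0125, 0.025, 0.0375, 0.05.
print(benjamini_hochberg([0.03, 0.01, 0.5, 0.02]))
# [True, True, False, True]
```

Unlike Bonferroni, this controls the expected proportion of false discoveries rather than the FWER, which makes it less conservative when many comparisons are tested.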
Table 2.
Analysis of the Prevalence of Multiple Comparison Corrections in Ophthalmic Research Presented at ARVO 2010

Category | Bonferroni | Tukey | False Discovery Rate | Least Significant Difference | Dunnett | Scheffe | Newman-Keuls | Multiple Comparison NOS | Total | % of All Abstracts | % of All Abstracts Reporting P Values
Anatomy | - | - | 2 | - | - | - | - | 2 | 4 | 1.8% | 5.9%
Biochemistry | 2 | - | 1 | - | - | - | 1 | 1 | 5 | 0.9% | 4%
Clinical epidemiology | - | 1 | - | - | - | - | - | - | 1 | 0.3% | 0.6%
Cornea | 7 | 1 | - | 2 | - | 1 | - | 4 | 15 | 1.7% | 4.7%
Eye movements | 1 | 3 | - | - | - | - | - | 1 | 5 | 1.8% | 5.2%
Genetics | 3 | - | - | - | - | - | - | 1 | 4 | 8.3% | 66.7%
Glaucoma | 4 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 14 | 1.9% | 3.1%
Immunology | - | - | - | - | - | - | - | 1 | 1 | 0.3% | 1.1%
Lens | - | 3 | - | - | - | - | - | - | 3 | 1.3% | 6.5%
Multidisciplinary | 1 | - | - | - | - | - | - | - | 1 | 0.6% | 2%
Nanotechnology | - | - | - | - | - | - | - | - | 0 | 0% | 0%
Physiology | - | 2 | - | - | 1 | - | - | - | 3 | 1% | 2.7%
Retina | 3 | 4 | - | - | - | - | - | - | 7 | 0.7% | 1.5%
Retinal cell biology | 1 | 1 | - | - | - | - | - | - | 2 | 0.3% | 1.2%
Visual neurology | - | 1 | 1 | - | 1 | - | - | - | 3 | 1.1% | 6.3%
Visual psychology | 2 | 4 | - | - | - | - | - | - | 6 | 2.2% | 6%
Total | 24 | 21 | 5 | 4 | 3 | 2 | 2 | 13 | 74 | 1.2% | 3.2%
A total of 511 abstracts met inclusion criteria for the simulation study involving abstracts with 5 or more P values. A total of 80 abstracts met criteria for the simulation study involving 10 or more P values. The characteristics of the studies that met inclusion criteria and the results of the simulation study are displayed in Table 3. The simulation study resulted in a false-positive outcome in an average of 30% (154 of 511) of abstracts reporting 5 or more P values and in nearly half (48%, 38 of 80) of abstracts reporting 10 or more P values. In addition, multiple type I errors were found in an average of 5.2% of studies with 5 or more comparisons and 14% of studies with 10 or more. 
Table 3.
Simulation Characteristics and Results

# of Reported P Values | # Abstracts Meeting Criteria | Total # of P Values in Included Studies | Average # of Simulated Type I Errors | Average # of Simulated Studies with a Type I Error | % of Simulated Studies with a Type I Error | % of Studies with Multiple Type I Errors
5 or more | 511 | 3703 | 185.3 | 154.2 | 30.2% | 5.2%
10 or more | 80 | 1054 | 52.7 | 38.2 | 47.7% | 14.0%
Discussion
Although the need for multiple comparison corrections has been understood for many years, its application in medical and ophthalmic research has been slow to follow. Indeed, in this analysis a correction factor was used in only 1.2% of all studies and in only 5% of statistically rigorous studies. Many of the uncorrected P values were likely part of exploratory analyses where an inflated FWER is acceptable, a fact that would need to be addressed a priori and cannot be discerned by any a posteriori analysis. Nevertheless, 1.2% is a very low rate, especially when compared with other literature reviews, in which the proportion of studies reporting correction factors is often near 40% and as high as 60%.31 
Although the simulated percentage of statistically rigorous abstracts reporting a type I error is already large (30%), it should be noted that this is likely an underestimate. Due to a process called publication bias, the number of reported P values is expected to understate the total number of statistical comparisons conducted.11 Inevitably, numerous statistical comparisons were conducted but not reported, whether due to a nonsignificant result, space constraints, or other reasons. Evidence of this is that a small number of abstracts reported the use of a correction factor but reported no P values. The results of this and any similar analysis will therefore underestimate the number of statistical comparisons conducted by researchers, which in turn leads to an underestimate of the type I error rates and of the need for multiple comparison corrections. 
Although it is tempting to determine, a posteriori, which studies required a correction factor, it is well understood that there can be no steadfast rules on when correction methods are required.13,23 Any retrospective analysis of studies without a priori knowledge of inherent correlations in statistical comparisons, or of whether the study is "exploratory" versus "confirmatory" in nature, would lead to arbitrary results, at best. Additionally, it should be noted that if a study has numerous, unrelated comparisons that are "exploratory" in nature, this does not decrease the FWER of the study; it only makes an elevated type I error rate more acceptable. For example, suppose one study uses a single data set to conduct 10 related comparisons, whereas another study uses 10 different data sets to conduct 10 different comparisons. Both studies conduct 10 comparisons, which at an alpha level of 0.05 results in a type I error rate (FWER) of 40%. Although the elevated type I error rate is less desirable in the study that uses one data set because the comparisons are related, the probability of type I error remains identical between the two studies. The approach used in this analysis, identifying statistically rigorous studies with high FWER, provides the best available sample of studies in which a correction factor would need to be considered. 
Although the analysis in this study is adequately robust to explore the use of multiple comparisons in ophthalmic research, the data do have a number of limitations. The peer-review process for conference abstracts is likely less selective than that for manuscript publication, resulting in study designs and statistical analyses that are incomplete or less polished. In the past, researchers used journal searching techniques to estimate the usage of statistical analysis in ophthalmology research. However, these techniques have focused on subspecialty-specific journals and are not representative of all ophthalmology research. The technique used herein allows the analysis to include an unprecedented number of international research studies, categorized by subspecialty. Theoretically, researchers may conduct a multiple comparison correction without mentioning it in the text. However, our analysis was focused on only those studies that report multiple, numeric P values. It would be illogical and inconsistent to report a corrected P value without reporting the new alpha level; such a practice would leave the reader unable to interpret any results. We assumed, therefore, that if specific P values were reported and a correction was conducted, it was mentioned in the text. Space and time constraints in conference abstracts may lead to increased publication bias, but this would only result in an underestimate of the type I error rate. The simulation analysis in this study resulted in nearly one third of all statistically rigorous studies reporting a type I error. Although these numbers may underestimate the error rate, they illustrate very well the need for more liberal use of multiple comparison corrections in ophthalmic research. 
It should be noted that this analysis refers only to type I error in the form of statistical analysis. Indeed, the type I error of a final conclusion can be kept in check using alternative methods. If researchers are diligent in confirming statistical tests with biologic or experimental research, they will also be able to rein in the probability of a false-positive result. These practices, however, do not alter the expected, a priori FWER, and a discussion of multiple comparisons is still warranted. 
Referencing the results generated in this study and current guidelines, we suggest that ophthalmic researchers routinely address the need for multiple comparison corrections during study design. Although there is no gold standard for their use, researchers should strongly consider these correction methods in any confirmatory analysis or when multiple, related significance tests are performed. This is especially important if the authors are suggesting alterations in accepted clinical or surgical practices, where any error in data reporting can have significant deleterious effects. When researchers are data mining, hypothesis generating, or performing other exploratory studies, these corrections may not be necessary. However, the reasoning for not correcting an alpha level should be deliberately considered, discussed, and published. 
References
1. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50:1096–1121.
2. Godfrey K. Statistics in practice. Comparing the means of several groups. N Engl J Med. 1985;313:1450–1456.
3. Bauer P. Multiple testing in clinical trials. Stat Med. 1991;10:871–889; discussion 889–890.
4. Savitz DA, Olshan AF. Multiple comparisons and related issues in the interpretation of epidemiologic data. Am J Epidemiol. 1995;142:904–908.
5. Perneger TV. What's wrong with Bonferroni adjustments. BMJ. 1998;316:1236–1238.
6. Bender R, Lange S. Adjusting for multiple testing—when and how? J Clin Epidemiol. 2001;54:343–349.
7. Goodman SN. Multiple comparisons, explained. Am J Epidemiol. 1998;147:807–812; discussion 815.
8. Manor O, Peritz E. Re: "Multiple comparisons and related issues in the interpretation of epidemiologic data." Am J Epidemiol. 1997;145:84–85.
9. Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990;1:43–46.
10. Thompson JR. Invited commentary: Re: "Multiple comparisons and related issues in the interpretation of epidemiologic data." Am J Epidemiol. 1998;147:801–806.
11. Greenland S. Multiple comparisons and association selection in general epidemiology. Int J Epidemiol. 2008;37:430–434.
12. Chen X, Capizzi T, Binkowitz B, Quan H, Wei L, Luo X. Decision rule based multiplicity adjustment strategy. Clin Trials. 2005;2:394–399.
13. Proschan MA, Waclawiw MA. Practical guidelines for multiplicity adjustment in clinical trials. Control Clin Trials. 2000;21:527–539.
14. Veazie PJ. When to combine hypotheses and adjust for multiple tests. Health Serv Res. 2006;41:804–818.
15. Lang TA, Secic M. How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. 2nd ed. Philadelphia: American College of Physicians Press; 2006.
16. Peduzzi P, Henderson W, Hartigan P, Lavori P. Analysis of randomized controlled trials. Epidemiol Rev. 2002;24:26–38.
17. Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Adv Physiol Educ. 2004;28:85–87.
18. Ranstam J. Analyzing a randomized trial. Acta Radiol. 2008;49:1005–1006.
19. Backman S, Baker A, Beattie S. 2011 Canadian Journal of Anesthesia Guide for Authors. Can J Anesth. 2011;58:668–696.
20. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials. BMC Med Res Methodol. 2001;1:2.
21. Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987–1991.
22. Schochet PZ. Technical Methods Report: Guidelines for Multiple Testing in Impact Evaluations. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education; 2008.
23. Freidlin B, Korn EL, Gray R, Martin A. Multi-arm clinical trials of new agents: some design considerations. Clin Cancer Res. 2008;14:4368–4371.
24. CPMP Working Party on Efficacy of Medicinal Products. Biostatistical methodology in clinical trials in applications for marketing authorizations for medicinal products. Note for Guidance III/3630/92-EN. Stat Med. 1995;14:1659–1682.
25. Waalen J, Beutler E. Beware of multiple comparisons: a study of symptoms associated with mutations of the HFE hemochromatosis gene. Clin Chim Acta. 2005;361:128–134.
26. Procopio M. The multiple outcomes bias in antidepressants research. Med Hypotheses. 2005;65:395–399.
27. Wilhelmus KR. Beyond the P. I: Problems with probability. J Cataract Refract Surg. 2004;30:2005–2006.
28. Wilhelmus KR. Beyond the P. II: Precluding a puddle of P values. J Cataract Refract Surg. 2004;30:2207–2208.
29. Fan Q, Teo YY, Saw SM. Application of advanced statistics in ophthalmology. Invest Ophthalmol Vis Sci. 2011;52:6059–6065.
30. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
31. Curran-Everett D. Multiple comparisons: philosophies and illustrations. Am J Physiol Regul Integr Comp Physiol. 2000;279:R1–R8.
Footnotes
 Disclosure: A.W. Stacey, None; S. Pouly, None; C.N. Czyz, None