Each spring, the Association for Research in Vision and Ophthalmology (ARVO) holds an international research meeting that brings together researchers from all fields of ophthalmology. Research presentations at ARVO are delivered in both oral and poster formats. Each presentation is submitted in abstract form and is peer-reviewed prior to acceptance. At the conclusion of the meeting, ARVO publishes the abstracts of every poster and oral presentation online in portable document format (PDF). At the time of this study, the most recent abstracts available through ARVO online were from the meeting held in May 2010. Presentations at ARVO 2010 were divided into 16 subspecialty categories (Table 1).
Every abstract presented at ARVO 2010 was downloaded in PDF and searched for P values. Each PDF document was searched for the terms “P value” and “P,” together with all spatial variations of the same. All abstracts were also searched for the most common multiple-comparison correction methods using the terms “Bonferroni,” “Scheffe,” “Tukey,” “Duncan,” “Dunnett,” “Newman-Keuls,” “Sidak,” “Least Significant Difference,” and “False Discovery Rate,” as well as the general terms “multiple comparison” and “multiplicity.” The search was automated and highlighted all of the terms listed above. After the automated search was complete, two of the authors (AS and SP) and two assistants manually reviewed the search results, assessed them for validity, and recorded two variables for each abstract: the number of reported P values and whether a correction factor was used.
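The screening step above can be sketched in Python. The regular expression, the helper name `screen_abstract`, and the example text are illustrative assumptions; they are not the study's actual tooling, which searched the PDF documents directly and relied on manual review for validity.

```python
import re

# Correction-method terms searched for in each abstract (from the study).
CORRECTION_TERMS = [
    "Bonferroni", "Scheffe", "Tukey", "Duncan", "Dunnett", "Newman-Keuls",
    "Sidak", "Least Significant Difference", "False Discovery Rate",
    "multiple comparison", "multiplicity",
]

# Matches "P value" or "P" followed by a comparison symbol, tolerant of
# spacing. This is a simplification of the "all spatial variations" rule.
P_VALUE_PATTERN = re.compile(r"\bP\s*(?:value|[<>=])", re.IGNORECASE)

def screen_abstract(text):
    """Return (number of P-value matches, whether any correction term appears)."""
    n_p_values = len(P_VALUE_PATTERN.findall(text))
    lowered = text.lower()
    uses_correction = any(term.lower() in lowered for term in CORRECTION_TERMS)
    return n_p_values, uses_correction

# Hypothetical abstract fragment for illustration only.
example = ("Pressure differed between groups (P = 0.01, P = 0.03); "
           "Bonferroni correction applied.")
print(screen_abstract(example))  # counts two P values, detects a correction term
```

In the study itself, automated highlighting of this kind was only the first pass; each flagged result was then checked by hand.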
Studies that reported considerable statistical output, defined as 5 or more reported P values (FWER of 23% or greater) or 10 or more reported P values (FWER of 40% or greater), were analyzed for their use of a correction factor. If a correction factor was not mentioned, the abstract was included in a simulation study. The goal of the simulation study was to estimate the number of type I errors expected in these statistically rigorous studies. Criteria for inclusion in the simulation were 5 or more reported P values and no reported correction factor. For each abstract that met the inclusion criteria, a binomial distribution was used to simulate the number of type I errors reported in the abstract, using the number of reported P values as the “number of observations” parameter and an assumed alpha level of 0.05 as the “success” parameter. The simulation can be written as Yi ∼ Binomial(ni, p), where ni is the number of reported P values in the ith abstract, p is the alpha level (0.05), that is, the probability of a type I error, and Yi is the resulting number of simulated type I errors in the ith abstract. Because the truth of the null hypothesis was unknown in all cases, it was assumed to be true for all statistical comparisons. One simulation was complete when the number of type I errors for each abstract had been drawn from the above distribution. At the end of each simulation, the following results were recorded: the total number of type I errors across all studies, the number of simulated studies with type I errors, and the number of simulated studies with more than one type I error. This process was repeated 10,000 times and the average results were calculated. Separate simulation studies were carried out for all abstracts with 5 or more P values and for all abstracts with 10 or more P values.
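The simulation procedure above can be sketched as follows. The study was carried out in R; this is a Python rendering of the same Yi ∼ Binomial(ni, 0.05) scheme, and the per-abstract counts in `p_value_counts` are illustrative placeholders, not the actual counts extracted from the ARVO 2010 abstracts.

```python
import random

# Illustrative placeholders: number of reported P values (n_i) per abstract.
p_value_counts = [5, 7, 12, 6, 10, 25, 8]
ALPHA = 0.05           # assumed per-comparison type I error rate (p)
N_SIMULATIONS = 10_000

def simulate_once(counts, alpha, rng):
    """Draw Y_i ~ Binomial(n_i, alpha) for each abstract and summarize."""
    errors = [sum(rng.random() < alpha for _ in range(n)) for n in counts]
    return (sum(errors),                 # total type I errors across studies
            sum(y > 0 for y in errors),  # studies with at least one error
            sum(y > 1 for y in errors))  # studies with more than one error

rng = random.Random(1)
totals = [simulate_once(p_value_counts, ALPHA, rng) for _ in range(N_SIMULATIONS)]
avg_total, avg_any, avg_multi = (
    sum(t[i] for t in totals) / N_SIMULATIONS for i in range(3)
)
print(f"avg total errors: {avg_total:.2f}, "
      f"avg studies with errors: {avg_any:.2f}, "
      f"avg studies with >1 error: {avg_multi:.2f}")
```

Averaging over 10,000 repetitions makes the results close to their expected values; for example, the expected total is simply the sum of the ni values multiplied by 0.05.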
The simulation study was completed using the R statistical software package (provided in the public domain by the R Foundation for Statistical Computing, Vienna, Austria; available at http://www.r-project.org/).30