June 2001
Volume 42, Issue 7
New Developments in Vision Research  |   June 2001
Small Samples: Does Size Matter?
Author Affiliations
  • Andrew John Anderson
    From the Department of Optometry and Vision Sciences, The University of Melbourne, Carlton, Victoria, Australia.
  • Algis Jonas Vingrys
    From the Department of Optometry and Vision Sciences, The University of Melbourne, Carlton, Victoria, Australia.
Investigative Ophthalmology & Visual Science June 2001, Vol.42, 1411-1413. doi:

      Andrew John Anderson, Algis Jonas Vingrys; Small Samples: Does Size Matter?. Invest. Ophthalmol. Vis. Sci. 2001;42(7):1411-1413.

      © ARVO (1962-2015); The Authors (2016-present)

“Only five subjects in a scientific study? I trust this is a typographical error… .” 1 In all scientific studies, investigators must consider how large a sample should be to reflect the population from which it was drawn. Some studies are designed to quantify the magnitude of a particular parameter in the population (e.g., average flicker sensitivity) 2 or to compare parameters between different populations (e.g., treated and control groups), and in these cases power analyses are accepted methods for determining how large a sample should be. 3 However, there are other types of studies in which investigators demonstrate new effects within a system but do not explicitly quantify population parameters. Many of the psychophysical and neurophysiological studies reported in major journals fit this latter category. Typically, these studies use small numbers of subjects and show that all the subjects tested demonstrate the investigated effect—for example, two rhesus monkeys 4 or two human observers with rod dysfunction, 5 three human observers, 6 four rats, 7 five human observers. 8 However, the method for determining the number of subjects is rarely, if ever, stated. How can these small sample sizes be reconciled with other studies investigating novel effects that use markedly larger sample sizes (e.g., 23 human subjects, 9 40 human subjects 10 )? 
It could be argued that studies using small sample sizes are not meant to quantify general performance within a population but merely to document the existence of an effect, and so the number of subjects is less important. However, the fact that investigators bother to perform replications in such studies implies a wish to demonstrate that their findings are not aberrant and should be taken as representing the performance of the population at large. Why, therefore, is the ability of these studies to predict the population’s performance not considered? Can an author justify the extra costs (in time and money) in testing four subjects, when he or she may just as well test only two (or even one)? 
This issue becomes even more important when considering that large subpopulations can exist within a population. An obvious case is gender. A naive investigator could perform an experiment on three randomly selected subjects and arrive at the conclusion that all people are female. Although such an example may seem ridiculous, it highlights the effects that sampling artifacts can have, especially when subpopulations exist. Therefore, the question that begs consideration is: what sample size is required to ensure, to a specified confidence, that the results are indicative of the general population? 
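The size of this sampling artifact is easy to quantify. The following is a minimal sketch, assuming that subjects are drawn independently and that the subpopulation of interest makes up a known fraction of the population (a fraction of 0.5 is assumed for the gender example; the function name is hypothetical and used only for illustration).

```python
def prob_all_from_subgroup(subgroup_fraction: float, n: int) -> float:
    """Probability that n independent random draws all come from one subgroup.

    subgroup_fraction is the assumed share of the population belonging to
    the subgroup (an illustrative assumption, not a value from the article).
    """
    if not 0.0 <= subgroup_fraction <= 1.0:
        raise ValueError("subgroup_fraction must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return subgroup_fraction ** n


if __name__ == "__main__":
    # Chance that every subject in a random sample is female,
    # assuming half of the population is female.
    for n in (1, 3, 5, 10):
        print(f"N = {n:2d}: {prob_all_from_subgroup(0.5, n):.4f}")
```

Under these assumptions, a sample of three is drawn entirely from one half of the population one time in eight, which is far from negligible.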
We will consider the situation in which the presence of a previously undocumented effect is to be investigated. The following assumptions are made:
    1. Using a particular experimental paradigm, or set of paradigms, the effect is either present or absent; that is, equivocal results are not found.
    2. In the group of subjects tested, all subjects show the effect (which we will term “serial successes”). The number of serial successes is therefore equal to the sample size, N.
    3. The group of subjects is randomly chosen from a selectively normal population.
If assumption 1 is taken to be correct, then the probability of the effect being present can be described by a binomial distribution. Even if the effect is, in fact, part of a continuum, it will typically be rendered binomial by some criterion based on statistical testing (that is, findings are either significant or nonsignificant). For example, a study may investigate the effect of exercise on pulse rate. Although pulse rates represent a continuum (as might the effects of exercise), subjects will either show significantly altered rates or not. In a well-designed study, it is likely that the presence of the effect in each subject will be confirmed using a number of experimental paradigms and rigorous statistical analysis. 
Assumption 2 is reasonable and realistic, given that the majority of studies using small sample numbers report serial successes. The situation in which subjects who do not show the effect are present is necessarily more complex and will not be discussed, except to say that any departure within a small sample necessitates a more thorough investigation with enlarged sample numbers. 
Assumption 3 needs further consideration. The term selectively normal is used, because many studies have selection criteria for their subjects (e.g., criteria for general health, color vision, visual acuity). As such, subjects are not sampled from the entire population, but from a criterion-determined subpopulation (a selectively normal population). However, it is important to note that samples are often a more narrow subset than stated. Selection from undergraduate or postgraduate students, for example, will result in an overrepresentation of young, educated, myopic subjects, even if age, educational status, and refractive error are not specified as selection criteria. Similar sampling artifacts can unwittingly manifest in animal studies as well. 11  
If we accept these underlying assumptions, then θ can be used to describe the proportion of the selectively normal population that shows the effect being investigated. For any number of serial successes (N) in the sample group, this result is always consistent with θ = 1; that is, the entire population shows the effect. This defines the upper limit on the population proportion, θ. What is more important is to find the smallest population proportion that is consistent with the observed number of serial successes. Taking the common statistical criterion of P = 0.05, the lower limit for θ provides the minimum population proportion for the effect, with 95% confidence, given a number of serial successes, N. Stated another way, if the population proportion were any smaller than the lower limit on θ, there would be a less than 1 in 20 chance that all N subjects would show the effect (that is, at least one failure would be expected with a probability greater than 95%). 
The following equation describes the range of values θ can take:  
\[\theta^{N} \geq 0.05\]
where θ is the population proportion (as a fraction), N is the number of serial successes (and is equivalent to the sample size), and 0.05 is the statistical criterion (1 in 20; that is, a 95% level of confidence). The equation is derived from that given by Clopper and Pearson 12 for the calculation of binomial distribution confidence limits. Solving for the minimum value of θ, that is, θmin = 0.05^(1/N) (expressed as a percentage), gives the column headed θmin (P = 0.05) in Table 1.
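The relation θ^N ≥ P can be solved directly for the smallest consistent proportion, θmin = P^(1/N). The following is a minimal sketch of that calculation, assuming serial successes only; the function names are hypothetical and chosen for illustration, and rounding to the nearest whole percentage is an assumption about how tabulated values would be presented.

```python
def theta_min(n: int, p: float = 0.05) -> float:
    """Smallest population proportion consistent with n serial successes.

    Solves theta**n >= p for the minimum theta, i.e. theta_min = p**(1/n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p ** (1.0 / n)


def theta_min_percent(n: int, p: float = 0.05) -> int:
    """theta_min expressed as a percentage, rounded to the nearest integer."""
    return round(100 * theta_min(n, p))


if __name__ == "__main__":
    # Print theta_min (%) for N = 1..10 under three statistical criteria.
    criteria = (0.10, 0.05, 0.01)
    print("N    " + "  ".join(f"P={p:.2f}" for p in criteria))
    for n in range(1, 11):
        row = "  ".join(f"{theta_min_percent(n, p):6d}" for p in criteria)
        print(f"{n:<4d} {row}")
```

For example, five serial successes at P = 0.05 give θmin = 0.05^(1/5) ≈ 0.549, or about 55%.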
What should the criterion for θmin be? For an unknown effect, a useful starting point is that an effect must be present in the majority of the population if it is to be classified as “normal”; that is, θmin must be at least 50%. Using this assumption (as well as assumptions 1–3), a sample size of N = 5, all showing the effect, is required to confidently (P = 0.05) say that the population proportion for the effect is greater than 50%. The sample size must be increased if subjects who do not show the effect are present (that is, serial successes are not achieved). For completeness, Table 1 also lists the relationship between θmin and sample size for P = 0.10 and P = 0.01. Using these criteria, sample sizes of four and seven, respectively, are required to be consistent with a population proportion of at least 50%. 
To provide more confident estimates of the population proportion, much larger numbers are needed. For example, to be confident (P = 0.05) that the population proportion is at least 95%, 59 subjects showing the effect would be required. Such studies, however, are rarely performed. Instead, it is more common for data to be collected on a smaller sample, whose size is determined by a power analysis and mean values for the magnitude of the effect compared with conventional statistical analyses (e.g., t-tests). It should be noted, however, that these latter types of analyses determine whether a significant effect exists in the population on average and provide no estimate of the population proportion, θ. Such analyses may be successfully used on small-sample-size psychophysical data. 13  
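The same relation can be inverted to ask how many serial successes are needed for a chosen minimum proportion: the smallest N for which P^(1/N) reaches the target, or equivalently N ≥ ln P / ln θ. The following is a minimal sketch under assumptions 1–3; the function name is hypothetical, and a simple search is used in place of the closed form to avoid floating-point edge cases when the ratio is a whole number.

```python
def required_successes(theta_target: float, p: float = 0.05) -> int:
    """Smallest number of serial successes N such that p**(1/N) >= theta_target.

    theta_target is the desired minimum population proportion (as a fraction).
    """
    if not 0.0 < theta_target < 1.0:
        raise ValueError("theta_target must lie strictly between 0 and 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n = 1
    # Increase N until the minimum consistent proportion reaches the target.
    while p ** (1.0 / n) < theta_target:
        n += 1
    return n


if __name__ == "__main__":
    for target in (0.50, 0.90, 0.95):
        for p in (0.10, 0.05, 0.01):
            n = required_successes(target, p)
            print(f"theta_min >= {target:.2f} at P = {p:.2f}: N = {n}")
```

With a target of 0.95 and P = 0.05, the search returns 59, and with a target of 0.50 it returns 4, 5, and 7 for P = 0.10, 0.05, and 0.01, respectively.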
It should also be noted that a study may not be designed to quantify the performance of a normal population, but that of a disease group instead. 5 The model outlined herein is identical, however, except that the predicted values for θmin now relate to the population of observers with a particular disease, instead of the normal population. 
It is possible that the model can be improved. Often, an investigated effect is shown to be dependent on, or correlate with, a previously documented effect. In such cases, the estimated population proportion of this previously documented effect provides additional information about the population proportion of the investigated effect, and so a more confident estimation of θ may be made than that given in Table 1 . As such, it may be possible to use reduced numbers of subjects to clarify aspects of documented “normal” effects. However, there are also instances in which the outcomes of similar experiments differ between authors. In such cases, the estimated population proportion of the previously documented effect provides additional knowledge that reduces our confidence in our estimation of θ. It should be emphasized, however, that the reliability of such previous studies depends on the number of subjects investigated and the soundness of the studies’ experimental designs. 
It is possible that some form of Bayesian logic could be used to combine the results of previous small-sample-size studies with new studies, in a way similar to that proposed for clinical decision making. 14 Until the validity of such a model has been established for the type of data discussed in this article, the approach outlined herein provides a starting point for determining the general applicability of studies making use of small sample sizes. Despite criticisms, 1 a sample size of five may well be useful in scientific research. 
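As an illustration only of what such Bayesian logic might look like, the following sketch assumes a uniform prior on θ and treats a previous study with only serial successes as exchangeable with the new one. Neither assumption is made in this article, the function name is hypothetical, and the validity of this model for psychophysical data is not established here. Under these assumptions the posterior for θ after a total of n serial successes is Beta(n + 1, 1), whose cumulative distribution is θ^(n + 1), so the lower credible bound at criterion P is P^(1/(n + 1)).

```python
def bayes_lower_bound(n_new: int, n_prior: int = 0, p: float = 0.05) -> float:
    """Lower credible bound on theta after pooling serial successes.

    Assumes a uniform Beta(1, 1) prior and that both studies show only
    successes (illustrative assumptions). The posterior is then
    Beta(n_prior + n_new + 1, 1) with CDF theta**(n_prior + n_new + 1).
    """
    if n_new < 0 or n_prior < 0:
        raise ValueError("success counts must be non-negative")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    shape = n_prior + n_new + 1
    # Solve theta**shape = p for theta.
    return p ** (1.0 / shape)


if __name__ == "__main__":
    print(f"5 new subjects, no prior study:  {bayes_lower_bound(5):.3f}")
    print(f"5 new subjects, 5 prior subjects: {bayes_lower_bound(5, n_prior=5):.3f}")
```

In this sketch, earlier consistent findings raise the lower bound on θ, which matches the qualitative point that fewer new subjects may be needed when an effect builds on a previously documented one. It does not handle conflicting earlier results.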
In summary, the model outlined allows predictions to be made from experimental data obtained from limited numbers of samples. Our approach is appropriate for studies documenting the presence of an effect in each of a small number of subjects and allows inferences to be made regarding the proportion of the population expected to show the same effect. As such, the model may be usefully employed in small-sample-size psychophysical investigations, so that the general applicability of results may be predicted. In addition, the model may be used to estimate the number of subjects needed to determine, to a desired statistical confidence, the prevalence of an effect. Our approach is not applicable to analyzing the magnitude of a particular effect within a population, however; conventional power analyses and statistical testing are available for this task. 
Table 1.
Minimum Population Proportions Consistent with N Serial Successes, Given Statistical Criteria of P = 0.10, 0.05, and 0.01

Successes (N)    Minimum Population Proportion, θmin (%)
                 P = 0.10    P = 0.05    P = 0.01
1                10          5           1
2                32          22          10
3                46          37          22
4                56          47          32
5                63          55          40
6                68          61          46
7                72          65          52
8                75          69          56
9                77          72          60
10               79          74          63
22               90          -           -
29               -           90          -
44               -           -           90
45               95          -           -
59               -           95          -
90               -           -           95
References
1. Norris E. Downsized sample. New Scientist. November 1999;60.
2. Tyler CW. Two processes control variations in flicker sensitivity over the life span. J Opt Soc Am A. 1989;6:481–490.
3. Cohen J. Statistical Power Analysis for the Behavioural Sciences. New York: Academic Press; 1969:1–16.
4. Fuster JM, Bodner M, Kroger JK. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature. 2000;404:347–351.
5. Hansen RM, Fulton AB. Background adaptation in children with a history of mild retinopathy of prematurity. Invest Ophthalmol Vis Sci. 2000;41:320–324.
6. Freeman TCA, Fowler TA. Unequal retinal and extra-retinal motion signals produce different perceived slants of moving surfaces. Vision Res. 2000;40:1857–1868.
7. Laubach M, Wessberg J, Nicolelis MAL. Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task. Nature. 2000;405:567–571.
8. Braun C, Schweizer R, Elbert T, Birbaumer N, Taub E. Differential activation in somatosensory cortex for different discrimination tasks. J Neurosci. 2000;20:446–450.
9. Bloj MG, Kersten D, Hurlbert AC. Perception of three-dimensional shape influences colour perception through mutual illumination. Nature. 1999;402:877–879.
10. Bonato F, Cataliotti J. The effects of figure/ground, perceived area and target saliency on the luminosity threshold. Perception Psychophys. 2000;62:341–349.
11. Ward GE, Wainwright PE. The contribution of animal models to understanding the role of fats in infant nutrition. In: Huang Y, Sinclair AJ, eds. Lipids in Infant Nutrition. Champaign, IL: American Oil Chemists Society Press; 1998:39–62.
12. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26:404–413.
13. Anderson AJ, Vingrys AJ. Interactions between flicker thresholds and luminance pedestals. Vision Res. 2000;40:2579–2588.
14. Aspinall P, Hill AR. Clinical inferences and decisions. I: diagnosis and Bayes’ theorem. Ophthalmic Physiol Opt. 1983;3:295–304.
