Given any study reporting absolute numbers for the four possible outcomes of two tests performed in a simultaneous fashion, as defined herein, latent class analysis allows for the calculation of true mathematical test sensitivity and true prevalence of disease within the study population. Schulzer et al.9 have described the application of this general technique for estimating the sensitivity and specificity of visual field testing for optic nerve damage progression in normal-tension glaucoma. A mathematical method is ideal for the calculation of true sensitivity in the absence of a gold standard test. As a reference standard, clinical criteria are subjective and dependent on interobserver variation; sensitivities measured against them therefore do not represent true sensitivity. A mathematical derivation from the results of bilateral biopsies is objective and relies on the test itself to calculate the true sensitivity experimentally.
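To make the preceding description concrete, the following is a minimal sketch of how sensitivity and prevalence can be recovered from the outcome counts of two simultaneous tests. It assumes conditional independence of the two biopsies and no false-positive results (specificity of 100%), so that P(both positive) = p·s² and P(discordant) = 2·p·s·(1 − s). The function name, argument names, and counts are hypothetical and chosen for illustration only; the derivation given in the Methods takes precedence over this sketch.

```python
def estimate_sensitivity_and_prevalence(both_pos, discordant, both_neg):
    """Moment-matching estimates for two simultaneous, conditionally
    independent tests, ASSUMING no false positives (specificity = 100%).

    both_pos   -- pairs in which both biopsies were positive
    discordant -- pairs with one positive and one negative biopsy
                  (the two discordant outcomes are pooled)
    both_neg   -- pairs in which both biopsies were negative
    """
    if both_pos <= 0:
        raise ValueError("at least one concordant positive pair is required")
    n = both_pos + discordant + both_neg
    # discordant / both_pos = 2(1 - s) / s  =>  s = 2a / (2a + d)
    sensitivity = 2 * both_pos / (2 * both_pos + discordant)
    # both_pos / n = p * s^2  =>  p = a / (n * s^2)
    prevalence = both_pos / (n * sensitivity ** 2)
    return sensitivity, prevalence


# Hypothetical counts, not taken from any study in Table 1.
s, p = estimate_sensitivity_and_prevalence(both_pos=40, discordant=10, both_neg=150)
print(f"sensitivity = {s:.3f}, prevalence = {p:.3f}")
```

With these illustrative counts the sketch yields a sensitivity of about 0.889 and a prevalence of about 0.253. The estimated prevalence exceeds the observed proportion of patients with at least one positive biopsy (50 of 200), because some diseased patients are expected to have two negative biopsies.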
The calculated sensitivities for each study showed significant variability, ranging from 69% to 98.7% (Table 1). The differences between studies are probably the result of factors such as variable TAB methodology (e.g., length of specimen) or interpretation (e.g., number of levels examined per specimen, skill of pathologists). However, when the overall mean TAB sensitivity calculated by our method (87.1%) is compared with pooled means calculated by meta-analytic techniques (88.9%) and a random-sampling bootstrap technique (86.7%), the results are quite similar, which suggests that, despite significant heterogeneity between the included studies, our method is robust and capable of providing stable results. In addition, the calculated sensitivity of 87.1% closely matches the value of 86.9%, which was derived from our previously reported literature review of unilateral TAB sensitivity,4 and further supports this mathematical method as capable of estimating the true sensitivity.
Our 87.1% calculated sensitivity of a single TAB is slightly lower than previously published estimates, which approached 90% or greater,4 consistent with the fact that using clinical criteria as a reference standard can produce false-negative results. Our calculated true prevalence of TA in these study populations (26.4%) may appear high, especially when compared to published population-based estimates that report a prevalence of ∼1%, even in the oldest age groups.18 However, the high true prevalence values should not be surprising when one considers that these populations include patients who are undergoing TAB. The patients in these studies probably demonstrated characteristic symptoms or signs that suggested the presence of TA. In this sense, they represent selected populations in which the likelihood of disease is greater than in the general population.
The true sensitivity of any diagnostic test is useful to know, because it may alter patient management and outcome. The ultimate goal of clinical decision-making is to select a diagnostic and/or therapeutic pathway that maximizes benefits while minimizing risks. The integration of evidence-based data into this process can decrease the level of uncertainty in clinical decision-making. Decision analysis is a technique that employs quantitative methods to delineate elements of the decision-making process and to compare the expected consequences of pursuing different strategies.19 Regardless of the disease being studied, such models always rely on the use of accurate parameters, such as the sensitivity and specificity of diagnostic tests, to generate meaningful results. For example, we have recently published a decision-making model for the management of patients with suspected TA.4 The results of this model (e.g., when to perform unilateral TAB, when to perform bilateral TAB, if/when to treat with steroids) are critically dependent on an accurate estimate for the sensitivity of the TAB.
Discordance rates observed with bilateral biopsies, which represent the proportion of bilateral biopsy pairs with conflicting results, are sometimes used as direct estimates of the potential increase in sensitivity gained by performing two biopsies instead of one. We do not believe that this is valid, for several reasons. First, there is a 50% chance that the positive side of a discordant biopsy pair would have been randomly chosen in the case of a unilateral TAB. If this occurs, the unilateral biopsy would be considered 100% sensitive, compared with 50% sensitivity for the discordant bilateral biopsy pair (i.e., the first biopsy identified the disease correctly, but the second biopsy failed to do so). If the discordance rate alone is used to estimate increased diagnostic yield with two biopsies, one would conclude that the second biopsy actually reduced diagnostic yield by 50%, which is not mathematically consistent. Second, in practice, the selection of which side to sample in a unilateral TAB is not always random. Most surgeons tend to sample the side that is clinically abnormal (e.g., on palpation) or potentially symptomatically involved (e.g., monocular visual loss on that side). As a result, the discordance rate may not accurately estimate the increase in sensitivity with the second biopsy, because the true pretest probability of disease in a purely unilateral biopsy could vary, depending on which side is initially chosen.
Third, sensitivity cannot be calculated in the same way for both simultaneous and sequential biopsies. To illustrate this concept, consider a study population undergoing sequential biopsies, where only a negative result on the first biopsy triggers the performance of the second biopsy. Because the first test result directly influences the decision to perform the second test, the prevalence of disease in the group undergoing the second biopsy is necessarily different from that of the overall group before the first biopsy is performed. The true likelihood of disease after one negative biopsy is likely to be much lower than that in the undifferentiated population. Using the simultaneous approach, each biopsy is performed regardless of (i.e., independently of) the result of the other, so there is no differentiation in the study population between the biopsy events. A single method for calculating the sensitivity of both simultaneous and sequential biopsies fails to account for this key difference. Our mathematical method applies only when the two testing events are conditionally independent of each other; thus, it is valid only for deriving the sensitivity from studies reporting results of simultaneous biopsies.
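The shift in disease likelihood after a negative first biopsy can be illustrated with a short calculation. The sketch below applies Bayes' rule under the same assumptions used elsewhere in this discussion (no false-positive results and conditionally independent biopsies) and uses the pooled estimates reported above (sensitivity 87.1%, prevalence 26.4%) purely as illustrative inputs. The function names are hypothetical.

```python
def prob_disease_after_negative(prevalence, sensitivity):
    """Probability of disease given one negative biopsy,
    ASSUMING specificity = 100% (no false-positive results)."""
    missed = prevalence * (1 - sensitivity)   # diseased, biopsy negative
    healthy = 1 - prevalence                  # not diseased, always negative
    return missed / (missed + healthy)


def bilateral_sensitivity(sensitivity):
    """Sensitivity of two simultaneous biopsies, ASSUMING the two results
    are conditionally independent given disease status."""
    return 1 - (1 - sensitivity) ** 2


s, p = 0.871, 0.264  # pooled estimates quoted in the text, used as inputs
print(f"P(disease | first biopsy negative) = {prob_disease_after_negative(p, s):.3f}")
print(f"Sensitivity of two simultaneous biopsies = {bilateral_sensitivity(s):.3f}")
```

Under these assumptions, the probability of disease falls from 26.4% before any biopsy to roughly 4.4% among patients whose first biopsy is negative, which is why the group proceeding to a second, sequential biopsy cannot be treated as the original study population. The same inputs give a sensitivity of about 98.3% for two simultaneous biopsies.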
Similarly, the selection of patients undergoing bilateral biopsies versus a unilateral biopsy may not be truly random, raising the possibility of selection bias. Patients who undergo bilateral biopsies could have a different pretest probability of disease than patients for whom a unilateral biopsy result is deemed adequate. Cases perceived as diagnostically difficult may be chosen to undergo simultaneous biopsy, whereas clear-cut cases may undergo biopsy of one side only, with a second biopsy only if the first was negative. Also, studies publishing bilateral biopsy results typically report on patients who were treated at tertiary medical centers and who may have more aggressive disease than their unselected counterparts. While selection bias is largely unavoidable in our analysis, it may limit the generalizability of our technique.
The technique described in our study could be extended to other similar diagnostic situations in which one test is performed multiple times at different anatomic sites, as long as such tests are performed in a simultaneous fashion, are conditionally independent, and there are no false-positive results. Fine-needle aspiration biopsies throughout the body (e.g., thyroid, breast) are common examples of diagnostic studies that may meet these criteria.